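I. Overview of Open-Falcon Components
【Open-Falcon Graphing Components】
- Agent: deployed on each monitored machine to collect host metrics
- Transfer: the data receiver; forwards data to the backend Graph and Judge
- Graph: stores monitoring data in RRD files
- Query: queries data across the Graph instances and provides a unified HTTP query interface
- Dashboard: web front end for viewing historical trend graphs
- Task: runs scheduled jobs such as full index updates, cleanup of stale index entries, and monitoring of the components themselves
【Open-Falcon Alerting Components】
- Sender: alert delivery module; controls sending concurrency and provides a send buffer queue
- UIC(FE): user group management and single sign-on
- Portal: web front end for configuring alert policies and managing host groups
- HBS: HeartBeat Server
- Judge: alert evaluation module
- Links: web front end for merged alerts; stores the alert details
- Alarm: alert event processor
【Open-Falcon Architecture Diagrams】
Official architecture diagram:
Diagram drawn by a community user: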
II. Preparation
1. Install Redis
http://www.cnblogs.com/xialiaoliao0911/p/7523952.html
2. Install MySQL
http://www.cnblogs.com/xialiaoliao0911/p/7523931.html
3. Open-Falcon download
Binary release: https://pan.baidu.com/s/1jOb6z-HRJ7i6nSFxf7I5Bg
4. Initialize the MySQL schema
# None of the open-falcon components needs root to run. Installing them under a regular account is recommended for security; this walkthrough installs and deploys every component as the regular account falcon.
# Installing dependency libraries with yum does still require root.
git clone https://github.com/open-falcon/scripts.git
cd ./scripts/
mysql -h localhost -u root --password="" < db_schema/graph-db-schema.sql
mysql -h localhost -u root --password="" < db_schema/dashboard-db-schema.sql
mysql -h localhost -u root --password="" < db_schema/portal-db-schema.sql
mysql -h localhost -u root --password="" < db_schema/links-db-schema.sql
mysql -h localhost -u root --password="" < db_schema/uic-db-schema.sql
5. Extract the Open-Falcon release tarball
#Create the user falcon (run as root)
useradd falcon
#Switch to falcon and create a working directory tmp
su - falcon
cd /home/falcon
mkdir tmp
#Extract
tar -zxf of-release-v0.1.0.tar.gz -C ./tmp/
for x in `find ./tmp/ -name "*.tar.gz"`;do \
app=`echo $x|cut -d '-' -f2`; \
mkdir -p $app; \
tar -zxf $x -C $app; \
done
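The loop above derives each component's directory name with `cut -d '-' -f2`. The Python 3 sketch below reproduces that naming rule so you can check where each tarball will land. The tarball names in it are hypothetical examples. Because `mkdir -p $app` is relative to the current directory, the component directories are created wherever the loop runs. The `$WORKSPACE` used in later steps should be that directory; the config paths later in this post suggest `/home/falcon/tmp`.

```python
def component_dir(tarball_path: str) -> str:
    """Emulate `echo "$x" | cut -d '-' -f2` from the extraction loop.

    cut splits the *whole* path on '-' and keeps field 2 (1-based).
    If the line contains no '-', cut prints it unchanged.
    """
    fields = tarball_path.split("-")
    if len(fields) < 2:
        return tarball_path
    return fields[1]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for path in ["./tmp/falcon-agent-5.1.0.tar.gz",
                 "./tmp/falcon-transfer-0.0.16.tar.gz",
                 "./my-tmp/falcon-graph-0.5.5.tar.gz"]:
        print(path, "->", component_dir(path))
```

The last path shows the catch. Since `cut` looks at the whole path, a `-` anywhere in the directory part (here `my-tmp`) shifts the fields and yields a wrong name. Keep hyphens out of the extraction path.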
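III. Installing the Open-Falcon Graphing Components
1.Agent
An agent must be deployed on every machine. It automatically collects a predefined set of metrics and pushes them to transfer every 60 seconds.
cd $WORKSPACE/agent/
mv cfg.example.json cfg.json
vim cfg.json
- Set enabled in the transfer section to true; this turns on sending data to transfer.
- Set addrs in the transfer section to ["127.0.0.1:8433"]. This is the listening address of the transfer component. It is a list, so several transfer instances can be configured, separated by commas.
# By default (all components on the same server), cfg.json can be left unchanged
# For the meaning of each item in cfg.json, see https://github.com/open-falcon/agent/blob/master/README.md
# Start
./control start
# View the log
./control tail
# Once started, open it in a browser
http://192.168.102.141:1988/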
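Besides its built-in collectors, the agent's HTTP port also accepts data pushed by scripts. The task config later in this post sends its statistics to http://127.0.0.1:1988/v1/push. The Python 3 sketch below pushes one custom data point there. The field names (endpoint, metric, timestamp, step, value, counterType, tags) follow the push format described in Open-Falcon's documentation. Treat them as an assumption and check them against the agent README for your version. The metric name is made up.

```python
import json
import time
import urllib.request

# Agent HTTP address from this post (agent cfg.json: "http": {"listen": ":1988"}).
AGENT_PUSH_URL = "http://127.0.0.1:1988/v1/push"


def build_metric(endpoint, metric, value, step=60, counter_type="GAUGE",
                 tags="", timestamp=None):
    """Build one data point in the (assumed) JSON shape of the push API."""
    return {
        "endpoint": endpoint,          # usually the host name
        "metric": metric,
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
        "step": step,                  # reporting period in seconds
        "value": value,
        "counterType": counter_type,   # GAUGE or COUNTER
        "tags": tags,                  # e.g. "module=demo"; may be empty
    }


def push(metrics, url=AGENT_PUSH_URL, timeout=5.0):
    """POST a list of data points as a JSON array and return the response body."""
    body = json.dumps(metrics).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # "demo.custom.metric" is a hypothetical metric name.
    point = build_metric("open-falcon-demo", "demo.custom.metric", 42)
    print(push([point]))
```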
【Configuration File】
/home/falcon/tmp/agent/cfg.json
[falcon@open-falcon-demo agent]$ more cfg.json
{
"debug": false,
"hostname": "open-falcon-demo",
"ip": "192.168.102.141",
"plugin": {
"enabled": false,
"dir": "./plugin",
"git": "https://github.com/open-falcon/plugin.git",
"logs": "./logs"
},
"heartbeat": {
"enabled": true,
"addr": "127.0.0.1:6030",
"interval": ,
"timeout":
},
"transfer": {
"enabled": true,
"addrs": [
"127.0.0.1:8433",
"127.0.0.1:8433"
],
"interval": ,
"timeout":
},
"http": {
"enabled": true,
"listen": ":1988",
"backdoor": false
},
"collector": {
"ifacePrefix": ["eth", "em"]
},
"ignore": {
"cpu.busy": true,
"df.bytes.free": true,
"df.bytes.total": true,
"df.bytes.used": true,
"df.bytes.used.percent": true,
"df.inodes.total": true,
"df.inodes.free": true,
"df.inodes.used": true,
"df.inodes.used.percent": true,
"mem.memtotal": true,
"mem.memused": true,
"mem.memused.percent": true,
"mem.memfree": true,
"mem.swaptotal": true,
"mem.swapused": true,
"mem.swapfree": true
}
}
The agent's page as shown in the browser:
2.Aggregator
cd $WORKSPACE/aggregator/
mv cfg.example.json cfg.json
【Configuration File】
/home/falcon/tmp/aggregator/cfg.json
[falcon@open-falcon-demo aggregator]$ more cfg.json
{
"debug": false,
"http": {
"enabled": true,
"listen": "0.0.0.0:6055"
},
"database": {
"addr": "root:mysql@tcp(127.0.0.1:3306)/falcon_portal?loc=Local&parseTime=true",
"idle": ,
"ids": [, -],
"interval":
},
"api": {
"hostnames": "http://127.0.0.1:5050/api/group/%s/hosts.json",
"push": "http://127.0.0.1:6060/api/push",
"graphLast": "http://127.0.0.1:9966/graph/last"
}
}
3.Transfer
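transfer listens on port 8433 by default, and agents push data to it over JSON-RPC.
cd $WORKSPACE/transfer/
mv cfg.example.json cfg.json
# By default (all components on the same server), cfg.json can be left unchanged
# For the meaning of each item in cfg.json, see https://github.com/open-falcon/transfer/blob/master/README.md
# Adjust cfg.json if necessary
# Start transfer
./control start
# Verify the service. This assumes the HTTP listener is on port 6060; a response of ok means the service started correctly.
curl -s "http://127.0.0.1:6060/health"
# View the log
./control tail
# Stop transfer
./control stop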
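The curl checks in this post expect the body ok. The Python 3 sketch below runs the same check for several components at once. The URLs are the transfer and graph health endpoints used in this post. Treating anything other than HTTP 200 with the body ok as a failure is this sketch's own choice.

```python
import urllib.request


def is_healthy(url, timeout=3.0):
    """True if GET `url` returns HTTP 200 with the body "ok" (whitespace ignored)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace").strip()
            return resp.status == 200 and body == "ok"
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections and timeouts
        # are all subclasses of OSError.
        return False


if __name__ == "__main__":
    checks = {
        "transfer": "http://127.0.0.1:6060/health",
        "graph": "http://127.0.0.1:6071/health",
    }
    for name, url in checks.items():
        print(f"{name:<9} {'ok' if is_healthy(url) else 'FAILED'}  {url}")
```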
[falcon@open-falcon-demo transfer]$ more cfg.json
{
"debug": false,
"minStep": ,
"http": {
"enabled": true,
"listen": "0.0.0.0:6060"
},
"rpc": {
"enabled": true,
"listen": "0.0.0.0:8433"
},
"socket": {
"enabled": false,
"listen": "0.0.0.0:4444",
"timeout":
},
"judge": {
"enabled": true,
"batch": ,
"connTimeout": ,
"callTimeout": ,
"maxConns": ,
"maxIdle": ,
"replicas": ,
"cluster": {
"judge-00" : "127.0.0.1:6080"
}
},
"graph": {
"enabled": true,
"batch": ,
"connTimeout": ,
"callTimeout": ,
"maxConns": ,
"maxIdle": ,
"replicas": ,
"cluster": {
"graph-00" : "127.0.0.1:6070"
}
},
"tsdb": {
"enabled": false,
"batch": ,
"connTimeout": ,
"callTimeout": ,
"maxConns": ,
"maxIdle": ,
"retry": ,
"address": "127.0.0.1:8088"
}
}
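The judge and graph sections of transfer's cfg.json each hold a cluster map (node name to address) and a replicas value. transfer uses consistent hashing to decide which node of the cluster receives a given series, and replicas is the number of virtual nodes placed on the ring for each real node. The Python 3 sketch below is a generic consistent-hash ring that illustrates the idea. The CRC32 hash, the virtual-node naming and the key format are assumptions of this sketch, not a copy of transfer's code, so it will not reproduce transfer's actual placement.

```python
import bisect
import zlib


class HashRing:
    """Consistent-hash ring with `replicas` virtual nodes per real node."""

    def __init__(self, nodes, replicas):
        self.replicas = replicas
        self._hashes = []   # sorted virtual-node hashes
        self._owner = {}    # virtual-node hash -> real node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(text):
        return zlib.crc32(text.encode("utf-8"))

    def add(self, node):
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            if h in self._owner:        # ignore rare CRC32 collisions
                continue
            self._owner[h] = node
            bisect.insort(self._hashes, h)

    def get(self, key):
        """Return the node owning `key`: the first virtual node clockwise."""
        if not self._hashes:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._hashes, self._hash(key))
        if idx == len(self._hashes):    # wrap around the ring
            idx = 0
        return self._owner[self._hashes[idx]]


if __name__ == "__main__":
    ring = HashRing(["graph-00", "graph-01"], replicas=100)
    # Hypothetical series key in endpoint/metric/tags form.
    print(ring.get("open-falcon-demo/cpu.idle/"))
```

The property that matters is that adding a node only moves keys onto the new node. The other keys keep their owner, which limits how much data must migrate when a graph instance is added.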
4.Graph
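graph is the component that stores graphing data and historical data; transfer forwards the data it receives to graph.
cd $WORKSPACE/graph/
mv cfg.example.json cfg.json
mkdir -p /home/falcon/data/6070 #create the graph data directory
# By default (all components on the same server), cfg.json can be left unchanged
# For the meaning of each item in cfg.json, see https://github.com/open-falcon/graph/blob/master/README.md
# Start
./control start
# View the log
./control tail
# Verify the service. This assumes the HTTP listener is on port 6071; a response of ok means the service started correctly.
curl -s "http://127.0.0.1:6071/health"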
[falcon@open-falcon-demo graph]$ more cfg.json
{
"pid": "/home/falcon/open-falcon/graph/var/app.pid", #change to the actual directory on this host
"log": "info",
"debug": false,
"http": {
"enabled": true,
"listen": "0.0.0.0:6071"
},
"rpc": {
"enabled": true,
"listen": "0.0.0.0:6070"
},
"rrd": {
"storage": "/home/falcon/data/6070" #graph data directory; must be created manually
},
"db": {
"dsn": "root:mysql@tcp(127.0.0.1:3306)/graph?loc=Local&parseTime=true", #"mysql" here is the MySQL root password
"maxIdle":
},
"callTimeout": ,
"migrate": {
"enabled": false,
"concurrency": ,
"replicas": ,
"cluster": {
"graph-00" : "127.0.0.1:6070"
}
}
}
5.Query
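query is the query interface for graphing data. When it receives a user's query, it fetches the matching data from the backend graph instances, aggregates it, and returns it to the user.
cd $WORKSPACE/query/
mv cfg.example.json cfg.json
#In the query directory, create graph_backends.txt listing the graph instances; take the entries from migrate > cluster in graph's cfg.json
cd /home/falcon/tmp/query
vi graph_backends.txt
graph-00 127.0.0.1:6070
# By default (all components on the same server), cfg.json can be left unchanged
# For the meaning of each item in cfg.json, see https://github.com/open-falcon/query/blob/master/README.md
# Start
./control start
# View the log
./control tail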
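graph_backends.txt is just migrate.cluster from graph's cfg.json, written one "name address" pair per line. The Python 3 sketch below generates the file so the two cannot drift apart. The paths in the usage line are the ones used in this post. json.load only accepts strict JSON, so the real cfg.json must not contain # comments or empty values.

```python
import json


def backends_lines(graph_cfg):
    """One 'name address' line per entry of graph cfg.json migrate.cluster."""
    cluster = graph_cfg["migrate"]["cluster"]
    return [f"{name} {addr}" for name, addr in sorted(cluster.items())]


def write_backends(graph_cfg_path, out_path):
    # json.load only accepts strict JSON: no # comments, no empty values.
    with open(graph_cfg_path, encoding="utf-8") as f:
        cfg = json.load(f)
    lines = backends_lines(cfg)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    print(write_backends("/home/falcon/tmp/graph/cfg.json",
                         "/home/falcon/tmp/query/graph_backends.txt"))
```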
[falcon@open-falcon-demo query]$ more cfg.json
{
"log_level": "info",
"slowlog": ,
"debug": "false",
"http": {
"enabled": true,
"listen": "0.0.0.0:9966"
},
"graph": {
"backends": "./graph_backends.txt",
"reload_interval": ,
"connTimeout": ,
"callTimeout": ,
"maxConns": ,
"maxIdle": ,
"replicas": ,
"cluster": {
"graph-00": "127.0.0.1:6070"
}
},
"api": {
"query": "http://127.0.0.1:9966",
"dashboard": "http://127.0.0.1:8081",
"max":
}
}
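dashboard reaches query through QUERY_ADDR (http://127.0.0.1:9966, matching query's http.listen above). The Python 3 sketch below fetches one hour of cpu.idle for a host directly. The /graph/history path and the field names start, end, cf and endpoint_counters are assumptions based on query's HTTP API in this release line; they are not shown anywhere in this post. Verify them against query's README before relying on them.

```python
import json
import time
import urllib.request

# query's http.listen from this post; dashboard's QUERY_ADDR points here too.
QUERY_ADDR = "http://127.0.0.1:9966"


def history_request(endpoint, counter, start, end, cf="AVERAGE"):
    """Body for a history query over [start, end] in Unix seconds (field names assumed)."""
    return {
        "start": int(start),
        "end": int(end),
        "cf": cf,  # RRD consolidation function, e.g. AVERAGE, MAX, MIN
        "endpoint_counters": [{"endpoint": endpoint, "counter": counter}],
    }


def query_history(payload, base=QUERY_ADDR, timeout=10.0):
    """POST the payload to the (assumed) /graph/history endpoint; return parsed JSON."""
    req = urllib.request.Request(
        base + "/graph/history",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    now = int(time.time())
    body = history_request("open-falcon-demo", "cpu.idle", now - 3600, now)
    print(json.dumps(query_history(body), indent=2))
```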
6.Dashboard
dashboard is the user-facing query UI: here users can see all the data pushed to graph and view its trend graphs.
Install dependencies
#Configure the EPEL repository and install virtualenv (these commands and the symlinks below need root)
rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum install -y python-pip
pip install virtualenv
#Create two symlinks, adjusting the source path to your actual MySQL installation
ln -s /usr/local/mysql/lib/libmysqlclient.so.20 /usr/lib/libmysqlclient.so.20
ln -s /usr/local/mysql/lib/libmysqlclient.so.20 /usr/lib64/libmysqlclient.so.20
#Remove the mysql-python line from pip_requirements.txt; it is installed separately with easy_install
#Create and enter the virtualenv environment
[falcon@open-falcon-demo dashboard]$ virtualenv env
[falcon@open-falcon-demo dashboard]$ source env/bin/activate
#Install mysql-python
(env)[falcon@open-falcon-demo dashboard]$ easy_install mysql-python
#In the README, find the line ./env/bin/pip install -r pip_requirements.txt -i http://pypi.douban.com/simple and run it
(env)[falcon@open-falcon-demo dashboard]$ ./env/bin/pip install -r pip_requirements.txt -i http://pypi.douban.com/simple
#Start Dashboard
(env)[falcon@open-falcon-demo dashboard]$ ./control start
#Check Dashboard status
(env)[falcon@open-falcon-demo dashboard]$ ./control status
#View the log
(env)[falcon@open-falcon-demo dashboard]$ ./control tail
#Leave the virtualenv environment
(env)[falcon@open-falcon-demo dashboard]$ deactivate
#Once started, open it in a browser
http://192.168.102.141:8081/
【Configuration File】
/home/falcon/tmp/dashboard/rrd/config.py
[falcon@open-falcon-demo rrd]$ more config.py
#-*-coding:utf8-*-
import os
#-- dashboard db config --
DASHBOARD_DB_HOST = "127.0.0.1"
DASHBOARD_DB_PORT = 3306
DASHBOARD_DB_USER = "root"
DASHBOARD_DB_PASSWD = "mysql"
DASHBOARD_DB_NAME = "dashboard"
#-- graph db config --
GRAPH_DB_HOST = "127.0.0.1"
GRAPH_DB_PORT = 3306
GRAPH_DB_USER = "root"
GRAPH_DB_PASSWD = "mysql"
GRAPH_DB_NAME = "graph"
#-- app config --
DEBUG = True
SECRET_KEY = "secret-key"
SESSION_COOKIE_NAME = "open-falcon"
PERMANENT_SESSION_LIFETIME = * *
SITE_COOKIE = "open-falcon-ck"
#-- query config --
QUERY_ADDR = "http://127.0.0.1:9966"
#BASE_DIR = "/home/falcon/open-falcon/dashboard/"
BASE_DIR="/home/falcon/data/6070" #same as the data directory created for graph
LOG_PATH = os.path.join(BASE_DIR,"log/")
try:
    from rrd.local_config import *
except:
    pass
7.Task
cd /home/falcon/tmp/task
mv cfg.example.json cfg.json
#Edit the config file; the finished version is shown below
[falcon@open-falcon-demo task]$ more cfg.json
{
"debug": false,
"http": {
"enable": true,
"listen": "0.0.0.0:8002"
},
"index": {
"enable": true,
"dsn": "root:mysql@tcp(127.0.0.1:3306)/graph?loc=Local&parseTime=true", #"mysql" here is the MySQL root password
"maxIdle": ,
"autoDelete": false,
"cluster":{
"test.hostname01:6071" : "0 0 0 ? * 0-5",
"test.hostname02:6071" : "0 30 0 ? * 0-5"
}
},
"collector" : {
"enable": true,
"destUrl" : "http://127.0.0.1:1988/v1/push",
"srcUrlFmt" : "http://%s/statistics/all",
"cluster" : [
"transfer,test.hostname:6060",
"graph,test.hostname:6071",
"task,test.hostname:8001"
]
}
}
#Start task
[falcon@open-falcon-demo task]$ ./control start
#Check status
[falcon@open-falcon-demo task]$ ./control status
#View the log
[falcon@open-falcon-demo task]$ ./control tail
#Restart
[falcon@open-falcon-demo task]$ ./control restart
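In task's collector section, each cluster entry has the form module,host:port, and srcUrlFmt is a printf-style template whose %s takes the host:port part. Judging from the config, task fetches statistics from those URLs and pushes them to destUrl, the agent's push endpoint. The Python 3 sketch below only expands the entries so you can check which URLs task will poll; it does not reproduce task's collection logic. The test.hostname entries appear to be placeholders from the example config and should be replaced with real hosts. Also note that the task entry uses port 8001, while task's own http.listen above is 8002.

```python
def collector_targets(cluster, src_url_fmt):
    """Expand collector.cluster entries ("module,host:port") into (module, url) pairs."""
    targets = []
    for entry in cluster:
        module, addr = entry.split(",", 1)
        targets.append((module.strip(), src_url_fmt % addr.strip()))
    return targets


if __name__ == "__main__":
    cluster = [
        "transfer,test.hostname:6060",
        "graph,test.hostname:6071",
        "task,test.hostname:8001",
    ]
    for module, url in collector_targets(cluster, "http://%s/statistics/all"):
        print(f"{module:<9} {url}")
```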
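IV. Installing the Open-Falcon Alerting Components
1.Sender
sender calls the mail-provider and sms-provider that each company supplies: it reads mails and SMS messages from redis and sends them with a configurable degree of concurrency. The alert SMS and mails produced by alarm are simply written to redis, and sender does the actual sending.
cd $WORKSPACE/sender/
mv cfg.example.json cfg.json
# vi cfg.json
# redis: must be the same instance used later by alarm and judge
# queue: keep the defaults
# worker: the maximum number of threads calling the SMS and mail sending APIs at the same time
# api: the interface addresses of sms-provider and mail-provider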
./control start
[falcon@open-falcon-demo sender]$ more cfg.json
{
"debug": false,
"http": {
"enabled": true,
"listen": "0.0.0.0:6066"
},
"redis": {
"addr": "127.0.0.1:6379",
"maxIdle":
},
"queue": {
"sms": "/sms",
"mail": "/mail"
},
"worker": {
"sms": ,
"mail":
},
"api": {
"sms": "http://11.11.11.11:8000/sms",
"mail": "http://11.11.11.11:9000/mail"
}
}
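The api entries in sender's cfg.json (11.11.11.11 in this example) must point at real sms-provider and mail-provider services, which are supplied by each company rather than by Open-Falcon. To exercise the alert chain before those exist, a throwaway stub can stand in. The Python 3 sketch below accepts any form-encoded POST, records and prints the fields, and answers ok. It deliberately assumes no particular field names, so its output also shows exactly what your sender version sends.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs

RECEIVED = []            # (path, fields) for every message received
_LOCK = threading.Lock()


class ProviderStub(BaseHTTPRequestHandler):
    """Stand-in for sms-provider / mail-provider: record the form, reply 'ok'."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length).decode("utf-8")
        # parse_qs returns lists; keep the first value of each field.
        fields = {k: v[0] for k, v in parse_qs(raw).items()}
        with _LOCK:
            RECEIVED.append((self.path, fields))
        print(f"[stub] {self.path} {fields}")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")


if __name__ == "__main__":
    # Then point both sender api entries at this stub, e.g.
    # http://127.0.0.1:8000/sms and http://127.0.0.1:8000/mail
    server = ThreadingHTTPServer(("127.0.0.1", 8000), ProviderStub)
    print("provider stub listening on http://127.0.0.1:8000")
    server.serve_forever()
```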
2.UIC(FE)
cd $WORKSPACE/fe/
mv cfg.example.json cfg.json
# Adjust the settings as needed, based on cfg.example.json
# Start
./control start
# View the log
./control tail
# Stop the service
./control stop
[falcon@open-falcon-demo fe]$ more cfg.json
{
"log": "debug",
"company": "MI",
"http": {
"enabled": true,
"listen": "0.0.0.0:1234"
},
"cache": {
"enabled": true,
"redis": "127.0.0.1:6379",
"idle": ,
"max": ,
"timeout": {
"conn": ,
"read": ,
"write":
}
},
"salt": "",
"canRegister": true,
"ldap": {
"enabled": false,
"addr": "ldap.example.com:389",
"baseDN": "dc=example,dc=com",
"bindDN": "cn=mananger,dc=example,dc=com",
"bindPasswd": "",
"userField": "uid",
"attributes": ["sn","mail","telephoneNumber"]
},
"uic": {
"addr": "root:mysql@tcp(127.0.0.1:3306)/uic?charset=utf8&loc=Asia%2FChongqing", #"mysql" here is the MySQL root password
"idle": ,
"max":
},
"shortcut": {
"falconPortal": "http://192.168.102.141:5050/", #Portal URL
"falconDashboard": "http://192.168.102.141:8081/", #Dashboard URL
"falconAlarm": "http://192.168.102.141:9912/" #Alarm URL
}
}
3.Portal
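portal is where alert policies are configured.
yum install -y python-virtualenv # run as root
cd $WORKSPACE/portal/
virtualenv ./env
./env/bin/pip install -r pip_requirements.txt
# vi frame/config.py
# 1. Update the DB settings
# 2. Set SECRET_KEY to a random string
# 3. UIC_ADDRESS has two entries. internal is the intranet address of the FE module: portal usually sits in the same network segment as UIC, and intranet access between them is fast.
#    external is the UIC address that end users reach from their browsers. This one is important!
# 4. Everything else can keep its default value
./control start
# portal listens on port 5050 by default; open it in a browser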
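The SECRET_KEY item above asks for a random string. Keys such as SECRET_KEY, SESSION_COOKIE_NAME and PERMANENT_SESSION_LIFETIME suggest that portal (and links) are Flask-style apps that use this value to sign session cookies, so it should be unpredictable. The portal and links listings in this post share the same sample value 4e.5tyg8-u9ioj; each deployment should use its own key. The minimal sketch below uses Python 3's standard secrets module. Run it with Python 3, even though portal itself runs on Python 2, as its print statement shows.

```python
import secrets


def make_secret_key(nbytes=32):
    """Return a URL-safe random string built from `nbytes` random bytes."""
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    # Paste the printed line into frame/config.py (portal, and separately links).
    print(f'SECRET_KEY = "{make_secret_key()}"')
```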
more /home/falcon/tmp/portal/frame/config.py
# -*- coding:utf-8 -*-
__author__ = 'Ulric Qin'
# -- app config --
DEBUG = True
# -- db config --
DB_HOST = "127.0.0.1"
DB_PORT = 3306
DB_USER = "root"
DB_PASS = "mysql" #database password
DB_NAME = "falcon_portal"
# -- cookie config --
SECRET_KEY = "4e.5tyg8-u9ioj"
SESSION_COOKIE_NAME = "falcon-portal"
PERMANENT_SESSION_LIFETIME = * *
UIC_ADDRESS = {
'internal': 'http://127.0.0.1:1234',
'external': 'http://192.168.102.141:1234', #address reachable from a browser
}
UIC_TOKEN = ''
MAINTAINERS = ['root']
CONTACT = 'ulric.qin@gmail.com'
COMMUNITY = True
try:
    from frame.local_config import *
except Exception, e:
    print "[warning] %s" % e
4.HBS
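The heartbeat server; it depends only on Portal's DB.
cd $WORKSPACE/hbs/
mv cfg.example.json cfg.json
# vi cfg.json and set the database to portal's DB
./control start
If you installed the graphing components first and are now adding the alerting components, the agent is already installed. Once started, hbs listens on one HTTP port and one RPC port. For the agent to talk to hbs, edit the agent's cfg.json again: set enabled under heartbeat to true and fill in hbs's RPC address. Then restart the agent with ./control restart, after which it sends heartbeats to hbs.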
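The agent change described above touches only the heartbeat section. The Python 3 sketch below makes that edit programmatically, which helps when many agents need it. It assumes the agent's cfg.json is strict JSON, with no # comments. It rewrites the file with its own indentation, so keep a backup. The address 127.0.0.1:6030 matches hbs's rpc listen (":6030") in this post.

```python
import json


def enable_heartbeat(cfg, hbs_rpc_addr):
    """Turn on the agent's heartbeat section and point it at hbs's RPC address."""
    hb = cfg.setdefault("heartbeat", {})
    hb["enabled"] = True
    hb["addr"] = hbs_rpc_addr
    return cfg


def update_agent_cfg(path, hbs_rpc_addr):
    # Requires cfg.json to be strict JSON (no # comments).
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    enable_heartbeat(cfg, hbs_rpc_addr)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=4)
        f.write("\n")
    return cfg


if __name__ == "__main__":
    update_agent_cfg("/home/falcon/tmp/agent/cfg.json", "127.0.0.1:6030")
    # then restart the agent: cd /home/falcon/tmp/agent && ./control restart
```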
[falcon@open-falcon-demo hbs]$ more cfg.json
{
"debug": true,
"database": "root:mysql@tcp(127.0.0.1:3306)/falcon_portal?loc=Local&parseTime=true",
"hosts": "",
"maxIdle": ,
"listen": ":6030",
"trustable": [""],
"http": {
"enabled": true,
"listen": "0.0.0.0:6031"
}
}
5.Judge
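The alert evaluation module. judge depends on HBS, so set up HBS first.
cd $WORKSPACE/judge/
mv cfg.example.json cfg.json
# vi cfg.json
# remain: how many points judge keeps in memory for each series, for example the maximum number of cpu.idle values for host01 held in memory.
# When configuring an alert such as all(#N), N must not exceed remain-1
# hbs: set to the hbs address; interval defaults to 60s, meaning strategies are pulled from hbs every 60s
# alarm: alert events are written to the redis configured under alarm; minInterval is the minimum number of seconds between two consecutive alerts and can be left at its default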
./control start
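The comments above describe four judge settings: remain, the all(#N) function, and alarm's minInterval and queuePattern. The toy Python 3 model below illustrates how they fit together. It is a simplified illustration of the described behavior, not judge's implementation, whose evaluation, storage and suppression logic differ in detail. The remain value in the usage example is hypothetical.

```python
from collections import deque


class SeriesHistory:
    """Toy model of judge's in-memory history for one series (e.g. host01 cpu.idle)."""

    def __init__(self, remain):
        self.remain = remain
        self.points = deque(maxlen=remain)   # oldest points fall off automatically

    def push(self, value):
        self.points.appendleft(value)        # newest point first

    def all_n(self, n, predicate):
        """all(#n): the newest n points all satisfy `predicate`."""
        if n > self.remain - 1:
            raise ValueError("N in all(#N) must not exceed remain-1")
        if len(self.points) < n:
            return False                     # not enough data yet
        return all(predicate(v) for v in list(self.points)[:n])


def should_alarm(last_alarm_ts, now_ts, min_interval):
    """Suppress a repeat alarm sent less than `min_interval` seconds ago."""
    return last_alarm_ts is None or now_ts - last_alarm_ts >= min_interval


def event_queue(priority, pattern="event:p%v"):
    """Redis list for an event, from alarm.queuePattern (%v = priority)."""
    return pattern.replace("%v", str(priority))


if __name__ == "__main__":
    h = SeriesHistory(remain=11)              # hypothetical remain value
    for v in (40, 8, 6, 5):                   # cpu.idle samples, oldest to newest
        h.push(v)
    print(h.all_n(3, lambda v: v < 10))       # checks the newest 3 points: 5, 6, 8
    print(event_queue(0))
```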
[falcon@open-falcon-demo judge]$ more cfg.json
{
"debug": true,
"debugHost": "nil",
"remain": ,
"http": {
"enabled": true,
"listen": "0.0.0.0:6081"
},
"rpc": {
"enabled": true,
"listen": "0.0.0.0:6080"
},
"hbs": {
"servers": ["127.0.0.1:6030"],
"timeout": ,
"interval":
},
"alarm": {
"enabled": true,
"minInterval": ,
"queuePattern": "event:p%v",
"redis": {
"dsn": "127.0.0.1:6379",
"maxIdle": ,
"connTimeout": ,
"readTimeout": ,
"writeTimeout":
}
}
}
6.Links
What links does: when several alerts are merged into a single alert message, the SMS carries an HTTP link to the alert details so the user can view them.
# yum install -y python-virtualenv
$ cd $WORKSPACE/links/
$ virtualenv ./env
$ ./env/bin/pip install -r pip_requirements.txt
./control start
./control status
./control tail
cd /home/falcon/tmp/links/frame
[falcon@open-falcon-demo frame]$ more config.py
# -*- coding:utf-8 -*-
__author__ = 'Ulric Qin'
# -- app config --
DEBUG = True
# -- db config --
DB_HOST = "127.0.0.1"
DB_PORT = 3306
DB_USER = "root"
DB_PASS = "mysql"
DB_NAME = "falcon_links"
# -- cookie config --
SECRET_KEY = "4e.5tyg8-u9ioj"
SESSION_COOKIE_NAME = "falcon-links"
PERMANENT_SESSION_LIFETIME = * *
try:
    from frame.local_config import *
except Exception, e:
    print "[warning] %s" % e
7.Alarm
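The alarm module processes alert events: judge writes the events it produces into redis, and alarm reads them from there. Business requirements have made this module rather messy, so each company can rewrite it to fit its own needs.
cd $WORKSPACE/alarm/
mv cfg.example.json cfg.json
# vi cfg.json
# Point redis at the same instance that judge uses
./control start
Note: in the current version of alarm, neither highQueues nor lowQueues may be empty. This is a bug that is due to be fixed. You can put event:p0 through event:p5 in highQueues and event:p6 in lowQueues.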
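Given several keys, Redis's BRPOP pops from the first non-empty list in the order the keys are listed, which is how a consumer can favor some queues over others. The in-memory Python 3 sketch below imitates that ordering for the highQueues/lowQueues split described above. That judge pushes with LPUSH and alarm consumes in this BRPOP style is an assumption of this sketch. Plain deques stand in for Redis lists.

```python
from collections import deque


def pop_event(queues, high_queues, low_queues):
    """Take the next event, scanning high-priority queues (in order) before low ones.

    `queues` maps a Redis key to a deque used like a Redis list:
    LPUSH on the producer side == appendleft, RPOP on the consumer side == pop.
    """
    for name in list(high_queues) + list(low_queues):
        q = queues.get(name)
        if q:
            return name, q.pop()
    return None   # nothing queued


if __name__ == "__main__":
    high = [f"event:p{i}" for i in range(6)]   # event:p0 .. event:p5
    low = ["event:p6"]
    queues = {name: deque() for name in high + low}
    queues["event:p6"].appendleft("disk almost full")
    queues["event:p1"].appendleft("host down")
    print(pop_event(queues, high, low))        # the p1 event is taken first
```

Within one queue the order is first in, first out, because the producer pushes on the left and the consumer pops on the right.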
[falcon@open-falcon-demo alarm]$ more cfg.json
{
"debug": true,
"uicToken": "",
"http": {
"enabled": true,
"listen": "0.0.0.0:9912"
},
"queue": {
"sms": "/sms",
"mail": "/mail"
},
"redis": {
"addr": "127.0.0.1:6379",
"maxIdle": ,
"highQueues": [
"event:p0",
"event:p1",
"event:p2",
"event:p3",
"event:p4",
"event:p5"
],
"lowQueues": [
"event:p6"
],
"userSmsQueue": "/queue/user/sms",
"userMailQueue": "/queue/user/mail"
},
"api": {
"portal": "http://192.168.102.141:5050",
"uic": "http://127.0.0.1:1234",
"links": "http://192.168.102.141:5090"
}
}
PS: In this example, open-falcon was installed as the falcon user.
The falcon user's home directory is /home/falcon.
All of the finished configuration files are packaged here: https://pan.baidu.com/s/1ii6r0-iJYYt4Mn_WzHcfcw
【agent】
http://192.168.102.141:1988/
【dashboard】
http://192.168.102.141:8081/
【uic/fe】
http://192.168.102.141:1234/
【Portal】
http://192.168.102.141:5050/
【alarm】
http://192.168.102.141:9912/
Manually trigger a full index update on graph
curl -s "http://127.0.0.1:6071/index/updateAll"