- ICMP monitoring
1. Add the probe job to Prometheus. Blackbox exporter can be started with its default configuration.
    - job_name: "icmp_ping"
      metrics_path: /probe
      params:
        module: [icmp]  # use the icmp module
      file_sd_configs:
        - refresh_interval: 10s
          files:
            - "/home/prometheus/conf/ping_status*.yml"  # files that list the targets
      relabel_configs:
        - source_labels: [__address__]
          regex: (.*?)(:80)?  # lazy match, so a trailing :80 really is stripped
          target_label: __param_target
          replacement: ${1}
        - source_labels: [__param_target]
          target_label: instance
        - source_labels: [__param_target]
          regex: (.*)
          target_label: ping
          replacement: ${1}
        - source_labels: []
          regex: .*
          target_label: __address__
          replacement: 192.168.1.14:9115  # address of the Blackbox exporter
2. Configure the nodes to ping
    [root@cinder1 conf]# cat ping_status.yml
    - targets: ['220.181.38.150','14.215.177.39','180.101.49.12','14.215.177.39','180.101.49.11','14.215.177.38','14.215.177.38']
      labels:
        group: 'Tier-1 cities - China Telecom network monitoring'
    - targets: ['112.80.248.75','163.177.151.109','61.135.169.125','163.177.151.110','180.101.49.11','61.135.169.121','180.101.49.11']
      labels:
        group: 'Tier-1 cities - China Unicom network monitoring'
    - targets: ['183.232.231.172','36.152.44.95','182.61.200.6','36.152.44.96','220.181.38.149']
      labels:
        group: 'Tier-1 cities - China Mobile network monitoring'
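For reference, the icmp module that the job above selects with module: [icmp] is defined on the Blackbox exporter side. The fragment below is a minimal sketch of such a definition, not the author's actual file; the timeout value and the IPv4 preference are assumptions chosen to match the IPv4 targets listed above.

```yaml
# blackbox.yml (sketch; values are assumptions, not taken from the original setup)
modules:
  icmp:
    prober: icmp
    timeout: 5s                      # assumed per-probe timeout
    icmp:
      preferred_ip_protocol: ip4     # the targets above are all IPv4 addresses
```

Note that ICMP probing needs raw-socket privileges, so the exporter usually has to run as root or with the CAP_NET_RAW capability.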
- HTTP metrics monitoring
1. Configure Prometheus to probe URLs with HTTP GET
    - job_name: "blackbox"
      metrics_path: /probe
      params:
        module: [http_2xx]  # use the http module
      file_sd_configs:
        - refresh_interval: 1m
          files:
            - "/home/prometheus/conf/blackbox*.yml"
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 192.168.1.14:9115
2. The target file looks like this, for example
    [root@cinder1 conf]# cat /home/prometheus/conf/blackbox-dis.yml
    - targets:
      - https://www.zhibo8.cc
      - https://www.baidu.com
      # list the URLs to probe here
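The http_2xx module referenced by the job works without any options, but it can be made stricter. The following is a sketch under stated assumptions: the timeout, the IPv4 preference, and the requirement that every target be served over TLS are illustrative choices, not settings from the original environment.

```yaml
# blackbox.yml (sketch; option values are assumptions)
modules:
  http_2xx:
    prober: http
    timeout: 10s                     # assumed per-probe timeout
    http:
      method: GET
      valid_status_codes: []         # an empty list falls back to the default, any 2xx
      preferred_ip_protocol: ip4
      fail_if_not_ssl: true          # assumes every target URL is https, as in the file above
```

With fail_if_not_ssl enabled, a plain http:// target would report probe_success 0, so this variant only suits jobs whose targets are all https.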
- API GET request checks
1. The Prometheus configuration is essentially the same as the previous job, so here is the config directly
    - job_name: "check_get"
      metrics_path: /probe
      params:
        module: [http_2xx]  # Look for an HTTP 2xx response.
      file_sd_configs:
        - refresh_interval: 1m
          files:
            - "/home/prometheus/conf/service_get.yml"
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 192.168.1.14:9115
2. Example API target file
    [root@cinder1 conf]# cat service_get.yml
    - targets:
      - http://10.10.1.123:10000/pmkb/atc_tcbi
      - http://10.10.1.123:10000/pmkb/get_ship_lock_count
      - http://10.10.1.123:10000/pmkb/get_terminal_count_by_city
      - http://10.10.1.123:10000/pmkb/get_terminal_monitor?industry=1
      - http://10.10.1.123:10000/pmkb/get_terminal_comparison?industry=1
      - http://10.10.1.123:10000/pmkb/get_terminal_city_count_industry?industry=1
      - http://10.10.1.123:10000/pmkb/industry_stat?industry=1
      - http://10.10.1.123:10000/pmkb/get_company_car_count?industry=1
      - http://10.10.1.123:10000/pmkb/get_terminal_month_countbyi?industry=1
      labels:
        group: 'service'
3. The Grafana dashboard is custom-built, as before, and can be downloaded from GitHub
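A 2xx status alone does not prove that an API returned useful data. If that matters, a custom module can also inspect the response body. The sketch below is hypothetical: the module name http_get_body_ok and the "code": 0 success marker are invented for illustration, since the original does not show what these endpoints return.

```yaml
# blackbox.yml (sketch; module name and body pattern are hypothetical)
modules:
  http_get_body_ok:
    prober: http
    timeout: 10s
    http:
      method: GET
      valid_status_codes: [200]
      fail_if_body_not_matches_regexp:
        - '"code"\s*:\s*0'           # hypothetical success marker in the JSON body
```

To use it, the check_get job would pass module: [http_get_body_ok] instead of http_2xx, and the exporter would need to reload its configuration.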
- API POST request status checks
1. First, extend blackbox.yml with a module for POST requests. The module is user-defined
    [root@cinder1 blackbox]# cat blackbox.yml
    modules:
      http_2xx:
        prober: http
      http_post_2xx:  # the module name is up to you
        prober: http
        http:
          method: POST
          headers:
            Content-Type: application/json  # request header
          body: '{"username":"admin","password":"123456"}'  # request payload; a login API is used as the example
2. Add the job to Prometheus
    - job_name: "check_service"
      metrics_path: /probe
      params:
        module: [http_post_2xx]  # must match the module name defined in blackbox.yml
      file_sd_configs:
        - refresh_interval: 1m
          files:
            - "/home/prometheus/conf/service_post.yml"
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 192.168.1.14:9115
3. The target file
    [root@cinder1 conf]# cat service_post.yml
    - targets:
      - http://10.2.4.103:5000/devops/api/v1.0/login
      labels:
        group: 'service'
4. Add the Grafana dashboard. It is also custom-built and can be downloaded from GitHub
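A login API may answer with a 2xx status even when the credentials are rejected, so it can help to fail the probe on an error message in the body as well. The variant below is only a sketch: the module name http_post_login_strict, the timeout, and the error pattern are assumptions, because the original does not show the login API's response format.

```yaml
# blackbox.yml (sketch; module name, timeout and error pattern are hypothetical)
modules:
  http_post_login_strict:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{"username":"admin","password":"123456"}'
      valid_status_codes: [200]
      fail_if_body_matches_regexp:
        - '(?i)invalid|unauthorized'   # hypothetical error text returned on a failed login
```

The check_service job would then reference module: [http_post_login_strict].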
- TCP port status checks
1. Prometheus configuration
    - job_name: 'port_status'
      metrics_path: /probe
      params:
        module: [tcp_connect]  # use the tcp module
      static_configs:
        - targets: ['10.10.1.35:8068','10.10.1.35:8069']  # host:port pairs to probe
          labels:
            instance: 'port_status'
            group: 'tcp'
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement: 192.168.1.14:9115
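One thing to watch in the job above: both targets carry the same static label instance: 'port_status', so after relabeling their series end up with identical label sets and the two ports cannot be told apart in queries or alerts. A possible variant, offered as a sketch rather than the author's configuration, derives instance from the probed address, as the other jobs in this article do.

```yaml
# prometheus.yml (sketch of an alternative port_status job)
scrape_configs:
  - job_name: "port_status"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: ["10.10.1.35:8068", "10.10.1.35:8069"]
        labels:
          group: "tcp"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance          # one series per host:port
      - target_label: __address__
        replacement: 192.168.1.14:9115  # Blackbox exporter address
```

With this layout, probe_success{job="port_status"} returns one series per port, so an alert can name the port that failed.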
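- Alert rule definitions
1. Service health
For the ICMP, TCP, HTTP and POST probes, the probe_success metric shows whether the probe succeeded:
    probe_success == 0    # probe failed, target unreachable
    probe_success == 1    # probe succeeded, target reachable
The alert checks the same metric: when it equals 0, the alert fires.
2. The http module also exposes the certificate expiry time, which can be used for an expiry alert
probe_ssl_earliest_cert_expiry returns the expiry time of the certificate as a Unix timestamp.
Converting the remaining time to days gives: (probe_ssl_earliest_cert_expiry - time())/86400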
3. Combining the above, we can define the following alert rules
    [root@cinder1 rules]# cat blackbox.yml
    groups:
    - name: blackbox_network_stats
      rules:
      - alert: blackbox_network_stats
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "API/host/port {{ $labels.instance }} is unreachable"
          description: "Please investigate as soon as possible"
SSL certificate check
    [root@cinder1 rules]# cat ssl.yml
    groups:
    - name: check_ssl_status
      rules:
      - alert: "SSL certificate expiry warning"
        expr: (probe_ssl_earliest_cert_expiry - time())/86400 < 30
        for: 1h
        labels:
          severity: warn
        annotations:
          description: 'The certificate for {{$labels.instance}} expires in {{ printf "%.1f" $value }} days. Please renew it soon'
          summary: "SSL certificate expiry warning"
4. After Prometheus has restarted, log in to the web UI to check that the rules are loaded
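Beyond probe_success, the exporter also reports how long each probe took (probe_duration_seconds), and the days-remaining expression can be stored as a recording rule so dashboards and alerts share one definition. The rule file below is a sketch: the group name, the recorded metric name probe:ssl_cert_days_remaining, the alert name, and the 2-second threshold are all assumptions picked for illustration.

```yaml
# rules/blackbox_extra.yml (sketch; names and thresholds are hypothetical)
groups:
  - name: blackbox_extra_rules
    rules:
      # Store certificate lifetime in days under a reusable name
      - record: probe:ssl_cert_days_remaining
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400
      # Warn when HTTP probes from the "blackbox" job stay slow
      - alert: blackbox_probe_slow
        expr: probe_duration_seconds{job="blackbox"} > 2
        for: 5m
        labels:
          severity: warn
        annotations:
          summary: 'Probe of {{ $labels.instance }} took {{ printf "%.2f" $value }}s'
```

Like the rule files above, this one only takes effect if its path is listed under rule_files in prometheus.yml.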