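Configuring Black-Box Monitoring with Prometheus

  • ICMP monitoring
    1. Add the probe job to Prometheus; the Blackbox exporter can simply be started with its default configuration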
  - job_name: "icmp_ping"
    metrics_path: /probe
    params:
      module: [icmp]  # use the icmp module
    file_sd_configs:
    - refresh_interval: 10s
      files:
      - "/home/prometheus/conf/ping_status*.yml"  # target files to load
    relabel_configs:
    # Pass the target address to Blackbox as the ?target= parameter.
    # Note: (.*) is greedy, so the optional (:80) group never strips a port; the targets here have none.
    - source_labels: [__address__]
      regex: (.*)(:80)?
      target_label: __param_target
      replacement: ${1}
    - source_labels: [__param_target]
      target_label: instance
    - source_labels: [__param_target]
      regex: (.*)
      target_label: ping
      replacement: ${1}
    # Send the actual scrape to the Blackbox exporter.
    - source_labels: []
      regex: .*
      target_label: __address__
      replacement: 192.168.1.14:9115
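
The job above selects the `icmp` module by name, so a module with that name has to exist in the Blackbox exporter configuration. A minimal sketch of such a module definition is shown below; the `timeout` and `preferred_ip_protocol` values are illustrative assumptions and are not taken from the original setup.

```yaml
# blackbox.yml (sketch): the module referenced by `module: [icmp]`
modules:
  icmp:
    prober: icmp                   # send ICMP echo requests
    timeout: 5s                    # assumed value; tune to your network
    icmp:
      preferred_ip_protocol: ip4   # assumed; the targets in this article are IPv4 addresses
```

ICMP probing usually needs raw-socket privileges, so the exporter may have to run as root or with the CAP_NET_RAW capability.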

2. Ping target configuration

[root@cinder1 conf]# cat ping_status.yml
- targets: ['220.181.38.150','14.215.177.39','180.101.49.12','180.101.49.11','14.215.177.38']
  labels:
    group: 'Tier-1 cities - China Telecom network monitoring'
- targets: ['112.80.248.75','163.177.151.109','61.135.169.125','163.177.151.110','180.101.49.11','61.135.169.121']
  labels:
    group: 'Tier-1 cities - China Unicom network monitoring'
- targets: ['183.232.231.172','36.152.44.95','182.61.200.6','36.152.44.96','220.181.38.149']
  labels:
    group: 'Tier-1 cities - China Mobile network monitoring'
  • HTTP metrics monitoring
    1. Configure Prometheus to probe with HTTP GET
  - job_name: "blackbox"
    metrics_path: /probe
    params:
      module: [http_2xx]  # use the HTTP module
    file_sd_configs:
    - refresh_interval: 1m
      files:
      - "/home/prometheus/conf/blackbox*.yml"
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 192.168.1.14:9115

2. The target file looks like the following example

[root@cinder1 conf]# cat /home/prometheus/conf/blackbox-dis.yml
- targets:
  - https://www.zhibo8.cc
  - https://www.baidu.com
# list the URLs to probe here
  • GET request checks for API endpoints
    1. The Prometheus configuration is essentially the same as before, so we go straight to the config file
  - job_name: "check_get"
    metrics_path: /probe
    params:
      module: [http_2xx]  # Look for an HTTP 2xx response.
    file_sd_configs:
    - refresh_interval: 1m
      files:
      - "/home/prometheus/conf/service_get.yml"
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 192.168.1.14:9115

2. Example endpoint targets

[root@cinder1 conf]# cat service_get.yml
- targets:
  - http://10.10.1.123:10000/pmkb/atc_tcbi
  - http://10.10.1.123:10000/pmkb/get_ship_lock_count
  - http://10.10.1.123:10000/pmkb/get_terminal_count_by_city
  - http://10.10.1.123:10000/pmkb/get_terminal_monitor?industry=1
  - http://10.10.1.123:10000/pmkb/get_terminal_comparison?industry=1
  - http://10.10.1.123:10000/pmkb/get_terminal_city_count_industry?industry=1
  - http://10.10.1.123:10000/pmkb/industry_stat?industry=1
  - http://10.10.1.123:10000/pmkb/get_company_car_count?industry=1
  - http://10.10.1.123:10000/pmkb/get_terminal_month_countbyi?industry=1
  labels:
    group: 'service'
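
A 2xx status alone does not prove that an endpoint returned sensible data. As a hedged sketch, a custom module could additionally require the response body to match a pattern. The module name `http_get_body_ok`, the timeout, and the regular expression are hypothetical; `fail_if_body_not_matches_regexp` is an HTTP prober option of the Blackbox exporter.

```yaml
# blackbox.yml (sketch): GET probe that also checks the response body
modules:
  http_get_body_ok:            # hypothetical module name
    prober: http
    timeout: 5s                # assumed value
    http:
      method: GET
      valid_status_codes: []   # empty list keeps the default, any 2xx
      fail_if_body_not_matches_regexp:
        - '"code"\s*:\s*0'     # hypothetical pattern; adapt to the real response format
```

To use such a module, the job would set `module: [http_get_body_ok]` in place of `http_2xx`.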

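3. As before, the Grafana dashboard is custom-built and can be downloaded from GitHub

  • POST request status checks for API endpoints
    1. First, modify blackbox.yml for the POST endpoint by defining a custom module
[root@cinder1 blackbox]# cat blackbox.yml
modules:
  http_2xx:
    prober: http
  http_post_2xx:   # the module name is up to you
    prober: http
    http:
      method: POST
      headers:
        Content-Type: application/json   # add the request header
      body: '{"username":"admin","password":"123456"}'  # request payload; a login endpoint is used as the example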

2. Add the job to Prometheus

  - job_name: "check_service"
    metrics_path: /probe
    params:
      module: [http_post_2xx]  # must match the module name defined in blackbox.yml
    file_sd_configs:
    - refresh_interval: 1m
      files:
      - "/home/prometheus/conf/service_post.yml"
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 192.168.1.14:9115

3. The target file

[root@cinder1 conf]# cat service_post.yml
- targets:
  - http://10.2.4.103:5000/devops/api/v1.0/login
  labels:
    group: 'service'

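4. Add the Grafana dashboard; this one is also custom-built and can be downloaded from GitHub

  • TCP port status checks
    1. Prometheus configuration
  - job_name: 'port_status'
    metrics_path: /probe
    params:
      module: [tcp_connect]  # use the TCP module
    static_configs:
      - targets: ['10.10.1.35:8068','10.10.1.35:8069']  # host:port pairs to check
        labels:
          group: 'tcp'
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    # Keep one instance value per target; a fixed instance label would give both ports identical series labels.
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 192.168.1.14:9115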
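
For reference, the `tcp_connect` module named in the job is a plain TCP prober. A minimal sketch of its definition follows; the `timeout` value is an assumption.

```yaml
# blackbox.yml (sketch): the module referenced by `module: [tcp_connect]`
modules:
  tcp_connect:
    prober: tcp      # the probe succeeds if a TCP connection can be opened
    timeout: 5s      # assumed value
```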
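  • Alert rule definitions
    1. Service availability

  • Whether the ICMP, TCP, HTTP, and POST probes are healthy can be read from the probe_success metric

  • probe_success == 0 ## connectivity failure

  • probe_success == 1 ## connectivity OK

  • Alerting checks whether this metric equals 0; if it does, an alert fires

2. The http module also exposes the certificate expiry time, so we can add alerts based on it

probe_ssl_earliest_cert_expiry: the certificate expiry time, as a Unix timestamp.

After converting units, the remaining lifetime in days is: (probe_ssl_earliest_cert_expiry - time())/86400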
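
The day conversion can also be stored as its own series. The following is a minimal sketch of a recording rule for that; the group name and the rule name `probe:ssl_cert_days_left` are hypothetical. As a worked example, a certificate expiring 2592000 seconds from now yields 2592000 / 86400 = 30 days.

```yaml
groups:
- name: ssl_expiry_recording            # hypothetical group name
  rules:
  # Seconds until expiry divided by 86400 seconds per day
  - record: probe:ssl_cert_days_left    # hypothetical rule name
    expr: (probe_ssl_earliest_cert_expiry - time()) / 86400
```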

3. Combining the above, we can define the following alert rules

[root@cinder1 rules]# cat blackbox.yml
groups:
- name: blackbox_network_stats
  rules:
  - alert: blackbox_network_stats
    expr: probe_success == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Endpoint/host/port {{ $labels.instance }} is unreachable"
      description: "Please investigate as soon as possible"
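
Because this one rule covers every probe type, its alert text has to stay generic. A possible refinement, sketched here on the assumption that the `icmp_ping` job and its `group` label from the scrape configuration are in use, is to scope a rule to a single job. The group and alert names are hypothetical.

```yaml
groups:
- name: blackbox_icmp              # hypothetical group name
  rules:
  - alert: icmp_ping_failed        # hypothetical alert name
    expr: probe_success{job="icmp_ping"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Ping to {{ $labels.instance }} ({{ $labels.group }}) is failing"
      description: "Please investigate as soon as possible"
```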
SSL check
[root@cinder1 rules]# cat ssl.yml
groups:
- name: check_ssl_status
  rules:
  - alert: "SSL certificate expiry warning"
    expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
    for: 1h
    labels:
      severity: warn
    annotations:
      description: 'The certificate for {{ $labels.instance }} expires in {{ printf "%.1f" $value }} days; please renew it soon'
      summary: "SSL certificate expiry warning"

4. Once the restart has completed, log in to the web UI to verify
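
For the rule files to take effect, Prometheus has to be told where they are. A minimal sketch of the relevant part of prometheus.yml follows; the directory `/home/prometheus/rules/` is an assumption, since the article only shows that the files live in a directory named `rules`.

```yaml
# prometheus.yml (sketch)
rule_files:
  - "/home/prometheus/rules/*.yml"   # assumed path; point this at the real rules directory
```

The files can be validated beforehand with `promtool check rules <file>`.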
