I. Environment setup: see Section 1 of "prometheus email alerting"
https://blog.csdn.net/oToyix/article/details/120160633
II. Process monitoring with process-exporter
1. Download, configure, and start process-exporter
wget -c https://github.com/ncabatoff/process-exporter/releases/download/v0.7.5/process-exporter-0.7.5.linux-amd64.tar.gz
tar -xf process-exporter-0.7.5.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/
ln -s process-exporter-0.7.5.linux-amd64 process-exporter
firewall-cmd --add-port=9256/tcp --permanent
firewall-cmd --reload
cd process-exporter
Process configuration file
vim process-exporter.yaml
process_names:
  - name: "{{.Matches}}"
    cmdline:
    - 'mysqld'
  - name: "{{.Matches}}"
    cmdline:
    - 'nginx'
  - name: "{{.Matches}}"
    cmdline:
    - 'php-fpm.conf'
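A note on the group name: as far as the process-exporter README describes it, {{.Matches}} renders the map of named capture groups taken from the cmdline regexps. The patterns above define no named groups, so all three entries may end up under a single groupname such as map[], which would keep an alert from telling the services apart. A minimal sketch of one alternative, assuming literal group names are acceptable; the helper name write_example_config and the scratch path in the usage comment are hypothetical:

```shell
# Hypothetical helper: writes an example config to the path given as $1.
# Each entry uses a fixed name, so every service gets its own groupname label.
write_example_config() {
  cat > "$1" <<'EOF'
process_names:
  - name: "mysqld"
    cmdline:
    - 'mysqld'
  - name: "nginx"
    cmdline:
    - 'nginx'
  - name: "php-fpm"
    cmdline:
    - 'php-fpm.conf'
EOF
}

# Usage (the scratch path is an assumption; review the file before using it):
#   write_example_config /tmp/process-exporter.example.yaml
#   cat /tmp/process-exporter.example.yaml
```

The cmdline patterns are unchanged from the config above; only the name template differs.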
Start the exporter
nohup /usr/local/process-exporter/process-exporter -config.path=/usr/local/process-exporter/process-exporter.yaml &
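To confirm the exporter is up, its metrics endpoint can be queried. A small sketch, assuming the exporter listens on localhost:9256 (the port opened in the firewall above); filter_num_procs is a hypothetical helper name:

```shell
# Hypothetical helper: keep only the per-group process-count samples
# from Prometheus exposition text read on stdin.
filter_num_procs() {
  grep '^namedprocess_namegroup_num_procs'
}

# Usage (assumes the exporter runs on this host, port 9256):
#   curl -s http://localhost:9256/metrics | filter_num_procs
```

One line per configured group with a non-zero value suggests the config was loaded and the processes were matched.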
2. Prometheus server configuration
Add the scrape job, using file-based service discovery
vim prometheus.yml
  - job_name: "proess"
    file_sd_configs:
      - files:
          - targets/proess-*.yaml
        refresh_interval: 2m
cat targets/proess-all.yaml
- targets:
    - 192.168.0.63:9256
  labels:
    app: node-process
    job: process
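Before reloading Prometheus it can help to check that every address in the target file actually answers. A rough sketch; list_targets is a hypothetical helper that uses plain pattern matching rather than a YAML parser, so it assumes targets are written as host:port on their own lines:

```shell
# Hypothetical helper: print every host:port entry found in a file_sd target file ($1).
# Line-based matching only; this is not a real YAML parser.
list_targets() {
  grep -Eo '[A-Za-z0-9.-]+:[0-9]+' "$1"
}

# Usage (prints the HTTP status returned by each target's /metrics endpoint):
#   for t in $(list_targets targets/proess-all.yaml); do
#     curl -s -o /dev/null -w "$t %{http_code}\n" "http://$t/metrics"
#   done
# The main config itself can be validated with: promtool check config prometheus.yml
```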
Alert rule: fire when the process count is 0
cat alert_rules/process_down.yaml
groups:
  - name: Allprocess
    rules:
      - alert: InproessDown
        expr: namedprocess_namegroup_num_procs == 0
        for: 1m
        annotations:
          title: "process down"
          description: 'process has been down for more than 1 minute.'
        labels:
          severity: 'critical'
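The rule's condition can be approximated by hand against the exporter output, which is handy for testing (stop a service, then see whether its group drops to 0). A minimal sketch; zero_groups is a hypothetical helper name, and the exporter address in the usage comment is the one from the target file:

```shell
# Hypothetical helper: from exposition text on stdin, print the
# namedprocess_namegroup_num_procs samples whose value is 0,
# i.e. the groups that the expr above would select.
zero_groups() {
  awk 'index($0, "namedprocess_namegroup_num_procs{") == 1 && $NF == 0'
}

# Usage:
#   curl -s http://192.168.0.63:9256/metrics | zero_groups
# The rule file syntax can be checked with:
#   promtool check rules alert_rules/process_down.yaml
```

This only mirrors the instant comparison; the for: 1m hold period is applied by Prometheus itself.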