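A while ago a colleague in our IT team asked me to help set up SmokePing's alerting. The earlier configuration had some mistakes and the alerts did not work properly. It is working now after some debugging, so here is a brief record of the key configuration points.
The key configuration items are:
- Define the alert rules and pipe the alert data to a custom alert script
- Reference the defined alert rules in the host definitions
- Have the custom alert script parse and handle the alert data
Define the alert rules and pipe the alert data to a custom alert script
This is configured in the Alerts section of the config file.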
# /usr/local/smokeping/etc/config
*** Alerts ***
# Hand the alert data to our own alert script
to = |/usr/local/smokeping/bin/send_alert.sh
from = a@b.com

# Define the alert rules
+hostdown
type = loss
# in percent
pattern = ==0%,==0%,==0%,==U
comment = peer not responding

+bigloss
type = loss
# in percent
pattern = ==0%,==0%,==0%,==0%,>20%,>20%,>20%
comment = loss above 20% for 3 consecutive samples

+lossdetect
type = loss
# in percent
pattern = ==0%,==0%,==0%,==0%,>0%,>0%,>0%
comment = packet loss in 3 consecutive samples

+someloss
type = loss
# in percent
pattern = >0%,*12*,>0%,*12*,>0%
comment = intermittent packet loss

+rttdetect
type = rtt
# in milli seconds
pattern = <100,<100,<100,<100,<100,<150,>150,>150,>150
comment = RTT above 150 ms for 3 consecutive samples
The Alert section lets you setup loss and RTT pattern detectors. After each round of polling, SmokePing will examine its data and determine which detectors match. Detectors are enabled per target and get inherited by the targets children.
Detectors are not just simple thresholds which go off at first sight of a problem. They are configurable to detect special loss or RTT patterns. They let you look at a number of past readings to make a more educated decision on what kind of alert should be sent, or if an alert should be sent at all.
The patterns are numbers prefixed with an operator indicating the type of comparison required for a match.
Alert rule reference: the Alerts section of the official configuration documentation
http://oss.oetiker.ch/smokeping/doc/smokeping_config.en.html
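To make the matching idea more concrete, here is a minimal sketch of how a loss pattern can be compared against the most recent readings. It is not SmokePing's own code: the function name match_loss_pattern is made up, it only understands the ==N%, >N% and <N% operators with integer percentages, and it ignores the *N* wildcard and ==U.

```shell
#!/bin/bash
# Simplified illustration only, not SmokePing's implementation.
# Assumptions: integer loss percentages; only ==N%, >N%, <N% are supported.

# match_loss_pattern PATTERN READINGS
#   PATTERN  : comma-separated conditions, e.g. "==0%,==0%,>20%,>20%"
#   READINGS : comma-separated loss percentages, oldest first, e.g. "0,0,30,40"
# Returns 0 when the most recent readings satisfy every condition in order.
match_loss_pattern() {
    local pattern="$1"
    local readings="$2"
    local n m recent i cond val num
    # Count the comma-separated items on each side.
    n=$(( $(printf '%s\n' "$pattern" | tr ',' '\n' | wc -l) ))
    m=$(( $(printf '%s\n' "$readings" | tr ',' '\n' | wc -l) ))
    # Not enough history yet to decide.
    [ "$m" -ge "$n" ] || return 1
    # Keep only the most recent n readings, one per line.
    recent=$(printf '%s\n' "$readings" | tr ',' '\n' | tail -n "$n")
    i=0
    for cond in $(printf '%s\n' "$pattern" | tr -d ' ' | tr ',' '\n'); do
        i=$((i + 1))
        val=$(printf '%s\n' "$recent" | sed -n "${i}p")
        # Strip the operator and the percent sign to get the threshold.
        num=$(printf '%s' "$cond" | tr -d '=<>%')
        case "$cond" in
            '=='*) [ "$val" -eq "$num" ] 2>/dev/null || return 1 ;;
            '>'*)  [ "$val" -gt "$num" ] 2>/dev/null || return 1 ;;
            '<'*)  [ "$val" -lt "$num" ] 2>/dev/null || return 1 ;;
            *) return 2 ;;  # unsupported condition in this sketch
        esac
    done
    return 0
}

# Two clean samples followed by two samples above 20% loss.
if match_loss_pattern "==0%,==0%,>20%,>20%" "0,0,30,40"; then
    echo "pattern matched"
fi
```

The point of the sketch is that a rule looks at a window of past samples rather than a single threshold, which is why the rules above start with a run of ==0% before the failing samples.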
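Reference the alert rules in the host definitions
Configuration syntax
alerts = rule1,rule2,rule3
As you probably know, SmokePing's config file uses the number of "+" signs to define the hierarchy, so alert rules can be referenced at different levels. A definition at a higher level is inherited by the levels below it and can be overridden there (the inner level takes precedence).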
+ xxoo
menu = xxoo-top
title = xxoo - all monitored networks
host = /xxoo/net-A /xxoo/net-B /xxoo/net-C
# These rules apply to everything under /xxoo
alerts = hostdown,bigloss,lossdetect,someloss,rttdetect

++net-A
menu = Menu-Name-A
title = Title-Name-A
host = 10.10.10.101
# These rules apply to /xxoo/net-A only
alerts = hostdown,bigloss,lossdetect

++net-B
menu = Google-DNS
title = To-Google-DNS
host = 8.8.8.8
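After editing the config it is worth validating it before the running daemon picks it up. The helper below is a hypothetical sketch: the function name check_and_reload is made up, the default paths assume the /usr/local/smokeping prefix used in this post, and it assumes your smokeping binary supports the --config, --check and --reload options.

```shell
#!/bin/bash
# Hypothetical helper: validate the SmokePing config, then reload the daemon.
# Assumptions: install prefix /usr/local/smokeping and a smokeping binary
# that accepts --config=FILE, --check and --reload.
check_and_reload() {
    local bin="${1:-/usr/local/smokeping/bin/smokeping}"
    local cfg="${2:-/usr/local/smokeping/etc/config}"
    if [ ! -x "$bin" ]; then
        echo "smokeping binary not found: $bin" >&2
        return 2
    fi
    # Only reload when the syntax check passes.
    if "$bin" --config="$cfg" --check; then
        "$bin" --config="$cfg" --reload
    else
        echo "config check failed, not reloading" >&2
        return 1
    fi
}
```

Calling check_and_reload with no arguments uses the default paths; pass the binary and config paths explicitly if your installation differs.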
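Have the custom alert script parse and handle the alert data
When an alert fires, SmokePing passes 5 to 6 arguments to the alert receiver (here, our custom alert script), in this order: name-of-alert, target, loss-pattern, rtt-pattern, hostname, [raise].
So all the alert script has to do is parse and handle these arguments.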
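As a small illustration of that argument order, the sketch below labels each positional argument. The function name describe_alert_args and the sample values are made up. It also assumes that the optional sixth argument is 1 when the alert is raised and 0 when it clears, which is my reading of the edge-triggered behaviour in the official documentation; check it against your SmokePing version.

```shell
#!/bin/bash
# Sketch: label the positional arguments passed to the alert program.
# Assumption: the optional sixth argument is 1 (raised) or 0 (cleared);
# anything else, including a missing value, is reported as "unknown".
describe_alert_args() {
    if [ "$#" -lt 5 ]; then
        echo "expected at least 5 arguments, got $#" >&2
        return 1
    fi
    local state
    case "${6:-}" in
        1) state="raised" ;;
        0) state="cleared" ;;
        *) state="unknown" ;;
    esac
    printf 'alert=%s target=%s hostname=%s state=%s\n' "$1" "$2" "$5" "$state"
    printf 'loss=%s\n' "$3"
    printf 'rtt=%s\n' "$4"
}

# Example call; every value here is invented sample data.
describe_alert_args hostdown "xxoo.net-A" "loss: 0%, 0%, 0%, 100%" \
    "rtt: 10ms, 11ms, 10ms, U" "10.10.10.101" 1
```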
Sample alert script:
[root@smokeping ~]# cat /usr/local/smokeping/bin/send_alert.sh
#!/bin/bash
#########################################################
# Script to email a ping report on alert from Smokeping #
#########################################################
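# Parse the positional arguments
alertname=$1
target=$2
losspattern=$3
rtt=$4
hostname=$5

# Custom variables
email="xxx@yyy.com"
phone="12345678901"
smokename="AlertName"
smokeping_mail_content=/tmp/smokeping_mail_content
#smokeping_sms_content=/tmp/smokeping_sms_content

# Log every argument received to the invocation log, for statistics and troubleshooting.
# invoke.log is a relative path, so it is written to the working directory of the calling process.
echo "$(date +%F-%T)" >> invoke.log
echo "$@" >> invoke.log

# Decide whether this is a recovery notification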
if [ "$losspattern" = "loss: 0%" ]; then
    subject="Clear-${smokename}-Alert: $target host: ${hostname}"
else
    subject="${smokename}-Alert: ${target} - ${hostname}"
fi

# generate mail content
# Truncate the file and regenerate the mail body
> "${smokeping_mail_content}"
echo "Name of Alert: ${alertname}" | tee -a "${smokeping_mail_content}"
echo "Target: ${target}" | tee -a "${smokeping_mail_content}"
echo "Loss Pattern: ${losspattern}" | tee -a "${smokeping_mail_content}"
echo "RTT Pattern: ${rtt}" | tee -a "${smokeping_mail_content}"
echo "Hostname: ${hostname}" | tee -a "${smokeping_mail_content}"
echo "" | tee -a "${smokeping_mail_content}"
echo "Ping Report:" | tee -a "${smokeping_mail_content}"
ping -c 4 -i 0.5 "${hostname}" | tee -a "${smokeping_mail_content}"

# send mail
# Send the email. The check below is effectively redundant: once the script has run this far, ${smokeping_mail_content} always has content.
if [ -s "${smokeping_mail_content}" ]; then
    content=$(cat "${smokeping_mail_content}")
    # --data-urlencode keeps spaces, newlines and "&" in the values from corrupting the form body
    curl http://notice.api.ourcompany.com/send_mail \
        --data-urlencode "receiver=${email}" \
        --data-urlencode "subject=${subject}" \
        --data-urlencode "content=${content}"
fi

# send sms
# If alertname is one of the more severe rules (hostdown, bigloss, rttdetect), also send an SMS through the SMS API.
# Note: keep the SMS text short, since every message costs money.
judge_alert_type=$(echo "${alertname}" | grep -Ec "hostdown|bigloss|rttdetect")
if [ "${judge_alert_type}" -eq 1 ]; then
    curl http://notice.api.ourcompany.com/send_sms \
        --data-urlencode "receiver=${phone}" \
        --data-urlencode "subject=${subject}" \
        --data-urlencode "content=${alertname} on ${hostname}"
fi
[root@smokeping ~]#
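Rather than waiting for a real outage, the script can be exercised by hand with the same five arguments SmokePing would pass. The wrapper below is only a sketch: simulate_alert is a made-up name and all argument values are invented sample data.

```shell
#!/bin/bash
# Sketch: call an alert script by hand, the way SmokePing would, to check
# its argument parsing. All five values passed below are invented.
simulate_alert() {
    local script="$1"
    if [ ! -x "$script" ]; then
        echo "not executable: $script" >&2
        return 2
    fi
    "$script" someloss "xxoo.net-A" "loss: 0%, 0%, 10%" \
        "rtt: 20ms, 21ms, 25ms" "10.10.10.101"
}
```

For example, simulate_alert /usr/local/smokeping/bin/send_alert.sh runs the script above once. Keep in mind that this sends a real email through the notification API, so point the receiver at a test address first.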
The script above calls our company's internal notification API to send the alerts; adapt that part to your own setup.
http://notice.api.ourcompany.com/send_mail
http://notice.api.ourcompany.com/send_sms
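Since SMS messages are billed, one simple way to keep them short is to cap the text before it is passed to the SMS API. This is a minimal sketch under stated assumptions: truncate_sms is a made-up helper, the 70-character default is an arbitrary example (the real limit depends on your SMS provider), and the text is assumed to be plain ASCII because cut may count bytes rather than characters.

```shell
#!/bin/bash
# Sketch: cap the SMS text length before calling the SMS API.
# Assumptions: ASCII text, a limit of at least 3, and a default limit of 70
# chosen only as an example.
truncate_sms() {
    local text="$1"
    local max="${2:-70}"
    if [ "${#text}" -gt "$max" ]; then
        # Keep the first max-3 characters and mark the cut with "...".
        text="$(printf '%s' "$text" | cut -c "1-$((max - 3))")..."
    fi
    printf '%s\n' "$text"
}

# Example: shorten a long message to at most 20 characters.
truncate_sms "hostdown on 10.10.10.101 (peer not responding)" 20
```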
Alert results
Email
SMS