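Background
Alibaba Cloud CloudMonitor monitors Alibaba Cloud resources and Internet applications. It supports two alerting modes, threshold alerts and event alerts, and can notify through several channels. You can configure Log Service (SLS) open alert as one of these channels. The SLS alerting system then handles noise reduction, silencing and similar processing, and delivers notifications through more than 10 channels, including SMS, voice call, WeChat, DingTalk and email.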
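Connecting CloudMonitor to SLS
Sending CloudMonitor alerts to SLS takes two steps: create an open alert application in SLS, then configure SLS open alert as a webhook on a CloudMonitor alert contact. For how to create an open alert application, see the article "SLS开放告警简介" (Introduction to SLS Open Alert). The rest of this section describes how to route CloudMonitor alerts into SLS.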
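Obtain the callback URL
After creating the open alert application, click its interface (接口) button to open the callback URL window, as shown in the figure below.
The callback URL has two parts: the domain and the sub-path. The domain is the SLS endpoint, which is region-specific: each region has its own endpoint. The sub-path contains the AccessKey ID used to send messages and the ID of the open alert application. A complete SLS callback URL looks like this: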
cn-heyuan-intranet.log.aliyuncs.com/event/webhook/RAMAK_{ACCESS_KEY_ID}/a123_asdad
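Here, cn-heyuan-intranet.log.aliyuncs.com is the domain, i.e. a standard SLS endpoint, and event/webhook/RAMAK_{ACCESS_KEY_ID}/a123_asdad is the sub-path. You must replace {ACCESS_KEY_ID} in the sub-path with the AccessKey ID of an Alibaba Cloud RAM user, and grant that user the AliyunLogOpenEventWrite policy. a123_asdad is the ID of the open alert application, which uniquely identifies it among your open alert applications.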
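The URL layout described above can be assembled with a small helper. This is a minimal sketch: `build_callback_url` is a hypothetical name, not part of any SLS SDK, and the `https://` prefix is an assumption based on the CloudMonitor Webhook field accepting http or https URLs (the console example shows no scheme). `LTAI_EXAMPLE` is a placeholder, not a real AccessKey ID.

```python
def build_callback_url(endpoint: str, access_key_id: str, app_id: str, scheme: str = "https") -> str:
    """Assemble an SLS open alert callback URL (hypothetical helper, not an SLS API).

    endpoint:      region-specific SLS endpoint, e.g. "cn-heyuan-intranet.log.aliyuncs.com"
    access_key_id: AccessKey ID of a RAM user granted AliyunLogOpenEventWrite
    app_id:        ID of the open alert application, e.g. "a123_asdad"
    scheme:        assumed URL scheme; the CloudMonitor Webhook field expects http or https
    """
    # Refuse the unreplaced placeholder so it never ends up in a webhook config.
    if not access_key_id or "{" in access_key_id:
        raise ValueError("replace {ACCESS_KEY_ID} with a real AccessKey ID")
    host = endpoint.strip().rstrip("/")
    # Sub-path layout: event/webhook/RAMAK_{ACCESS_KEY_ID}/{app_id}
    return f"{scheme}://{host}/event/webhook/RAMAK_{access_key_id}/{app_id}"


# "LTAI_EXAMPLE" is a placeholder AccessKey ID.
print(build_callback_url("cn-heyuan-intranet.log.aliyuncs.com", "LTAI_EXAMPLE", "a123_asdad"))
```

Copy the real endpoint and application ID from the console window rather than typing them; the helper only guards against leaving the placeholder in place.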
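CloudMonitor configuration
There are two ways to send CloudMonitor alerts to SLS open alert: configure the callback URL as a webhook on an alert contact, or configure it directly in an alert rule.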
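Configure a CloudMonitor alert contact
On the CloudMonitor alert contacts page, click Create Alert Contact or edit an existing contact, enter the SLS open alert callback URL in the Webhook (http|https) or DingTalk robot field, and click OK.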
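Configure a CloudMonitor alert contact group
On the CloudMonitor alert contacts page, click Create Alert Contact Group or open an existing group, and add the contact configured above to the group.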
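Configure a CloudMonitor alert rule
On the CloudMonitor alert rules page, click Create Alert Rule or open an existing rule, and add the contact group above to the notification targets. Alternatively, skip the contact group and enter the callback URL obtained earlier in the rule's alert callback setting.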
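Before waiting for a real alert, you may want to exercise the webhook path by hand. The sketch below builds, but does not send, a form-encoded POST that imitates a CloudMonitor threshold alert. `build_test_alert` is a hypothetical helper; the field set is illustrative only, and it is an assumption that SLS accepts a hand-crafted message without CloudMonitor's `signature` field or the remaining fields.

```python
import urllib.parse
import urllib.request


def build_test_alert(callback_url: str, fields: dict) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST imitating a CloudMonitor threshold alert."""
    # Threshold alerts arrive as form data, so encode the test fields the same way.
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        callback_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical values; replace the URL with the one shown in the SLS console.
req = build_test_alert(
    "https://cn-heyuan-intranet.log.aliyuncs.com/event/webhook/RAMAK_LTAI_EXAMPLE/a123_asdad",
    {"alertName": "test", "alertState": "ALERT", "ruleId": "test-rule", "regionId": "cn-hangzhou"},
)
# To actually send it (requires network access to the endpoint):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status)
```

Keeping the send step commented out lets you inspect the request first; an intranet endpoint such as the one above is typically reachable only from inside Alibaba Cloud.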
Mapping rules
CloudMonitor sends two kinds of alerts, threshold alerts and event alerts, and the two message formats differ.
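Because the two alert types arrive in different formats, a receiver has to tell them apart before mapping. The sketch below uses a simple heuristic that is an assumption, not documented CloudMonitor behavior: a body that starts with `{` is treated as a JSON event alert, anything else as a form-encoded threshold alert. `classify_message` is a hypothetical name.

```python
import json
import urllib.parse
from typing import Dict, Tuple


def classify_message(body: str) -> Tuple[str, Dict]:
    """Guess the CloudMonitor message type from the raw request body (heuristic)."""
    text = body.strip()
    if text.startswith("{"):
        # Event alerts are JSON objects.
        return "event", json.loads(text)
    # Threshold alerts are form data; parse_qsl decodes percent-escapes and
    # keep_blank_values keeps empty fields instead of dropping them.
    return "threshold", dict(urllib.parse.parse_qsl(text, keep_blank_values=True))


kind, fields = classify_message(urllib.parse.urlencode({"alertName": "连接数", "triggerLevel": "WARN"}))
print(kind, fields)  # threshold {'alertName': '连接数', 'triggerLevel': 'WARN'}
```

In practice the request's Content-Type header is a more reliable signal than the body prefix, if your receiver has access to it.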
Threshold alert mapping
Threshold alerts are sent as form data. Converted to JSON, a sample message looks like this:
{
  "alertName": "连接数",
  "alertState": "ALERT",
  "curValue": "4.5",
  "dimensions": "{instanceId=i-bp1d7111111115htda, state=TCP_TOTAL, userId=11596111111355}",
  "expression": "$Average>=1",
  "instanceName": "launch-advisor-20210607/11.11.111.111",
  "lastTime": "27天19小时47分钟",
  "metricName": "Host.tcpconnection",
  "metricProject": "acs_ecs",
  "namespace": "acs_ecs",
  "preTriggerLevel": "WARN",
  "productGroupName": "null",
  "rawMetricName": "net_tcpconnection",
  "regionId": "cn-hangzhou",
  "regionName": "华东1(杭州)",
  "ruleId": "i-bp11111111115111_111111-0703-4811-9113-1c1111111111",
  "signature": "F111111w1111qN1111bw=",
  "timestamp": "1625455812126",
  "triggerLevel": "WARN",
  "userId": "11596111111355"
}
It is converted into the following SLS alert message:
{
  "aliuid": "aliuid1",
  "alert_instance_id": "",
  "alert_id": "i-bp11111111115111_111111-0703-4811-9113-1c1111111111",
  "alert_type": "sls_pub",
  "alert_name": "连接数",
  "region": "cn-hangzhou",
  "project": "sls-alert--",
  "project_id": 0,
  "next_eval_interval": 0,
  "alert_time": 1625455812,
  "fire_time": 1625455812,
  "fire_results": null,
  "fire_results_count": 0,
  "resolve_time": 0,
  "status": "firing",
  "results": null,
  "labels": {
    "instanceId": "i-bp1d7111111115htda",
    "namespace": "acs_ecs",
    "regionId": "cn-hangzhou",
    "state": "TCP_TOTAL",
    "userId": "11596111111355"
  },
  "annotations": {
    "__cloud_monitor_type__": "threshold",
    "__config_app__": "sls_pub_alert",
    "__pub_alert_app__": "appid1",
    "__pub_alert_protocol__": "cloud_monitor",
    "__pub_alert_region__": "e",
    "__pub_alert_service__": "serverid1",
    "curValue": "4.5",
    "desc": "Host.tcpconnection $Average>=1 持续: 27天19小时47分钟, 详情: {instanceId=i-bp1d7111111115htda, state=TCP_TOTAL, userId=11596111111355}",
    "expression": "$Average\u003e=1",
    "instanceName": "launch-advisor-20210607/11.11.111.111",
    "lastTime": "27天19小时47分钟",
    "metricName": "Host.tcpconnection",
    "metricProject": "acs_ecs",
    "namespace": "acs_ecs",
    "preTriggerLevel": "WARN",
    "rawMetricName": "net_tcpconnection",
    "title": "acs_ecs Host.tcpconnection 当前值: 4.5"
  },
  "severity": 6,
  "policy": {
    "alert_policy_id": "",
    "action_policy_id": "",
    "use_default": false,
    "repeat_interval": "0s"
  },
  "template": null,
  "drill_down_query": "https://cloudmonitor.console.aliyun.com/index.htm#/alarmInfo/name=i-bp11111111115111_111111-0703-4811-9113-1c1111111111\u0026searchValue=\u0026searchType=name\u0026searchProduct=/history//"
}
For the complete conversion rules, see the official documentation.
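Comparing the two messages above suggests several field correspondences: `ruleId` becomes `alert_id`, the millisecond `timestamp` becomes `alert_time` in seconds, and the `dimensions` string plus `namespace` and `regionId` become `labels`. The sketch below reproduces those fields for the sample. It is inferred from this one example, not the official conversion rules, and `map_threshold_alert`, `parse_dimensions` and `OBSERVED_SEVERITY` are hypothetical names. The `title` and `desc` templates keep the Chinese literal text exactly as it appears in the example output.

```python
import urllib.parse
from typing import Dict, Optional

# Severities seen in this article's examples (WARN -> 6, INFO -> 4);
# other CloudMonitor levels are not shown, so they are left unmapped (None).
OBSERVED_SEVERITY = {"WARN": 6, "INFO": 4}


def parse_dimensions(dimensions: str) -> Dict[str, str]:
    """Turn '{instanceId=i-xxx, state=TCP_TOTAL}' into {'instanceId': 'i-xxx', 'state': 'TCP_TOTAL'}."""
    labels = {}
    # Assumes values contain no commas, which holds for the sample above.
    for pair in dimensions.strip().strip("{}").split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            labels[key.strip()] = value.strip()
    return labels


def map_threshold_alert(form_body: str) -> dict:
    """Reproduce the main fields of the threshold example (inferred, not the official rules)."""
    msg = dict(urllib.parse.parse_qsl(form_body, keep_blank_values=True))
    labels = parse_dimensions(msg.get("dimensions", ""))
    labels.update({"namespace": msg["namespace"], "regionId": msg["regionId"]})
    alert_time = int(msg["timestamp"]) // 1000  # milliseconds -> seconds
    # Only alertState=ALERT appears in the example; other states are not handled here.
    status: Optional[str] = "firing" if msg.get("alertState") == "ALERT" else None
    return {
        "alert_id": msg["ruleId"],
        "alert_name": msg["alertName"],
        "region": msg["regionId"],
        "alert_time": alert_time,
        "fire_time": alert_time,
        "status": status,
        "severity": OBSERVED_SEVERITY.get(msg.get("triggerLevel", "")),
        "labels": labels,
        "annotations": {
            "__cloud_monitor_type__": "threshold",
            "curValue": msg["curValue"],
            "expression": msg["expression"],
            "title": f'{msg["metricProject"]} {msg["metricName"]} 当前值: {msg["curValue"]}',
            "desc": f'{msg["metricName"]} {msg["expression"]} 持续: {msg["lastTime"]}, 详情: {msg["dimensions"]}',
        },
    }


# A subset of the sample threshold message, form-encoded as CloudMonitor would send it.
sample = {
    "alertName": "连接数", "alertState": "ALERT", "curValue": "4.5",
    "dimensions": "{instanceId=i-bp1d7111111115htda, state=TCP_TOTAL, userId=11596111111355}",
    "expression": "$Average>=1", "lastTime": "27天19小时47分钟",
    "metricName": "Host.tcpconnection", "metricProject": "acs_ecs", "namespace": "acs_ecs",
    "regionId": "cn-hangzhou", "ruleId": "i-bp11111111115111_111111-0703-4811-9113-1c1111111111",
    "timestamp": "1625455812126", "triggerLevel": "WARN",
}
alert = map_threshold_alert(urllib.parse.urlencode(sample))
print(alert["alert_time"], alert["severity"], alert["labels"])
```

Treat the output as a way to sanity-check what SLS will receive; the authoritative mapping is the one in the official documentation.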
Event alert mapping
Event alerts are sent as JSON, for example:
{
  "traceId": "411112-c49d-4143-a38e-c111159e-0",
  "resourceId": "acs:ecs:cn-hangzhou:115111111111355:instance/i-bp1d71111111x15htda",
  "product": "ECS",
  "ver": "1.0",
  "instanceName": "launch-advisor-20210607",
  "level": "INFO",
  "userId": "115111111111355",
  "content": {
    "resourceId": "i-bp1d7411111111g111htda",
    "publicIpAddress": "127.0.0.1",
    "instanceName": "launch-advisor-20210607",
    "state": "Running",
    "privateIpAddress": "127.0.0.1",
    "resourceType": "ALIYUN::ECS::Instance"
  },
  "regionId": "cn-hangzhou",
  "eventTime": "20210705T113013.398+0800",
  "name": "Instance:StateChange",
  "id": "26111205-51113-4D118-8119-3111113CB735",
  "timeMetrics": {
    "ingestion_in_time": 1625455813563,
    "ingestion_out_time": 1625455816000,
    "notify_in_time": 1625455819578,
    "engine_in_time": 1625455816467,
    "event_time": 1625455813398,
    "engine_out_time": 1625455818000
  },
  "status": "Normal"
}
It is converted into the following SLS alert message:
{
  "aliuid": "aliuid1",
  "alert_instance_id": "26111205-51113-4D118-8119-3111113CB735",
  "alert_id": "Instance:StateChange",
  "alert_type": "sls_pub",
  "alert_name": "Instance:StateChange",
  "region": "cn-hangzhou",
  "project": "sls-alert--",
  "project_id": 0,
  "next_eval_interval": 0,
  "alert_time": 1625455813,
  "fire_time": 1625743445,
  "fire_results": null,
  "fire_results_count": 0,
  "resolve_time": 0,
  "status": "firing",
  "results": null,
  "labels": {
    "resourceId": "acs:ecs:cn-hangzhou:115111111111355:instance/i-bp1d71111111x15htda"
  },
  "annotations": {
    "__cloud_monitor_type__": "event",
    "__config_app__": "sls_pub_alert",
    "__pub_alert_app__": "appid1",
    "__pub_alert_protocol__": "cloud_monitor",
    "__pub_alert_region__": "e",
    "__pub_alert_service__": "serverid1",
    "content_instanceName": "launch-advisor-20210607",
    "content_privateIpAddress": "127.0.0.1",
    "content_publicIpAddress": "127.0.0.1",
    "content_resourceId": "i-bp1d7411111111g111htda",
    "content_resourceType": "ALIYUN::ECS::Instance",
    "content_state": "Running",
    "desc": "事件Instance:StateChange触发, 详情: {\"instanceName\":\"launch-advisor-20210607\",\"privateIpAddress\":\"127.0.0.1\",\"publicIpAddress\":\"127.0.0.1\",\"resourceId\":\"i-bp1d7411111111g111htda\",\"resourceType\":\"ALIYUN::ECS::Instance\",\"state\":\"Running\"}",
    "instanceName": "launch-advisor-20210607",
    "level": "INFO",
    "product": "ECS",
    "status": "Normal",
    "title": "Instance:StateChange: Normal",
    "traceId": "411112-c49d-4143-a38e-c111159e-0",
    "userId": "115111111111355"
  },
  "severity": 4,
  "policy": {
    "alert_policy_id": "",
    "action_policy_id": "",
    "use_default": false,
    "repeat_interval": "0s"
  },
  "template": null,
  "drill_down_query": "https://cloudmonitor.console.aliyun.com/index.htm#/eventmonitoring/events/detail?product=ECS\u0026eventName=Instance:StateChange"
}
For the complete conversion rules, see the official documentation.
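The event example suggests a different set of correspondences: the event `id` becomes `alert_instance_id`, the event `name` becomes both `alert_id` and `alert_name`, each `content` field is copied into `annotations` with a `content_` prefix, and `desc` embeds `content` as compact JSON with sorted keys. The sketch below reproduces those fields for the sample. It is inferred from this one example, not the official rules; the helper names are hypothetical, and deriving `alert_time` from `eventTime` is an assumption that happens to match the sample (`fire_time` is not reproduced because it cannot be derived from the input).

```python
import json
from datetime import datetime

# Severities seen in this article's examples; other levels are left unmapped (None).
OBSERVED_SEVERITY = {"WARN": 6, "INFO": 4}

# Top-level event fields that the example copies into annotations unchanged.
COPIED_FIELDS = ("instanceName", "level", "product", "status", "traceId", "userId")


def parse_event_time(event_time: str) -> int:
    """Convert CloudMonitor's '20210705T113013.398+0800' into Unix seconds."""
    return int(datetime.strptime(event_time, "%Y%m%dT%H%M%S.%f%z").timestamp())


def map_event_alert(body: str) -> dict:
    """Reproduce the main fields of the event example (inferred, not the official rules)."""
    event = json.loads(body)
    content = event.get("content", {})
    # content.state -> content_state, content.resourceId -> content_resourceId, ...
    annotations = {f"content_{key}": value for key, value in content.items()}
    annotations.update({key: event[key] for key in COPIED_FIELDS if key in event})
    annotations["__cloud_monitor_type__"] = "event"
    annotations["title"] = f'{event["name"]}: {event["status"]}'
    # The example's desc embeds content as compact JSON with sorted keys.
    details = json.dumps(content, ensure_ascii=False, sort_keys=True, separators=(",", ":"))
    annotations["desc"] = f'事件{event["name"]}触发, 详情: {details}'
    return {
        "alert_instance_id": event["id"],
        "alert_id": event["name"],
        "alert_name": event["name"],
        "region": event["regionId"],
        "alert_time": parse_event_time(event["eventTime"]),
        "status": "firing",  # the only status shown in the example
        "severity": OBSERVED_SEVERITY.get(event.get("level", "")),
        "labels": {"resourceId": event["resourceId"]},
        "annotations": annotations,
    }


sample = {
    "traceId": "411112-c49d-4143-a38e-c111159e-0",
    "resourceId": "acs:ecs:cn-hangzhou:115111111111355:instance/i-bp1d71111111x15htda",
    "product": "ECS", "instanceName": "launch-advisor-20210607", "level": "INFO",
    "userId": "115111111111355",
    "content": {
        "resourceId": "i-bp1d7411111111g111htda", "publicIpAddress": "127.0.0.1",
        "instanceName": "launch-advisor-20210607", "state": "Running",
        "privateIpAddress": "127.0.0.1", "resourceType": "ALIYUN::ECS::Instance",
    },
    "regionId": "cn-hangzhou", "eventTime": "20210705T113013.398+0800",
    "name": "Instance:StateChange", "id": "26111205-51113-4D118-8119-3111113CB735",
    "status": "Normal",
}
alert = map_event_alert(json.dumps(sample))
print(alert["alert_time"], alert["annotations"]["title"])
```

Note that `eventTime` carries a +0800 offset, so parsing it with `%z` yields the same instant as `timeMetrics.event_time` in the sample, truncated to seconds.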
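Summary
Routing CloudMonitor alerts into SLS lets you use SLS alerting features such as noise reduction, silencing and multi-channel notification, so that you can understand and handle service issues more efficiently.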