Connecting CloudMonitor to Log Service (SLS) Open Alert

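Background

Alibaba Cloud CloudMonitor monitors Alibaba Cloud resources and internet applications. It offers two alerting modes, threshold alerts and event alerts, and supports multiple notification channels. You can configure Log Service (SLS) Open Alert as one of those channels, so that the SLS alerting system handles noise reduction, silencing, and similar processing, and delivers notifications through more than 10 channels, including SMS, voice call, WeChat, DingTalk, and email.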

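Connecting CloudMonitor to SLS

Sending CloudMonitor alert messages to SLS takes two steps: create an open alert application in SLS, then configure the SLS open alert callback URL as a webhook on a CloudMonitor contact. For how to create an open alert application, see the article "SLS开放告警简介" (Introduction to SLS Open Alert). The rest of this article describes how to send CloudMonitor alert messages to SLS.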

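Obtaining the callback URL

After creating the open alert application, click the Interface (接口) button to open the window that displays the callback URL.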

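The callback URL has two parts: a domain and a sub-path. The domain is the SLS endpoint, which is region-specific: each region has its own endpoint. The sub-path contains the Access Key Id used to send messages and the ID of the open alert application. A complete SLS callback URL looks like this: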

cn-heyuan-intranet.log.aliyuncs.com/event/webhook/RAMAK_{ACCESS_KEY_ID}/a123_asdad


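Here "cn-heyuan-intranet.log.aliyuncs.com" is the domain, which is a standard SLS endpoint, and event/webhook/RAMAK_{ACCESS_KEY_ID}/a123_asdad is the sub-path. Note that you must replace {ACCESS_KEY_ID} in the sub-path with the Access Key Id of an actual Alibaba Cloud RAM user and grant that user the AliyunLogOpenEventWrite permission policy. a123_asdad is the ID of the open alert application, which uniquely distinguishes it from other open alert applications.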
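As a rough illustration of how the two parts fit together, the following Python sketch assembles a callback URL from an endpoint, an Access Key Id, and an application ID. The function name build_callback_url, the placeholder key LTAI_EXAMPLE_KEY_ID, and the default https scheme are assumptions made for this example; the article itself shows the address without a scheme.

```python
def build_callback_url(endpoint: str, access_key_id: str, app_id: str,
                       scheme: str = "https") -> str:
    """Assemble an SLS open alert callback URL from its parts.

    The scheme default is an assumption; the article shows no scheme.
    """
    # Reject empty parts early so a malformed URL is never produced.
    for name, value in (("endpoint", endpoint),
                        ("access_key_id", access_key_id),
                        ("app_id", app_id)):
        if not value:
            raise ValueError(f"{name} must not be empty")
    # Sub-path layout taken from the example address:
    # event/webhook/RAMAK_{ACCESS_KEY_ID}/{application id}
    path = f"event/webhook/RAMAK_{access_key_id}/{app_id}"
    return f"{scheme}://{endpoint}/{path}"


# LTAI_EXAMPLE_KEY_ID is a made-up placeholder, not a real key.
url = build_callback_url("cn-heyuan-intranet.log.aliyuncs.com",
                         "LTAI_EXAMPLE_KEY_ID", "a123_asdad")
print(url)
```

Keeping the endpoint as a separate argument makes it easy to switch regions, since only the domain part changes from region to region.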

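CloudMonitor configuration

There are two ways to send CloudMonitor alert messages to SLS Open Alert: configure the webhook callback URL on a contact, or configure the callback URL on a rule.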

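Configuring a CloudMonitor contact

On the CloudMonitor contact management page, create a new contact or open an existing one, edit the "Webhook(http|https) or DingTalk robot" field, enter the SLS open alert callback URL, and click Confirm.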

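Configuring a CloudMonitor contact group

On the CloudMonitor contact management page, create a new contact group or open an existing one, and add the alert contact configured above to the group.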

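Configuring a CloudMonitor rule

On the CloudMonitor rule management page, create a new alert rule or open an existing one, and add the contact group above to the notification targets. Alternatively, skip the contact group and enter the callback URL obtained earlier in the rule's alert callback setting.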

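Mapping rules

CloudMonitor alerts come in two kinds, threshold alerts and event alerts, and the two message types use different formats.

Threshold alert mapping rules

CloudMonitor sends threshold alert messages in form format. Converted to JSON, a sample message looks like this: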

{
    "alertName": "连接数",
    "alertState": "ALERT",
    "curValue": "4.5",
    "dimensions": "{instanceId=i-bp1d7111111115htda, state=TCP_TOTAL, userId=11596111111355}",
    "expression": "$Average>=1",
    "instanceName": "launch-advisor-20210607/11.11.111.111",
    "lastTime": "27天19小时47分钟",
    "metricName": "Host.tcpconnection",
    "metricProject": "acs_ecs",
    "namespace": "acs_ecs",
    "preTriggerLevel": "WARN",
    "productGroupName": "null",
    "rawMetricName": "net_tcpconnection",
    "regionId": "cn-hangzhou",
    "regionName": "华东1(杭州)",
    "ruleId": "i-bp11111111115111_111111-0703-4811-9113-1c1111111111",
    "signature": "F111111w1111qN1111bw=",
    "timestamp": "1625455812126",
    "triggerLevel": "WARN",
    "userId": "11596111111355"
}

It is converted into the following SLS alert message:

{
    "aliuid": "aliuid1",
    "alert_instance_id": "",
    "alert_id": "i-bp11111111115111_111111-0703-4811-9113-1c1111111111",
    "alert_type": "sls_pub",
    "alert_name": "连接数",
    "region": "cn-hangzhou",
    "project": "sls-alert--",
    "project_id": 0,
    "next_eval_interval": 0,
    "alert_time": 1625455812,
    "fire_time": 1625455812,
    "fire_results": null,
    "fire_results_count": 0,
    "resolve_time": 0,
    "status": "firing",
    "results": null,
    "labels": {
        "instanceId": "i-bp1d7111111115htda",
        "namespace": "acs_ecs",
        "regionId": "cn-hangzhou",
        "state": "TCP_TOTAL",
        "userId": "11596111111355"
    },
    "annotations": {
        "__cloud_monitor_type__": "threshold",
        "__config_app__": "sls_pub_alert",
        "__pub_alert_app__": "appid1",
        "__pub_alert_protocol__": "cloud_monitor",
        "__pub_alert_region__": "e",
        "__pub_alert_service__": "serverid1",
        "curValue": "4.5",
        "desc": "Host.tcpconnection $Average>=1 持续: 27天19小时47分钟, 详情: {instanceId=i-bp1d7111111115htda, state=TCP_TOTAL, userId=11596111111355}",
        "expression": "$Average\u003e=1",
        "instanceName": "launch-advisor-20210607/11.11.1111.1111",
        "lastTime": "27天19小时47分钟",
        "metricName": "Host.tcpconnection",
        "metricProject": "acs_ecs",
        "namespace": "acs_ecs",
        "preTriggerLevel": "WARN",
        "rawMetricName": "net_tcpconnection",
        "title": "acs_ecs Host.tcpconnection 当前值: 4.5"
    },
    "severity": 6,
    "policy": {
        "alert_policy_id": "",
        "action_policy_id": "",
        "use_default": false,
        "repeat_interval": "0s"
    },
    "template": null,
    "drill_down_query": "https://cloudmonitor.console.aliyun.com/index.htm#/alarmInfo/name=i-bp11111111115111_111111-0703-4811-9113-1c1111111111\u0026searchValue=\u0026searchType=name\u0026searchProduct=/history//"
}

For the detailed conversion rules, see the official documentation.
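To make the threshold mapping easier to follow, here is a minimal Python sketch that reproduces a few of the fields in the sample pair above. It is inferred only from that one example and is not the SLS implementation: the function names parse_dimensions and map_threshold_alert are hypothetical, only a subset of fields is covered, and treating any alertState other than ALERT as "resolved" is an assumption.

```python
from urllib.parse import parse_qsl, urlencode


def parse_dimensions(raw: str) -> dict:
    """Parse a '{k1=v1, k2=v2}' dimensions string into a dict."""
    inner = raw.strip().strip("{}")
    pairs = (item.split("=", 1) for item in inner.split(",") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def map_threshold_alert(form_body: str) -> dict:
    """Map a form-encoded threshold alert to a subset of SLS alert fields."""
    msg = dict(parse_qsl(form_body, keep_blank_values=True))
    # Labels in the sample: the dimensions plus namespace and regionId.
    labels = parse_dimensions(msg.get("dimensions", ""))
    labels["namespace"] = msg.get("namespace", "")
    labels["regionId"] = msg.get("regionId", "")
    # Annotations in the sample: selected fields copied as-is.
    copied = ("curValue", "expression", "instanceName", "lastTime",
              "metricName", "metricProject", "namespace",
              "preTriggerLevel", "rawMetricName")
    annotations = {key: msg[key] for key in copied if key in msg}
    annotations["title"] = (f"{msg.get('namespace', '')} "
                            f"{msg.get('metricName', '')} "
                            f"当前值: {msg.get('curValue', '')}")
    annotations["desc"] = (f"{msg.get('metricName', '')} "
                           f"{msg.get('expression', '')} "
                           f"持续: {msg.get('lastTime', '')}, "
                           f"详情: {msg.get('dimensions', '')}")
    seconds = int(msg["timestamp"]) // 1000  # milliseconds to seconds
    return {
        "alert_id": msg.get("ruleId", ""),
        "alert_name": msg.get("alertName", ""),
        "region": msg.get("regionId", ""),
        "alert_time": seconds,
        "fire_time": seconds,
        # Assumption: the sample only shows ALERT mapping to "firing".
        "status": "firing" if msg.get("alertState") == "ALERT" else "resolved",
        "labels": labels,
        "annotations": annotations,
    }


# Form body built from part of the sample threshold message.
sample = urlencode({
    "alertName": "连接数", "alertState": "ALERT", "curValue": "4.5",
    "dimensions": "{instanceId=i-bp1d7111111115htda, state=TCP_TOTAL, userId=11596111111355}",
    "expression": "$Average>=1", "lastTime": "27天19小时47分钟",
    "metricName": "Host.tcpconnection", "namespace": "acs_ecs",
    "regionId": "cn-hangzhou", "timestamp": "1625455812126",
    "ruleId": "i-bp11111111115111_111111-0703-4811-9113-1c1111111111",
})
alert = map_threshold_alert(sample)
print(alert["labels"])
print(alert["annotations"]["title"])
```

Fields such as severity, policy, and drill_down_query are left out on purpose, because the sample alone does not show how they are derived.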

Event alert mapping rules

CloudMonitor sends event messages in JSON format, as shown below:

{
    "traceId": "411112-c49d-4143-a38e-c111159e-0",
    "resourceId": "acs:ecs:cn-hangzhou:115111111111355:instance/i-bp1d71111111x15htda",
    "product": "ECS",
    "ver": "1.0",
    "instanceName": "launch-advisor-20210607",
    "level": "INFO",
    "userId": "115111111111355",
    "content": {
        "resourceId": "i-bp1d7411111111g111htda",
        "publicIpAddress": "127.0.0.1",
        "instanceName": "launch-advisor-20210607",
        "state": "Running",
        "privateIpAddress": "127.0.0.1",
        "resourceType": "ALIYUN::ECS::Instance"
    },
    "regionId": "cn-hangzhou",
    "eventTime": "20210705T113013.398+0800",
    "name": "Instance:StateChange",
    "id": "26111205-51113-4D118-8119-3111113CB735",
    "timeMetrics": {
        "ingestion_in_time": 1625455813563,
        "ingestion_out_time": 1625455816000,
        "notify_in_time": 1625455819578,
        "engine_in_time": 1625455816467,
        "event_time": 1625455813398,
        "engine_out_time": 1625455818000
    },
    "status": "Normal"
}

It is converted into the following SLS alert message:

{
    "aliuid": "aliuid1",
    "alert_instance_id": "26111205-51113-4D118-8119-3111113CB735",
    "alert_id": "Instance:StateChange",
    "alert_type": "sls_pub",
    "alert_name": "Instance:StateChange",
    "region": "cn-hangzhou",
    "project": "sls-alert--",
    "project_id": 0,
    "next_eval_interval": 0,
    "alert_time": 1625455813,
    "fire_time": 1625743445,
    "fire_results": null,
    "fire_results_count": 0,
    "resolve_time": 0,
    "status": "firing",
    "results": null,
    "labels": {
        "resourceId": "acs:ecs:cn-hangzhou:115111111111355:instance/i-bp1d71111111x15htda"
    },
    "annotations": {
        "__cloud_monitor_type__": "event",
        "__config_app__": "sls_pub_alert",
        "__pub_alert_app__": "appid1",
        "__pub_alert_protocol__": "cloud_monitor",
        "__pub_alert_region__": "e",
        "__pub_alert_service__": "serverid1",
        "content_instanceName": "launch-advisor-20210607",
        "content_privateIpAddress": "127.0.0.1",
        "content_publicIpAddress": "127.0.0.1",
        "content_resourceId": "i-bp1d7411111111g111htda",
        "content_resourceType": "ALIYUN::ECS::Instance",
        "content_state": "Running",
        "desc": "事件Instance:StateChange触发, 详情: {\"instanceName\":\"launch-advisor-20210607\",\"privateIpAddress\":\"127.0.0.1\",\"publicIpAddress\":\"127.0.0.1\",\"resourceId\":\"i-bp1d7411111111g111htda\",\"resourceType\":\"ALIYUN::ECS::Instance\",\"state\":\"Running\"}",
        "instanceName": "launch-advisor-20210607",
        "level": "INFO",
        "product": "ECS",
        "status": "Normal",
        "title": "Instance:StateChange: Normal",
        "traceId": "411112-c49d-4143-a38e-c111159e-0",
        "userId": "115111111111355"
    },
    "severity": 4,
    "policy": {
        "alert_policy_id": "",
        "action_policy_id": "",
        "use_default": false,
        "repeat_interval": "0s"
    },
    "template": null,
    "drill_down_query": "https://cloudmonitor.console.aliyun.com/index.htm#/eventmonitoring/events/detail?product=ECS\u0026eventName=Instance:StateChange"
}

For the detailed conversion rules, see the official documentation.
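The event mapping in the sample pair above can be sketched the same way. The Python below is inferred only from that example and is not the SLS implementation: the function name map_event_alert is hypothetical, only a subset of fields is covered, and the sorted, compact JSON rendering of content in desc is an assumption that happens to match the sample.

```python
import json


def map_event_alert(event: dict) -> dict:
    """Map a CloudMonitor event message to a subset of SLS alert fields."""
    content = event.get("content", {})
    # Each key under "content" appears as a "content_<key>" annotation.
    annotations = {f"content_{key}": value for key, value in content.items()}
    # Selected top-level fields are copied into annotations unchanged.
    for key in ("instanceName", "level", "product", "status",
                "traceId", "userId"):
        if key in event:
            annotations[key] = event[key]
    # Assumption: desc embeds content as compact JSON with sorted keys.
    detail = json.dumps(content, sort_keys=True, separators=(",", ":"),
                        ensure_ascii=False)
    annotations["title"] = f"{event['name']}: {event['status']}"
    annotations["desc"] = f"事件{event['name']}触发, 详情: {detail}"
    return {
        "alert_instance_id": event["id"],
        "alert_id": event["name"],
        "alert_name": event["name"],
        "region": event["regionId"],
        # event_time is in milliseconds; alert_time is in seconds.
        "alert_time": event["timeMetrics"]["event_time"] // 1000,
        "labels": {"resourceId": event["resourceId"]},
        "annotations": annotations,
    }


# Trimmed version of the sample event message.
sample_event = {
    "resourceId": "acs:ecs:cn-hangzhou:115111111111355:instance/i-bp1d71111111x15htda",
    "product": "ECS",
    "level": "INFO",
    "content": {"state": "Running", "instanceName": "launch-advisor-20210607"},
    "regionId": "cn-hangzhou",
    "name": "Instance:StateChange",
    "id": "26111205-51113-4D118-8119-3111113CB735",
    "timeMetrics": {"event_time": 1625455813398},
    "status": "Normal",
}
event_alert = map_event_alert(sample_event)
print(event_alert["annotations"]["desc"])
```

The sketch does not set fire_time, because in the sample it differs from the event time and its source is not shown.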

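Summary

By sending CloudMonitor alert messages to SLS, you can take full advantage of the alerting features SLS provides and understand and handle service problems more efficiently.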
