Setting Pinpoint alarm rules in bulk via the API

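By default, Pinpoint application alarm rules are configured in the web UI, and the official Pinpoint documentation does not describe an API for this. However, if you open the browser's developer tools (F12) while working in the Pinpoint web UI, you can see the API calls it makes.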

Assume the Pinpoint web address is http://172.31.2.5:8079/ and it hosts an application named xmgate:

  • Get the application list:

    GET request, http://172.31.2.5:8079/applications.pinpoint

    It returns JSON similar to:

    [{"applicationName":"xmgate","serviceType":"SPRING_BOOT","code":1210}]
  • Get the alarm rules of a given application:

    GET request, http://172.31.2.5:8079/application/alarmRule.pinpoint?applicationId=xmgate

    It returns JSON similar to:

    [{"ruleId":"1","applicationId":"xmgate","serviceType":"SPRING_BOOT","checkerName":"HEAP USAGE RATE","threshold":80,"userGroupId":"DevOpsEngineers","smsSend":false,"emailSend":true,"notes":""}]
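  • Create an alarm rule:

    POST request to http://172.31.2.5:8079/application/alarmRule.pinpoint with the header "Content-Type: application/json" and a JSON body describing the rule, similar to the following (the fields may differ slightly between metrics):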

    {"applicationId":"xmgate","serviceType":"SPRING_BOOT","checkerName":"HEAP USAGE RATE","userGroupId":"DevOpsEngineers","threshold":80,"emailSend":true,"smsSend":false,"notes":""}

    It returns JSON similar to:

    {"result":"SUCCESS","ruleId":"35"}
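The three calls above can be reproduced with the standard library alone. The sketch below only builds the urllib.request.Request objects and does not send them, so it can be inspected safely; the base URL, application name, and user group are the example values from this article, and the helper names (app_list_request, alarm_rules_request, create_rule_request) are hypothetical. It also URL-encodes applicationId with urllib.parse.urlencode, on the assumption that application names may contain characters that are not safe in a query string.

```python
import json
import urllib.parse
import urllib.request

# Example Pinpoint web address from this article (adjust to your environment)
PP_WEB = 'http://172.31.2.5:8079'


def app_list_request():
    # GET /applications.pinpoint: list all applications
    return urllib.request.Request(PP_WEB + '/applications.pinpoint')


def alarm_rules_request(app_name):
    # GET /application/alarmRule.pinpoint?applicationId=...: rules of one application
    query = urllib.parse.urlencode({'applicationId': app_name})
    return urllib.request.Request(PP_WEB + '/application/alarmRule.pinpoint?' + query)


def create_rule_request(app_name, service_type, checker, threshold, user_group):
    # POST /application/alarmRule.pinpoint with a JSON body shaped like the captured payload
    payload = {
        'applicationId': app_name,
        'serviceType': service_type,
        'checkerName': checker,
        'userGroupId': user_group,
        'threshold': threshold,
        'emailSend': True,
        'smsSend': False,
        'notes': '',
    }
    return urllib.request.Request(
        PP_WEB + '/application/alarmRule.pinpoint',
        data=json.dumps(payload).encode('utf-8'),  # a body makes urllib use POST
        headers={'Content-Type': 'application/json'},
    )


if __name__ == '__main__':
    req = create_rule_request('xmgate', 'SPRING_BOOT', 'HEAP USAGE RATE', 80, 'DevOpsEngineers')
    print(req.get_method(), req.full_url)
    print(req.data.decode('utf-8'))
    # Sending is a separate step, for example:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read().decode('utf-8')))
```

Keeping request construction separate from sending makes it easy to check the exact URL and body before touching a live Pinpoint instance.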


Based on the analysis above, the following Python script can be written:

[root@gw5 ~]# cat setAlarm.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import sys
import json
import urllib.parse
import urllib.request

# Pinpoint web address
ppWeb = 'http://172.31.2.5:8079'
# User group in Pinpoint web that receives the alarms
userGroup = 'DevOpsEngineers'
# Metrics (mtc) and thresholds (tsd) to set alarm rules for; extend as needed
metricList = [{'mtc': 'SLOW RATE', 'tsd': 30}, {'mtc': 'ERROR RATE', 'tsd': 30}, {'mtc': 'HEAP USAGE RATE', 'tsd': 80}, {'mtc': 'JVM CPU USAGE RATE', 'tsd': 80}, {'mtc': 'DATASOURCE CONNECTION USAGE RATE', 'tsd': 80}, {'mtc': 'FILE DESCRIPTOR COUNT', 'tsd': 10000}]


# Call the Pinpoint web API: GET when data is empty, otherwise POST data as JSON
def accessPP(url, header, data):
    body = json.dumps(data).encode('utf-8') if data else None
    request = urllib.request.Request(url, data=body, headers=header or {})
    try:
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode('utf-8'))
    except Exception as e:
        print('[ERROR] %s' % e)
        sys.exit(1)


# Main function
def main():
    # Get the application list
    url = '%s/applications.pinpoint' % ppWeb
    appList = accessPP(url, {}, {})
    if not appList:
        print('[INFO] No applications found in Pinpoint!')
        sys.exit(0)
    for app in appList:
        appName = app['applicationName']
        # Get the application's existing alarm rules (applicationId is URL-encoded)
        query = urllib.parse.urlencode({'applicationId': appName})
        url = '%s/application/alarmRule.pinpoint?%s' % (ppWeb, query)
        alarmRuleList = accessPP(url, {}, {}) or []
        # Exact checker names that already have a rule, so that a metric is not skipped
        # just because its name appears inside a longer checker name or in a rule's notes
        existing = {rule.get('checkerName') for rule in alarmRuleList}
        # Skip metrics that already have an alarm rule; create rules for the rest
        url = '%s/application/alarmRule.pinpoint' % ppWeb
        header = {'Content-Type': 'application/json'}
        for metric in metricList:
            if metric['mtc'] in existing:
                print('[INFO] Application "%s": alarm rule "%s" already exists, skipped' % (appName, metric['mtc']))
                continue
            data = {
                "applicationId": appName,
                "serviceType": app['serviceType'],
                "checkerName": metric['mtc'],
                "userGroupId": userGroup,
                "threshold": metric['tsd'],
                "emailSend": True,
                "smsSend": False,
                "notes": ""
            }
            state = accessPP(url, header, data)
            # Pinpoint does not validate the submitted parameters, so it almost always returns 'SUCCESS';
            # this check is therefore of limited value, but it is kept just in case
            if state.get('result') == 'SUCCESS':
                print('[INFO] Application "%s": alarm rule "%s" created' % (appName, metric['mtc']))
            else:
                print('[ERROR] Application "%s": failed to create alarm rule "%s"' % (appName, metric['mtc']))
                print('[INFO] Response: %s' % state)


if __name__ == '__main__':
    main()

Run the script:

[root@gw5 ~]# ./setAlarm.py
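Because, as the comment in the script notes, the POST call almost always reports SUCCESS, a more trustworthy check after a run is to fetch each application's rules again (the same GET alarmRule.pinpoint call) and compare them with metricList. A minimal sketch of that comparison as a pure function over the parsed JSON; the function name check_rules is hypothetical, and fetching the JSON is left to the script's existing GET call.

```python
def check_rules(rules, metric_list):
    """Compare fetched alarm rules with the desired metric list.

    rules:       parsed JSON from GET /application/alarmRule.pinpoint?applicationId=...
    metric_list: entries like {'mtc': 'HEAP USAGE RATE', 'tsd': 80}, as in the script
    Returns (missing, mismatched): checker names with no rule at all, and
    (checker, expected_threshold, configured_thresholds) tuples that differ.
    """
    # Map checkerName -> thresholds configured for it (an app may have several rules)
    configured = {}
    for rule in rules or []:
        configured.setdefault(rule.get('checkerName'), []).append(rule.get('threshold'))

    missing, mismatched = [], []
    for metric in metric_list:
        thresholds = configured.get(metric['mtc'])
        if not thresholds:
            missing.append(metric['mtc'])
        elif metric['tsd'] not in thresholds:
            mismatched.append((metric['mtc'], metric['tsd'], thresholds))
    return missing, mismatched


if __name__ == '__main__':
    # The sample rule returned for xmgate earlier in this article
    rules = [{"ruleId": "1", "applicationId": "xmgate", "serviceType": "SPRING_BOOT",
              "checkerName": "HEAP USAGE RATE", "threshold": 80, "userGroupId": "DevOpsEngineers",
              "smsSend": False, "emailSend": True, "notes": ""}]
    print(check_rules(rules, [{'mtc': 'HEAP USAGE RATE', 'tsd': 80}, {'mtc': 'ERROR RATE', 'tsd': 30}]))
    # → (['ERROR RATE'], [])
```

Since the script skips any metric that already has a rule, a "mismatched" entry usually points to a rule that existed before the script ran with a different threshold.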


The script sets alarm rules for the following metrics (to add or remove metrics or adjust thresholds, edit the metricList variable in the script):

  • SLOW RATE

  • ERROR RATE

  • HEAP USAGE RATE

  • JVM CPU USAGE RATE

  • DATASOURCE CONNECTION USAGE RATE

  • FILE DESCRIPTOR COUNT


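The script skips any metric that already has an alarm rule and creates a rule for each metric that does not. Because existing rules are skipped, the script can also be run periodically from crontab on a Linux server so that newly added applications get alarm rules automatically.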
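The skip decision only needs the checkerName of each existing rule. Comparing names exactly, rather than searching for the metric name anywhere in the text of the rule list, avoids skipping a metric just because its name appears inside a longer checker name or in a rule's notes field. A minimal sketch of that decision; metrics_to_create is a hypothetical helper, and the sample rule is illustrative rather than taken from a real Pinpoint instance.

```python
def metrics_to_create(existing_rules, metric_list):
    # Exact checker names that already have a rule for this application
    existing = {rule.get('checkerName') for rule in existing_rules or []}
    # Keep only the metrics whose checker name has no rule yet
    return [m for m in metric_list if m['mtc'] not in existing]


if __name__ == '__main__':
    # Illustrative rule: its checker name and notes both contain other metric names
    rules = [{"checkerName": "ERROR RATE TO CALLEE", "threshold": 10, "notes": "SLOW RATE"}]
    wanted = [{'mtc': 'ERROR RATE', 'tsd': 30}, {'mtc': 'SLOW RATE', 'tsd': 30}]
    print([m['mtc'] for m in metrics_to_create(rules, wanted)])
    # → ['ERROR RATE', 'SLOW RATE']
```

A substring search over str(rules) would have skipped both metrics here. Running the exact-match version repeatedly (for example from cron) only ever creates rules that are still missing.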
