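Monitoring WebSocket services with Prometheus on k8s

Overview

Our k8s cluster currently uses Prometheus for monitoring. Some of our developers' services expose WebSocket interfaces, so to monitor and alert on those applications effectively, we need to probe and track the liveness of the WebSocket API endpoints. The approach, the implementation steps, and the tests are described below.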

Deploying a simple WebSocket service

We define a simple WebSocket service to test monitoring and alerting against:

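# Create a virtual environment (or deploy directly on the host)
mkvirtualenv -p /usr/bin/python3 websocket-server
# Install the required package
pip3 install websockets
# cat websocket-server.py

import asyncio
import websockets

# Echo handler: prefix every incoming message and send it back.
# The single-argument handler form requires websockets >= 10.1.
async def echo(websocket):
    async for message in websocket:
        message = "I got your message: {}".format(message)
        await websocket.send(message)

async def main():
    # The bind address must be reachable from the k8s cluster
    async with websockets.serve(echo, '192.168.128.6', 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())
# Start the websocket service
python websocket-server.py &
# Verify that the port is listening
netstat -lnp|grep 8765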
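
Beyond checking that the port is listening, it can help to confirm that the server actually completes a WebSocket handshake. The following is a minimal sketch using only the Python standard library; the function name `probe_websocket` is hypothetical, and the check is deliberately shallow (it looks only at the HTTP status line and does not validate the `Sec-WebSocket-Accept` header).

```python
import base64
import os
import socket


def probe_websocket(host: str, port: int, path: str = "/", timeout: float = 3.0) -> bool:
    """Return True if the server answers a WebSocket upgrade request with HTTP 101."""
    # Sec-WebSocket-Key must be a base64-encoded random 16-byte value (RFC 6455).
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request.encode("ascii"))
            # Only the first line of the response (the status line) is inspected.
            status_line = sock.recv(1024).split(b"\r\n", 1)[0]
    except OSError:
        # Connection refused, timeout, DNS failure, etc. all count as "down".
        return False
    parts = status_line.split()
    return len(parts) >= 2 and parts[1] == b"101"


# Example call against the test server above:
#   probe_websocket("192.168.128.6", 8765)
```

A result of `True` means the endpoint accepted the upgrade; `False` covers both an unreachable port and a server that answers with plain HTTP.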

websocket-exporter

Here we define a Deployment that exposes metrics for multiple monitored WebSocket APIs to Prometheus:

k8s websocket deployment

# cat websocket-kube-mon-prometheus.yaml 

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: wss
    app.kubernetes.io/version: v1.8.0
  name: websocket-exporter
  namespace: kube-mon
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: wss
  template:
    metadata:
      labels:
        app.kubernetes.io/name: wss
        app.kubernetes.io/version: v1.8.0
    spec:
      containers:
      - image: registry.cn-shanghai.aliyuncs.com/ai-voice-test/wss-expoter:v0.0.1
        env:
          - name: ENDPOINT
            # separate multiple ws endpoints with commas
            value: ws://www.abc.com,ws://192.168.128.6:8765
        name: websocket-exporter
        ports:
        - containerPort: 9189
          name: wss-metrics
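
The source of the exporter image is not shown in this post, so the following is only a sketch of how an exporter of this kind might turn the comma-separated `ENDPOINT` value into Prometheus metrics. The function names `parse_endpoints` and `render_metrics` are hypothetical; the metric name and HELP text mirror the `/metrics` output shown later in this post.

```python
from typing import Dict, List


def parse_endpoints(raw: str) -> List[str]:
    """Split a comma-separated ENDPOINT value into a list of URLs."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def _escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(status: Dict[str, int]) -> str:
    """Render url -> 0/1 probe results in the Prometheus text exposition format."""
    lines = [
        "# HELP websocket websocket_help",
        "# TYPE websocket gauge",
    ]
    for url, value in status.items():
        lines.append(f'websocket{{url="{_escape_label(url)}"}} {value}')
    return "\n".join(lines) + "\n"
```

For example, `render_metrics({url: 0 for url in parse_endpoints("ws://www.abc.com,ws://192.168.128.6:8765")})` yields one `websocket{url="..."} 0` line per endpoint; a real exporter would fill in 1 for endpoints whose probe succeeded.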

k8s websocket service

Define a Service in front of the exporter so that Prometheus can scrape it:

# cat service-websocket.yaml

apiVersion: v1
kind: Service
metadata:
  name: websocket
  namespace: kube-mon
spec:
  # Use a NodePort Service for now
  type: NodePort
  ports:
  - port: 9189
    targetPort: 9189
    protocol: TCP
    nodePort: 32071
  selector:
    app.kubernetes.io/name: wss

Getting the IP and port

# Apply the Deployment and Service defined above
kubectl apply -f websocket-kube-mon-prometheus.yaml
kubectl apply -f service-websocket.yaml
# Check the pod and the service
kubectl get pod -n kube-mon
kubectl get svc -n kube-mon
NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
websocket              NodePort    172.20.237.56    <none>        9189:32071/TCP   1h

Configuring Prometheus

Configuring the Prometheus scrape job

# vim sidecar/cm-kube-mon-sidecar.yaml   # add the following config
    - job_name: 'websocket'
      static_configs:
        - targets: ['172.20.237.56:9189']
# Apply the updated ConfigMap
kubectl apply -f sidecar/cm-kube-mon-sidecar.yaml
# Reload Prometheus (the /-/reload endpoint requires the --web.enable-lifecycle flag)
curl -X POST http://prometheus-pod-ip:9090/-/reload
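
If the reload needs to be triggered from a script rather than from `curl`, the same request can be sent with Python's standard library. This is a minimal sketch; the helper names `build_reload_request` and `reload_prometheus` are hypothetical, and `prometheus-pod-ip` remains a placeholder for the real pod address.

```python
import urllib.request


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build the POST request for Prometheus' /-/reload lifecycle endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    # An empty body with method="POST" mirrors `curl -X POST`.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_prometheus(base_url: str, timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx responses, for example
    when the lifecycle API is not enabled on the Prometheus server.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status


# Example call (placeholder address):
#   reload_prometheus("http://prometheus-pod-ip:9090")
```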

Configuring alerting rules

# vim sidecar/rules-cm-kube-mon-sidecar.yaml   # add the following rule
      - alert: WebSocket endpoint probe failed
        expr: websocket{job="websocket"} < 1
        for: 30s
        labels:
          severity: critical
        annotations:
          #summary: "Endpoint {{ $labels.url }} probe failed"
          description: "websocket URL: {{ $labels.url }} probe failed, status: down."
# Apply the change; Prometheus hot-reloads the rules, so wait about 1 minute
kubectl apply -f sidecar/rules-cm-kube-mon-sidecar.yaml
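
The `for: 30s` clause means the alert does not fire on the first failed probe: the expression `websocket < 1` has to stay true across evaluations for at least 30 seconds, and until then the alert is only pending. The following is a simplified simulation of that behavior, not Prometheus code; the function name `alert_state` and the sample layout are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def alert_state(samples: List[Tuple[float, float]], for_seconds: float = 30.0) -> str:
    """Classify the alert after the last sample as inactive, pending, or firing.

    samples: (timestamp_seconds, value) pairs in ascending time order,
    one per rule evaluation.
    """
    # Time at which the current unbroken run of "value < 1" began.
    breach_start: Optional[float] = None
    for ts, value in samples:
        if value < 1:
            if breach_start is None:
                breach_start = ts
        else:
            # The condition cleared, so the "for" timer resets.
            breach_start = None
    if breach_start is None:
        return "inactive"
    last_ts = samples[-1][0]
    return "firing" if last_ts - breach_start >= for_seconds else "pending"
```

With evaluations every 15 seconds, a service that goes down at t=15 is pending at t=30 and firing at t=45; a single successful probe in between puts the alert back to inactive and restarts the timer.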

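Alert testing

Stopping the test WebSocket service

# Find the process ID
netstat -lnp|grep 8765
# Kill the process (replace <pid> with the ID found above)
kill <pid>

We can also query the exporter directly from a terminal to see the probe status: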

curl 172.20.237.56:9189/metrics
# HELP websocket websocket_help
# TYPE websocket gauge
websocket{url="ws://10.32.128.6:8765"} 0
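
To check this output from a script instead of reading it by eye, the sample lines can be parsed into a mapping from URL to value. This is a minimal sketch that handles only the single-label `websocket{url="..."}` format shown above; the function names `parse_websocket_metrics` and `down_endpoints` are hypothetical.

```python
import re
from typing import Dict, List

# Matches sample lines such as: websocket{url="ws://host:8765"} 0
_SAMPLE = re.compile(r'^websocket\{url="(?P<url>[^"]*)"\}\s+(?P<value>\S+)$')


def parse_websocket_metrics(text: str) -> Dict[str, float]:
    """Extract url -> value from the exporter's /metrics output."""
    result: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match:
            result[match.group("url")] = float(match.group("value"))
    return result


def down_endpoints(text: str) -> List[str]:
    """Return the URLs whose value is below 1, i.e. those the alert rule would match."""
    return sorted(url for url, value in parse_websocket_metrics(text).items() if value < 1)
```

Feeding the output above into `down_endpoints` returns the one URL that is currently reported as down.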

After a short wait, the alert fires:

[Screenshot: alert notification for the WebSocket endpoint]

Restarting the WebSocket service

python websocket-server.py &
curl 172.20.237.56:9189/metrics
# HELP websocket websocket_help
# TYPE websocket gauge
websocket{url="ws://10.32.128.6:8765"} 1

After a short wait, the recovery notification arrives:

[Screenshot: recovery notification for the WebSocket endpoint]
