Overview
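Our k8s cluster currently uses Prometheus for monitoring. Some of the developers' business services expose WebSocket APIs, so to monitor and alert on these applications effectively we need to probe and track the liveness of the WebSocket API endpoints. The approach, rollout steps, and tests are described below.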
Deploy a simple websocket service
We define a simple websocket service to test monitoring and alerting, as follows:
# Create a virtual environment (you can also deploy directly on the host)
mkvirtualenv -p /usr/bin/python3 websocket-server
# Install the required package
pip3 install websockets
# cat websocket-server.py
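import asyncio
import websockets

# Echo handler: reply to every message the client sends.
# websockets 10.1+ calls the handler with the connection only;
# the old second `path` argument is deprecated.
async def echo(websocket):
    async for message in websocket:
        message = "I got your message: {}".format(message)
        await websocket.send(message)

async def main():
    # The IP address must be reachable from the k8s cluster
    async with websockets.serve(echo, '192.168.128.6', 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())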
# Start the websocket service
python websocket-server.py &
# Check that it is listening
netstat -lnp|grep 8765
websocket-exporter
Here we define a Deployment that exposes the metrics of the monitored websocket APIs (several can be configured) to Prometheus, as follows:
k8s websocket deployment
# cat websocket-kube-mon-prometheus.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: wss
    app.kubernetes.io/version: v1.8.0
  name: websocket-exporter
  namespace: kube-mon
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: wss
  template:
    metadata:
      labels:
        app.kubernetes.io/name: wss
        app.kubernetes.io/version: v1.8.0
    spec:
      containers:
        - image: registry.cn-shanghai.aliyuncs.com/ai-voice-test/wss-expoter:v0.0.1
          env:
            - name: ENDPOINT
              # Separate multiple ws endpoints with commas
              value: ws://www.abc.com,ws://192.168.128.6:8765
          name: websocket-exporter
          ports:
            - containerPort: 9189
              name: wss-metrics
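The post does not describe what the `wss-expoter` image runs internally. As a hedged sketch of an exporter with the same interface, assuming it reads the comma-separated `ENDPOINT` variable and serves a gauge named `websocket` with a `url` label on port 9189 (the format shown in the `curl` output in the alert test), a standard-library-only version could look like this. All function names are hypothetical, and `tcp_probe` is a simplified stand-in that only checks TCP reachability rather than completing a WebSocket handshake.

```python
import os
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def parse_endpoints(raw: str) -> list:
    """Split the comma-separated ENDPOINT value, dropping blank entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def tcp_probe(url: str, timeout: float = 3.0) -> int:
    """Stand-in probe: 1 if a TCP connection to the URL's host:port succeeds.

    It does NOT perform the WebSocket handshake.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        return 0
    try:
        port = parts.port or (443 if parts.scheme == "wss" else 80)
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return 1
    except (OSError, ValueError):
        return 0


def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(endpoints, probe=tcp_probe) -> str:
    """Render one `websocket{url="..."}` gauge sample (1 = up, 0 = down) per endpoint."""
    lines = ["# HELP websocket websocket_help", "# TYPE websocket gauge"]
    for url in endpoints:
        lines.append(f'websocket{{url="{escape_label(url)}"}} {probe(url)}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Probe every endpoint on each scrape
        endpoints = parse_endpoints(os.environ.get("ENDPOINT", ""))
        body = render_metrics(endpoints).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 9189), MetricsHandler).serve_forever()
```

Probing on each scrape keeps the metric fresh; with many slow endpoints the probes should run concurrently so a scrape does not exceed Prometheus's scrape timeout.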
k8s websocket service
Define a websocket Service so that Prometheus can scrape the exporter, as follows:
# cat service-websocket.yaml
apiVersion: v1
kind: Service
metadata:
  name: websocket
  namespace: kube-mon
spec:
  # Use NodePort for now
  type: NodePort
  ports:
    - port: 9189
      targetPort: 9189
      protocol: TCP
      nodePort: 32071
  selector:
    app.kubernetes.io/name: wss
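The Service finds the exporter pod through `spec.selector`: a pod is selected when every key/value pair in the selector also appears in its labels, which is why `app.kubernetes.io/name: wss` here must match the pod template labels in the Deployment. A small Python model of that equality-based matching (the function name is hypothetical, not a Kubernetes API):

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Equality-based selector check, modeled on a Service's spec.selector.

    A Service with an empty selector gets no automatically managed endpoints,
    so an empty selector is treated as matching nothing here.
    """
    if not selector:
        return False
    # Every selector key must be present with the same value; extra pod labels are ignored
    return all(labels.get(key) == value for key, value in selector.items())


service_selector = {"app.kubernetes.io/name": "wss"}
pod_labels = {"app.kubernetes.io/name": "wss", "app.kubernetes.io/version": "v1.8.0"}
print(selector_matches(service_selector, pod_labels))  # True
```

The extra `app.kubernetes.io/version` label on the pod does not prevent a match; only a missing or different `app.kubernetes.io/name` would.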
Get the IP and port
# Create the Deployment and Service above
kubectl apply -f websocket-kube-mon-prometheus.yaml
kubectl apply -f service-websocket.yaml
# Check the pod and service
kubectl get pod -n kube-mon
kubectl get svc -n kube-mon
NAME        TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
websocket   NodePort   172.20.237.56   <none>        9189:32071/TCP   1h
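Two addresses can be read off this Service: `CLUSTER-IP:port` (172.20.237.56:9189), reachable from inside the cluster and used as the Prometheus target below, and `<node IP>:nodePort` (port 32071), reachable from outside. A sketch that derives both from `kubectl get svc websocket -n kube-mon -o json`; the helper names and the node IP `192.168.128.10` are hypothetical.

```python
import json
import subprocess


def load_service(name: str = "websocket", namespace: str = "kube-mon") -> dict:
    """Fetch the Service object as JSON via kubectl (kubectl must be on PATH)."""
    result = subprocess.run(
        ["kubectl", "get", "svc", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def scrape_targets(svc: dict, node_ip: str) -> dict:
    """Build host:port strings for the first port of a NodePort Service."""
    spec = svc["spec"]
    port = spec["ports"][0]
    return {
        # Reachable only from inside the cluster (e.g. Prometheus running in k8s)
        "in_cluster": f"{spec['clusterIP']}:{port['port']}",
        # Reachable from outside through any node's IP
        "node_port": f"{node_ip}:{port['nodePort']}",
    }


if __name__ == "__main__":
    # 192.168.128.10 is a hypothetical node IP; substitute one of your nodes
    print(scrape_targets(load_service(), node_ip="192.168.128.10"))
```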
Configure Prometheus
Configure Prometheus to scrape the exporter:
# vim sidecar/cm-kube-mon-sidecar.yaml   # add the following configuration
- job_name: 'websocket'
  static_configs:
    - targets: ['172.20.237.56:9189']
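# Apply the updated ConfigMap
kubectl apply -f sidecar/cm-kube-mon-sidecar.yaml
# Reload Prometheus (the /-/reload endpoint is only enabled when Prometheus runs with --web.enable-lifecycle):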
curl -X POST http://prometheus-pod-ip:9090/-/reload
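The same reload can be triggered from a script. A minimal standard-library sketch; `reload_prometheus` is a hypothetical helper, `prometheus-pod-ip` is the same placeholder as in the `curl` command, and the endpoint only exists when Prometheus runs with `--web.enable-lifecycle`.

```python
import urllib.request


def reload_prometheus(base_url: str, timeout: float = 10.0) -> int:
    """POST to Prometheus's /-/reload endpoint and return the HTTP status.

    urlopen raises urllib.error.HTTPError on a non-2xx response.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/-/reload", data=b"", method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # "prometheus-pod-ip" is a placeholder, as in the curl command
    print(reload_prometheus("http://prometheus-pod-ip:9090"))
```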
Configure alerting rules
# vim sidecar/rules-cm-kube-mon-sidecar.yaml   # add the following configuration
- alert: websocket endpoint probe failed
  expr: websocket{job="websocket"} < 1
  for: 30s
  labels:
    severity: urgent
  annotations:
    #summary: "Endpoint {{ $labels.url }} probe failed"
    description: "websocket endpoint {{ $labels.url }} probe failed, status: down."
# Apply; Prometheus hot-reloads the rules, so just wait about 1 minute
kubectl apply -f sidecar/rules-cm-kube-mon-sidecar.yaml
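`for: 30s` means the expression must keep returning a result for at least 30 seconds before the alert moves from pending to firing. The toy model below only illustrates that state transition; `alert_states` is a hypothetical name, and it ignores evaluation-interval jitter as well as Alertmanager's own grouping and notification delays.

```python
def alert_states(samples, hold_seconds=30.0, threshold=1.0):
    """Simplified model of a Prometheus alerting rule with `for: 30s`.

    samples: (evaluation_time_seconds, gauge_value) pairs, one per rule evaluation.
    Returns the alert state after each evaluation.
    """
    states = []
    active_since = None
    for ts, value in samples:
        if value < threshold:  # `websocket{job="websocket"} < 1` returns this series
            if active_since is None:
                active_since = ts  # alert becomes pending
            states.append("firing" if ts - active_since >= hold_seconds else "pending")
        else:
            active_since = None  # expression no longer matches: alert resolves
            states.append("inactive")
    return states


# Endpoint goes down at t=15s and comes back at t=60s, evaluated every 15s
print(alert_states([(0, 1), (15, 0), (30, 0), (45, 0), (60, 1)]))
# ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

Because the rule compares the gauge value, it only sees endpoints the exporter still reports. If the exporter itself stops being scraped, `websocket{job="websocket"}` disappears and this rule stays silent; Prometheus's built-in `up{job="websocket"} == 0` series covers that case.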
Alert testing
Stop the test websocket service
# Find the process ID
netstat -lnp|grep 8765
# Kill the process
kill <PID>
From a terminal, we can query the exporter directly to see each endpoint's probe status:
curl 172.20.237.56:9189/metrics
# HELP websocket websocket_help
# TYPE websocket gauge
websocket{url="ws://10.32.128.6:8765"} 0
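To check the same output from a script instead of by eye, a small parser can list the endpoints whose gauge is below 1, mirroring the alert expression. This is a sketch: the names are hypothetical, the regular expression assumes `url` is the only label and contains no escaped quotes, and the default URL uses the ClusterIP, so it only works from inside the cluster.

```python
import re
import urllib.request

# Matches sample lines like: websocket{url="ws://10.32.128.6:8765"} 0
SAMPLE_RE = re.compile(r'^websocket\{url="([^"]*)"\}\s+(\S+)$')


def down_endpoints(metrics_text: str) -> list:
    """Return the URLs whose websocket gauge is below 1 (the alert's condition)."""
    down = []
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())  # HELP/TYPE comment lines never match
        if match and float(match.group(2)) < 1:
            down.append(match.group(1))
    return down


def fetch_metrics(url: str = "http://172.20.237.56:9189/metrics") -> str:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(down_endpoints(fetch_metrics()))
```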
After a short while, the alert fires and a notification is sent.
Restart the websocket service
python websocket-server.py &
Querying the exporter again shows the gauge back at 1:
curl 172.20.237.56:9189/metrics
# HELP websocket websocket_help
# TYPE websocket gauge
websocket{url="ws://10.32.128.6:8765"} 1
After a short while, the resolved (recovery) notification is sent.