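Swarm mode, Docker's official container orchestration solution, is becoming increasingly marginalized now that Kubernetes dominates. Even so, swarm mode is simple to deploy and operate and needs far fewer hardware resources, which makes it a good fit for small and medium-sized distributed microservice systems.
The environment used to verify this article:
Operating system: Red Hat Enterprise Linux 7.6
Swarm cluster: 3 managers, 22 workers
Tools used: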
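Name | Version | Notes |
docker | 19.03.13 | 20+ works as well |
prometheus | 2.26.0 | This version officially supports swarm mode service discovery; the node-exporter and alertmanager components are also used here |
cadvisor | 0.37.5 | Can also be deployed directly on the host and managed as a systemctl service |
grafana | 7.5.5 | Graphical display of the monitoring data |
karma | 0.85 | Alert dashboard for prometheus that shows all alerts |
prometheus-webhook-dingtalk | 1.4.0 | Pushes DingTalk notifications |
traefik | 2.4.8 | Reverse proxy for the grafana, prometheus, karma and other UI services |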
The configuration files used in this article are available at https://github.com/cjie001/swarm-monitoring.
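1. Deploying cadvisor
According to what is currently available online, recent versions of cadvisor are hard to deploy with docker stack: they fail with the error "open /dev/kmsg: no such file or directory". The cause is that the privileged option in docker-compose.yml is ignored when a stack is deployed to a swarm cluster. The option only takes effect when cadvisor is started with docker-compose up on a single machine or with docker run, but then you lose the main advantage of docker stack, namely using global mode to place the container on every docker host. I later found a neat workaround on GitHub: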
......
  cadvisor:
    # The service itself is only the docker CLI image; it launches the real
    # cadvisor container through the host's docker socket.
    image: docker:19.03.13
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    entrypoint: ["/bin/sh","-c"]
    networks:
      - cnet
    deploy:
      mode: global
    environment:
      - TZ=Asia/Shanghai
      - PARENT={{.Task.Name}}
      - CHILDNAME={{.Service.Name}}_sidecar.{{.Node.ID}}.{{.Task.ID}}
      - CADVISOR_VERSION=v0.37.5
    command:
      - |
        exec docker run -i --rm --network="container:$${PARENT}" \
          --env=TZ=Asia/Shanghai \
          --volume=/:/rootfs:ro \
          --volume=/var/run:/var/run:ro \
          --volume=/var/run/docker.sock:/var/run/docker.sock:ro \
          --volume=/sys:/sys:ro \
          --volume=/var/lib/docker/:/var/lib/docker:ro \
          --volume=/dev/disk/:/dev/disk:ro \
          --name $${CHILDNAME} \
          --privileged \
          --device=/dev/kmsg \
          cjie001/cadvisor:0.37.5 -docker_only
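In effect, docker stack starts the official docker:19.03.13 image, and docker run is then executed inside that container. This sidesteps the ignored privileged option while still making it easy to deploy to every node. The cost is one extra docker container per host, which uses slightly more resources.
Since cadvisor is written in Go and ships as a single executable, it can also be deployed directly on the host:
Download cadvisor from https://github.com/google/cadvisor/releases/download/v0.37.5/cadvisor and place it in /usr/bin on Redhat/CentOS. I also wrote a cadvisor.service file so that the service can be managed with systemctl. This is a one-off deployment task, so it is still reasonably convenient.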
# cd /usr/bin
# wget https://github.com/google/cadvisor/releases/download/v0.37.5/cadvisor
# chmod +x cadvisor
Example cadvisor.service:
[Unit]
Description=Container Monitoring Agent
After=docker.service
Requires=docker.service
[Service]
EnvironmentFile=/etc/environment
TimeoutStartSec=0
ExecStart=/usr/bin/cadvisor --port=18080
KillMode=process
OOMScoreAdjust=-500
TimeoutSec=0
RestartSec=2
Restart=always
[Install]
WantedBy=multi-user.target
On Redhat/CentOS, place the cadvisor.service file in /usr/lib/systemd/system/, run systemctl enable cadvisor to start it automatically at boot, and run systemctl start cadvisor to start the cadvisor service manually.
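The host installation steps can be collected into a small script. The sketch below is only an illustration: the function names run and install_cadvisor_service, the DRY_RUN switch and the default unit path ./cadvisor.service are assumptions that do not come from the original setup. Port 18080 follows the ExecStart line of the unit file.

```shell
#!/bin/sh
# Sketch: install and start the cadvisor unit on one host (run as root).
# DRY_RUN=1 prints each command instead of executing it.

# Hypothetical helper: execute the command, or only print it in dry-run mode.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: $1 is the path of the unit file to install
# (assumed default: ./cadvisor.service). Stops at the first failing step.
install_cadvisor_service() {
  unit_src=${1:-./cadvisor.service}
  run chmod +x /usr/bin/cadvisor &&
    run cp "$unit_src" /usr/lib/systemd/system/cadvisor.service &&
    run systemctl daemon-reload &&
    run systemctl enable cadvisor &&
    run systemctl start cadvisor &&
    run curl -fsS -o /dev/null http://localhost:18080/metrics
}
```

With DRY_RUN=1 set, calling install_cadvisor_service prints the six commands without touching the host. The final curl only checks that cadvisor answers on its /metrics endpoint.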
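2. Configuring prometheus
The prometheus configuration is covered in two parts: the prometheus configuration file itself, and example alerting rules.
2.1 prometheus.yml
The official prometheus service discovery now supports swarm. The example below scrapes docker daemon metrics (this requires enabling the metrics-addr and experimental options in /etc/docker/daemon.json):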
  # Create a job for Docker daemons.
  #
  # This example requires Docker daemons to be configured to expose
  # Prometheus metrics, as documented here:
  # https://docs.docker.com/config/daemon/prometheus/
  - job_name: 'docker'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock # You can also use http/https to connect to the Docker daemon.
        role: nodes
    relabel_configs:
      # Fetch metrics on port 9323.
      - source_labels: [__meta_dockerswarm_node_address]
        target_label: __address__
        replacement: $1:9323
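For reference, the daemon.json change that this job depends on might look like the output of the sketch below. The function name docker_metrics_daemon_json and the default listen address 0.0.0.0:9323 are assumptions: the port matches the scrape job, and listening on all interfaces is only one way to make the endpoint reachable from the prometheus container.

```shell
#!/bin/sh
# Sketch: print a daemon.json fragment that exposes Docker's Prometheus
# metrics endpoint. $1 is the listen address (assumed default 0.0.0.0:9323;
# use a specific host IP or a firewall rule to restrict access).
docker_metrics_daemon_json() {
  listen_addr=${1:-0.0.0.0:9323}
  cat <<EOF
{
  "metrics-addr": "${listen_addr}",
  "experimental": true
}
EOF
}
```

If /etc/docker/daemon.json already exists, merge the two keys into it instead of overwriting the file, then restart the docker service on each node.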
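The scrape job above has one problem, though: when the host is the leader, the IP produced by relabel_configs is 0.0.0.0, so the metrics address of the leader node is wrong. After studying relabel_configs, I matched __meta_dockerswarm_node_manager_address against a regular expression (see the regex setting below) to extract the manager node's IP. The following configuration solves the problem: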
  # Create a job for Docker daemons.
  - job_name: 'docker'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: nodes
    relabel_configs:
      # Fetch metrics on port 9323.
      - source_labels: [__meta_dockerswarm_node_address]
        target_label: __address__
        replacement: $1:9323
      - source_labels: [__meta_dockerswarm_node_manager_address]
        regex: ([^:]+):\d+
        target_label: __address__
        replacement: $1:9323
      # Set hostname as instance label
      - source_labels: [__meta_dockerswarm_node_hostname]
        target_label: instance
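To see why the second rule repairs the manager address, it can help to replay the relabeling by hand. The shell sketch below mimics the two __address__ rules under stated assumptions: the function name relabel_address and the sample IPs are made up for illustration, and sed stands in for Prometheus's anchored regex matching, where a rule whose regex does not match leaves __address__ untouched.

```shell
#!/bin/sh
# Sketch: emulate the second relabel rule.
#   $1 = __address__ after the first rule (node_address + ":9323")
#   $2 = __meta_dockerswarm_node_manager_address (host:port, empty on workers)
relabel_address() {
  current=$1
  manager=$2
  # Same pattern as the regex line in the config, anchored at both ends;
  # sed -n ... p prints the captured host only when the pattern matches.
  host=$(printf '%s\n' "$manager" | sed -n -E 's/^([^:]+):[0-9]+$/\1/p')
  if [ -n "$host" ]; then
    printf '%s:9323\n' "$host"   # manager: rewrite to <manager ip>:9323
  else
    printf '%s\n' "$current"     # worker: keep the address from rule one
  fi
}

relabel_address "0.0.0.0:9323" "192.168.1.10:2377"   # prints 192.168.1.10:9323
relabel_address "192.168.1.21:9323" ""               # prints 192.168.1.21:9323
```

The second call shows that worker nodes, which carry no manager address, keep the value set by the first rule.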
Complete prometheus configuration:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'swarm-monitoring'
rule_files:
  - "swarm_node.rules.yml"
  - "swarm_task.rules.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # Docker daemon monitoring
  - job_name: 'docker'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock # You can also use http/https to connect to the Docker daemon.
        role: nodes
    relabel_configs:
      # Fetch metrics on port 9323.
      - source_labels: [__meta_dockerswarm_node_address]
        target_label: __address__
        replacement: $1:9323
      - source_labels: [__meta_dockerswarm_node_manager_address]
        regex: ([^:]+):\d+
        target_label: __address__
        replacement: $1:9323
      # Set hostname as instance label
      - source_labels: [__meta_dockerswarm_node_hostname]
        target_label: instance
  # Docker Swarm container monitoring
  # Use the following job when cadvisor is deployed directly on the host
  #- job_name: 'cadvisor'
  #  dockerswarm_sd_configs:
  #    - host: unix:///var/run/docker.sock
  #      role: nodes
  #  relabel_configs:
  #    # Fetch metrics on port 18080.
  #    - source_labels: [__meta_dockerswarm_node_address]
  #      target_label: __address__
  #      replacement: $1:18080
  #    - source_labels: [__meta_dockerswarm_node_manager_address]
  #      regex: ([^:]+):\d+
  #      target_label: __address__
  #      replacement: $1:18080
  #    # Set hostname as instance label
  #    - source_labels: [__meta_dockerswarm_node_hostname]
  #      target_label: instance
  # Use the following job when cadvisor is deployed as a container
  - job_name: 'cadvisor'
    dns_sd_configs:
      - names:
          - 'tasks.cadvisor'
        type: 'A'
        port: 8080
  # Docker swarm host monitoring (requires node-exporter deployed in global mode)
  - job_name: 'node-exporter'
    dns_sd_configs:
      - names:
          - 'tasks.node-exporter'
        type: 'A'
        port: 9100
2.2 Alerting rules
This example uses alerting rules along two dimensions, swarm node (host) and task (service). The details are as follows:
Swarm node alerting rules
swarm_node.rules.yml
groups:
- name: swarm_node.rules
  rules:
  - alert: node_cpu_usage
    expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[1m]) * ON(instance) GROUP_LEFT(node_name)
      node_meta * 100) BY (node_name)) > 50
    for: 1m
    labels:
      severity: warning
    annotations:
      description: Swarm node {{ $labels.node_name }} CPU usage {{ humanize $value}}%
      summary: CPU alert '{{ $labels.node_name }}'
  - alert: node_memory_usage
    expr: sum(((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes)
      * ON(instance) GROUP_LEFT(node_name) node_meta * 100) BY (node_name) > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      description: Swarm node {{ $labels.node_name }} memory usage {{ humanize $value}}%
      summary: Memory alert '{{ $labels.node_name }}'
  - alert: node_disk_usage
    expr: ((node_filesystem_size_bytes{mountpoint="/rootfs"} - node_filesystem_free_bytes{mountpoint="/rootfs"})
      * 100 / node_filesystem_size_bytes{mountpoint="/rootfs"}) * ON(instance) GROUP_LEFT(node_name)
      node_meta > 85
    for: 1m
    labels:
      severity: warning
    annotations:
      description: Swarm node {{ $labels.node_name }} disk usage {{ humanize $value}}%
      summary: Disk alert '{{ $labels.node_name }}'
  - alert: node_disk_fill_rate_6h
    expr: predict_linear(node_filesystem_free_bytes{mountpoint="/rootfs"}[1h], 6 * 3600) * ON(instance)
      GROUP_LEFT(node_name) node_meta < 0
    for: 1h
    labels:
      severity: critical
    annotations:
      description: Swarm node {{ $labels.node_name }} disk will be full within 6 hours
      summary: Low disk space alert '{{ $labels.node_name }}'
Swarm task alerting rules
swarm_task.rules.yml
groups:
- name: swarm_task.rules
  rules:
  - alert: task_high_cpu_usage_50 # container CPU usage above 50%
    expr: sum(rate(container_cpu_usage_seconds_total{container_label_com_docker_swarm_task_name=~".+"}[1m]))
      BY (container_label_com_docker_swarm_task_name, container_label_com_docker_swarm_node_id)
      * 100 > 50
    for: 1m
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.container_label_com_docker_swarm_task_name }} on ''{{
        $labels.container_label_com_docker_swarm_node_id }}'' CPU usage {{ humanize
        $value}}%.'
      summary: CPU alert '{{ $labels.container_label_com_docker_swarm_task_name
        }}' on '{{ $labels.container_label_com_docker_swarm_node_id }}'
  - alert: task_high_memory_usage_1g # container memory usage above 1 GB
    expr: sum(container_memory_rss{container_label_com_docker_swarm_task_name=~".+"})
      BY (container_label_com_docker_swarm_task_name, container_label_com_docker_swarm_node_id) > 1e+09
    for: 1m
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.container_label_com_docker_swarm_task_name }} on ''{{
        $labels.container_label_com_docker_swarm_node_id }}'' is using {{ humanize
        $value}} of memory.'
      summary: Memory alert '{{ $labels.container_label_com_docker_swarm_task_name
        }}' on '{{ $labels.container_label_com_docker_swarm_node_id }}'
  - alert: service_group_warning # container count of a given stack (known minimum, 8 in this example)
    expr: count(rate(container_memory_usage_bytes{container_label_com_docker_stack_namespace=~"my_stack"}[1m])) < 8
    for: 1m
    labels:
      severity: critical
    annotations:
      description: 'my_stack has too few containers: {{ $value }} / 8.'
      summary: 'Service count is below the expected number'
  - alert: services_down # monitor a specific service by name
    expr: absent(count(container_memory_usage_bytes{container_label_com_docker_swarm_service_name=~"my_service"}))
    for: 30s
    labels:
      severity: critical
      team: dev
    annotations:
      description: 'Service my_service has been down for more than 30 seconds'
      summary: 'Service has no available instances'
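Before relying on these files it is worth validating them and asking prometheus to reload. The sketch below is one possible way to do that, not the author's procedure: the function names run and check_and_reload_prometheus, the DRY_RUN switch and the default URL http://localhost:9090 are assumptions. promtool check config also checks the files listed under rule_files, and the /-/reload endpoint only works when prometheus runs with --web.enable-lifecycle.

```shell
#!/bin/sh
# Sketch: validate the prometheus configuration, then trigger a reload.
# DRY_RUN=1 prints each command instead of executing it.

# Hypothetical helper: execute the command, or only print it in dry-run mode.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# $1 = path to prometheus.yml (assumed default: prometheus.yml in the cwd)
# $2 = base URL of prometheus (assumed default: http://localhost:9090)
check_and_reload_prometheus() {
  config=${1:-prometheus.yml}
  base_url=${2:-http://localhost:9090}
  # The reload request is sent only if the configuration check succeeds.
  run promtool check config "$config" &&
    run curl -fsS -X POST "$base_url/-/reload"
}
```

Both commands have to run somewhere that has promtool and can reach prometheus, for example inside the prometheus container if the image ships promtool.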
3. Deploying prometheus and related tools
This example uses docker stack to deploy prometheus, cadvisor, grafana, karma, webhook-dingtalk, traefik and the other tools.
3.1 docker-compose.yml
version: "3.8"
services:
  cadvisor:
    # The service itself is only the docker CLI image; it launches the real
    # cadvisor container through the host's docker socket.
    image: docker:19.03.13
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    entrypoint: ["/bin/sh","-c"]
    networks:
      - cnet
    deploy:
      mode: global
    environment:
      - TZ=Asia/Shanghai
      - PARENT={{.Task.Name}}
      - CHILDNAME={{.Service.Name}}_sidecar.{{.Node.ID}}.{{.Task.ID}}
      - CADVISOR_VERSION=v0.37.5
    command:
      - |
        exec docker run -i --rm --network="container:$${PARENT}" \
          --env=TZ=Asia/Shanghai \
          --volume=/:/rootfs:ro \
          --volume=/var/run:/var/run:ro \
          --volume=/var/run/docker.sock:/var/run/docker.sock:ro \
          --volume=/sys:/sys:ro \
          --volume=/var/lib/docker/:/var/lib/docker:ro \
          --volume=/dev/disk/:/dev/disk:ro \
          --name $${CHILDNAME} \
          --privileged \
          --device=/dev/kmsg \
          cjie001/cadvisor:0.37.5 -docker_only
  grafana:
    image: cjie001/grafana:7.5.5
    networks:
      - cnet
    environment:
      - TZ=Asia/Shanghai
      - GF_SECURITY_ADMIN_USER=${ADMIN_USER:-admin}
      - GF_SECURITY_ADMIN_PASSWORD=${ADMIN_PASSWORD:-admin}
      - GF_USERS_ALLOW_SIGN_UP=false
      #- GF_SERVER_ROOT_URL=${GF_SERVER_ROOT_URL:-localhost}
      #- GF_SMTP_ENABLED=${GF_SMTP_ENABLED:-false}
      #- GF_SMTP_FROM_ADDRESS=${GF_SMTP_FROM_ADDRESS:-grafana@test.com}
      #- GF_SMTP_FROM_NAME=${GF_SMTP_FROM_NAME:-Grafana}
      #- GF_SMTP_HOST=${GF_SMTP_HOST:-smtp:25}
      #- GF_SMTP_USER=${GF_SMTP_USER}
      #- GF_SMTP_PASSWORD=${GF_SMTP_PASSWORD}
    volumes:
      - grafana:/var/lib/grafana
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M
      labels:
        - traefik.enable=true
        - traefik.docker.network=cnet
        - traefik.http.routers.grafana-http.rule=Host(`grafana.${DOMAIN?Variable DOMAIN not set}`)
        - traefik.http.routers.grafana-http.entrypoints=http
        - traefik.http.services.grafana.loadbalancer.server.port=3000
  dingtalk:
    image: cjie001/prometheus-webhook-dingtalk:latest
    #ports:
    #  - "8060:8060"
    environment:
      TZ: 'Asia/Shanghai'
    command:
      - '--config.file=/etc/prometheus-webhook-dingtalk/config.yml'
      - '--web.enable-ui'
      - '--web.enable-lifecycle'
    networks:
      - cnet
    volumes:
      - dingtalk_etc:/etc/prometheus-webhook-dingtalk
    deploy:
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M
      labels:
        - traefik.enable=true
        - traefik.docker.network=cnet
        - traefik.http.routers.webhook-dingtalk-http.rule=Host(`dingtalk.${DOMAIN?Variable DOMAIN not set}`)
        - traefik.http.routers.webhook-dingtalk-http.entrypoints=http
        - traefik.http.middlewares.webhook-dingtalk-auth.basicauth.users=${ADMIN_USER?Variable ADMIN_USER not set}:${HASHED_PASSWORD?Variable HASHED_PASSWORD not set}
        - traefik.http.routers.webhook-dingtalk-http.middlewares=webhook-dingtalk-auth
        - traefik.http.services.webhook-dingtalk.loadbalancer.server.port=8060
  alertmanager:
    image: cjie001/alertmanager:v0.21.0
    networks:
      - cnet
    environment:
      TZ: 'Asia/Shanghai'
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    deploy:
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M
      labels:
        - traefik.enable=true
        - traefik.docker.network=cnet
        - traefik.http.routers.alertmanager-http.rule=Host(`alertmanager.${DOMAIN?Variable DOMAIN not set}`)
        - traefik.http.routers.alertmanager-http.entrypoints=http
        - traefik.http.routers.alertmanager-http.middlewares=alertmanager-auth
        - traefik.http.middlewares.alertmanager-auth.basicauth.users=${ADMIN_USER?Variable ADMIN_USER not set}:${HASHED_PASSWORD?Variable HASHED_PASSWORD not set}
        - traefik.http.services.alertmanager.loadbalancer.server.port=9093
  karma:
    image: lmierzwa/karma:v0.85
    networks:
      - cnet
    #ports:
    #  - "8061:8080"
    environment:
      - TZ=Asia/Shanghai
      - ALERTMANAGER_URI=http://alertmanager:9093
    deploy:
      mode: replicated
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M
      replicas: 1
      labels:
        - traefik.enable=true
        - traefik.docker.network=cnet
        - traefik.http.routers.karma-http.rule=Host(`karma.${DOMAIN?Variable DOMAIN not set}`)
        - traefik.http.routers.karma-http.entrypoints=http
        - traefik.http.middlewares.karma-auth.basicauth.users=${ADMIN_USER?Variable ADMIN_USER not set}:${HASHED_PASSWORD?Variable HASHED_PASSWORD not set}
        - traefik.http.routers.karma-http.middlewares=karma-auth
        - traefik.http.services.karma.loadbalancer.server.port=8080
  node-exporter:
    image: cjie001/node-exporter:v1.1.2
    networks:
      - cnet
    environment:
      - TZ=Asia/Shanghai
      - NODE_ID={{.Node.ID}}
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
      - /etc/hostname:/etc/nodename
    command:
      - '--path.sysfs=/host/sys'
      - '--path.procfs=/host/proc'
      - '--collector.textfile.directory=/etc/node-exporter/'
      - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
      - '--no-collector.ipvs'
    deploy:
      mode: global
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M
  prometheus:
    image: cjie001/prometheus:v2.26.0
    #ports:
    #  - "9090:9090"
    networks:
      - cnet
    environment:
      - TZ=Asia/Shanghai
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention=${PROMETHEUS_RETENTION:-48h}'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    volumes:
      - prometheus_data:/prometheus
      - prometheus_etc:/etc/prometheus
      # Needed by dockerswarm_sd_configs to query the swarm API
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          memory: 2048M
        reservations:
          memory: 128M
      labels:
        - traefik.enable=true
        - traefik.docker.network=cnet
        - traefik.http.routers.prometheus-http.rule=Host(`prometheus.${DOMAIN?Variable DOMAIN not set}`)
        - traefik.http.routers.prometheus-http.entrypoints=http
        - traefik.http.middlewares.prometheus-auth.basicauth.users=${ADMIN_USER?Variable ADMIN_USER not set}:${HASHED_PASSWORD?Variable HASHED_PASSWORD not set}
        - traefik.http.routers.prometheus-http.middlewares=prometheus-auth
        - traefik.http.services.prometheus.loadbalancer.server.port=9090
  traefik:
    # Pinned to the 2.4.8 release listed in the tool table; the
    # --providers.docker.swarmMode flag below is a Traefik v2 option.
    image: traefik:v2.4.8
    ports:
      - "80:80"
      - "8080:8080"
      #- "443:443"
    networks:
      - cnet
    command:
      - "--global.checknewversion=false"
      - "--entrypoints.http.address=:80"
      - "--entrypoints.testapi.address=:3308"
      - "--api=true"
      - "--api.insecure=true"
      - "--api.dashboard=true"
      - "--api.debug=false"
      - "--ping=true"
      - "--log.level=info"
      - "--log.format=common"
      - "--accesslog=true"
      - "--providers.docker=true"
      - "--providers.docker.watch=true"
      - "--providers.docker.exposedbydefault=false"
      - "--providers.docker.endpoint=unix:///var/run/docker.sock"
      - "--providers.docker.swarmMode=true"
      - "--providers.docker.network=cnet"
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=cnet"
    healthcheck:
      test: ["CMD-SHELL", "wget -q --spider --proxy off localhost:8080/ping || exit 1"]
      interval: 10s
      retries: 3
volumes:
  prometheus_data:
  prometheus_etc:
  dingtalk_etc:
  grafana:
networks:
  cnet:
3.2 Starting with docker stack
Because docker-compose.yml uses several custom environment variables, the services are started with the following shell script:
export ADMIN_USER=admin
export ADMIN_PASSWORD=admin
# apr1 (Apache MD5) hash used by the traefik basicauth middlewares
export HASHED_PASSWORD=$(openssl passwd -apr1 "$ADMIN_PASSWORD")
export DOMAIN=example.com
docker stack deploy -c docker-compose.yml mon
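This starts all the related services as a stack in the swarm mode cluster. After adding mappings from the domain names to the swarm IP in /etc/hosts (on Windows, C:\Windows\System32\drivers\etc\hosts), the UI of each tool is reachable at:
http://grafana.example.com, http://alertmanager.example.com, http://karma.example.com and http://prometheus.example.com.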
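The hosts entries follow directly from the Host(...) rules in the traefik labels. As a small sketch, they could be generated like this. The function name hosts_entries and the sample address 192.168.1.10 are assumptions; use the IP of any swarm node that publishes port 80.

```shell
#!/bin/sh
# Sketch: print one hosts-file line per UI routed by traefik.
#   $1 = IP of a swarm node, $2 = the DOMAIN value used at deploy time
hosts_entries() {
  swarm_ip=$1
  domain=$2
  # Subdomains taken from the Host(...) rules in docker-compose.yml.
  for name in grafana alertmanager karma prometheus dingtalk; do
    printf '%s %s.%s\n' "$swarm_ip" "$name" "$domain"
  done
}

hosts_entries 192.168.1.10 example.com
```

On Linux the output can be appended to /etc/hosts, for example with `hosts_entries 192.168.1.10 example.com | sudo tee -a /etc/hosts`.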
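3.3 Editing the webhook-dingtalk configuration
The dingtalk configuration has to be edited before the service works properly. The configuration directory is persisted on the host, so you can enter the container and edit /etc/prometheus-webhook-dingtalk/config.yml, filling in the access_token and secret of the DingTalk robot you created: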
targets:
  ops:
    url: https://oapi.dingtalk.com/robot/send?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    # secret for signature
    secret: SEC0000000000000000000000000000000000000000000000000000000000000000
  dev:
    url: https://oapi.dingtalk.com/robot/send?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    secret: SEC0000000000000000000000000000000000000000000000000000000000000000
Send a SIGHUP signal with kill -HUP <pid> (the pid is usually 1) to reload the configuration.
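The same signal can also be sent from the host without entering the container, because docker kill delivers a signal to the container's main process. The sketch below assumes this approach. The function names run and reload_dingtalk and the DRY_RUN switch are made up for illustration, and the container name filter in the usage note assumes the stack was deployed as mon.

```shell
#!/bin/sh
# Sketch: ask the webhook-dingtalk container to reload its configuration.
# DRY_RUN=1 prints the command instead of executing it.

# Hypothetical helper: execute the command, or only print it in dry-run mode.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# $1 = container ID or name of the dingtalk task on this node
reload_dingtalk() {
  container=$1
  if [ -z "$container" ]; then
    echo "usage: reload_dingtalk CONTAINER" >&2
    return 2
  fi
  # Equivalent to running kill -HUP 1 inside the container.
  run docker kill --signal=HUP "$container"
}
```

On the node that runs the task, a call such as `reload_dingtalk "$(docker ps -q --filter name=mon_dingtalk)"` looks up the container and sends the signal.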
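Conclusion
When things go smoothly, setting up the whole swarm cluster monitoring system takes only a few minutes. I also verified the same setup on a single-manager swarm (docker 20.10.6) running in a CentOS 7.9 virtual machine under Hyper-V on a ThinkPad X1 Carbon laptop, which goes a long way toward showing the advantages of swarm mode.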