Monitoring a Swarm Mode Cluster with Prometheus in Practice

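Contents

1. Deploying cadvisor

2. Prometheus configuration

2.1 prometheus.yml

2.2 Alerting rules

Swarm node alerting rules

Swarm task alerting rules

3. Deploying Prometheus and related tools

3.1 docker-compose.yml

3.2 Starting with docker stack

3.3 Configuring webhook-dingtalk

Closing remarks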


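Swarm mode, Docker's official container orchestration solution, has been increasingly sidelined now that Kubernetes dominates. Yet swarm mode is simple to deploy and operate and needs far less hardware, which makes it a good fit for small and medium-sized distributed microservice systems.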

First, the environment used to verify this article:

Operating system: Red Hat Enterprise Linux 7.6

Swarm cluster: 3 managers, 22 workers

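Tools used:

Name                          Version   Notes
docker                        19.03.13  20.x and later also work
prometheus                    2.26.0    Supports swarm mode service discovery natively; node-exporter and alertmanager are also used in this article
cadvisor                      0.37.5    Can be deployed directly on the host and managed as a systemctl service
grafana                       7.5.5     Graphical dashboards for monitoring data
karma                         0.85      Alert dashboard for Prometheus/Alertmanager that shows all alerts
prometheus-webhook-dingtalk   1.4.0     Sends DingTalk notifications
traefik                       2.4.8     Reverse proxy for the grafana, prometheus, karma and other web UIs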

The configuration files used in this article are available at https://github.com/cjie001/swarm-monitoring.

1. Deploying cadvisor

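According to material currently available online, the latest cadvisor is hard to deploy with docker stack: it fails with "open /dev/kmsg: no such file or directory". The cause is that the privileged option in docker-compose.yml is ignored when a stack is deployed to a swarm cluster; it only takes effect with single-host docker-compose up or when cadvisor is started with docker run. Those approaches, however, give up a key advantage of docker stack: global mode, which deploys one container to every Docker host. I later found a clever workaround on GitHub: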

......

  cadvisor:
    image: docker:19.03.13
    volumes:
        - /var/run/docker.sock:/var/run/docker.sock
    entrypoint: ["/bin/sh","-c"]
    networks:
      - cnet
    deploy:
      mode: global
    environment:
      - TZ=Asia/Shanghai
      - PARENT={{.Task.Name}}
      - CHILDNAME={{.Service.Name}}_sidecar.{{.Node.ID}}.{{.Task.ID}}
      - CADVISOR_VERSION=v0.37.5
    command:
    - |
      exec docker run -i --rm --network="container:$${PARENT}" \
            --env=TZ=Asia/Shanghai \
            --volume=/:/rootfs:ro \
            --volume=/var/run:/var/run:ro  \
            --volume=/var/run/docker.sock:/var/run/docker.sock:ro \
            --volume=/sys:/sys:ro  \
            --volume=/var/lib/docker/:/var/lib/docker:ro \
            --volume=/dev/disk/:/dev/disk:ro \
            --name $${CHILDNAME} \
            --privileged \
            --device=/dev/kmsg \
            cjie001/cadvisor:0.37.5 -docker_only

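This starts the official docker:19.03.13 image as a docker stack service (in global mode, so one task per node) and, from inside that container, runs docker run against the host's Docker socket. The privileged cadvisor container is therefore started directly by the host's Docker daemon, which sidesteps the ignored privileged option while still deploying conveniently to every node. --network="container:${PARENT}" makes it share the parent task's network namespace, so cadvisor is reachable at the parent task's address on the cnet network. The cost is one extra container per host, which takes a few more resources.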

Since cadvisor is written in Go and ships as a single executable, it can also be deployed directly on the host:

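Download cadvisor from https://github.com/google/cadvisor/releases/download/v0.37.5/cadvisor, put it in /usr/bin on Red Hat/CentOS and make it executable, then manage it with systemctl through a cadvisor.service file. It is a one-time deployment, so it is still reasonably convenient.

# cd /usr/bin
# wget https://github.com/google/cadvisor/releases/download/v0.37.5/cadvisor
# chmod +x cadvisor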

Example cadvisor.service:

[Unit]
Description=Container Monitoring Agent
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/etc/environment
TimeoutStartSec=0
ExecStart=/usr/bin/cadvisor --port=18080

KillMode=process
OOMScoreAdjust=-500

TimeoutSec=0
RestartSec=2
Restart=always

[Install]
WantedBy=multi-user.target

On Red Hat/CentOS, place cadvisor.service in /usr/lib/systemd/system/, run systemctl daemon-reload so systemd picks up the new unit, enable start at boot with systemctl enable cadvisor, and start the service with systemctl start cadvisor.

2. Prometheus configuration

The Prometheus configuration is covered in two parts: the prometheus.yml configuration file, and example alerting rules.

2.1 prometheus.yml

Prometheus's built-in service discovery now supports swarm. Here is the official example for scraping Docker daemon metrics (metrics-addr and experimental must be enabled in /etc/docker/daemon.json):

  # Create a job for Docker daemons.
  #
  # This exemple requires Docker daemons to be configured to expose
  # Prometheus metrics, as documented here:
  # https://docs.docker.com/config/daemon/prometheus/
  - job_name: 'docker'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock # You can also use http/https to connect to the Docker daemon.
        role: nodes
    relabel_configs:
      # Fetch metrics on port 9323.
      - source_labels: [__meta_dockerswarm_node_address]
        target_label: __address__
        replacement: $1:9323
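The job above assumes every Docker daemon already serves metrics on port 9323. A minimal sketch of the matching daemon.json follows. The helper name docker_metrics_config is hypothetical, and binding 0.0.0.0 (Docker's own docs use 127.0.0.1) is an assumption made so that Prometheus on a manager can reach every node; restrict port 9323 with a firewall if that matters in your network.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal daemon.json that makes dockerd expose
# Prometheus metrics on the given port (default 9323) on all interfaces.
# Assumption: the node has no other daemon.json settings; merge by hand if it does.
docker_metrics_config() {
  port="${1:-9323}"
  cat <<EOF
{
  "metrics-addr": "0.0.0.0:${port}",
  "experimental": true
}
EOF
}

docker_metrics_config
```

On each node, something like docker_metrics_config > /etc/docker/daemon.json followed by systemctl restart docker would apply it, after which curl -s http://<node-ip>:9323/metrics should return metrics. Restarting dockerd restarts that node's containers, so doing one node at a time is the safer choice.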

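There is a problem here, though: when the host is the swarm leader, the address produced by relabel_configs is 0.0.0.0, so the leader's metrics endpoint is wrong. After digging into relabel_configs, I matched __meta_dockerswarm_node_manager_address against a regex (see the regex setting below) to extract the manager node's IP. The following configuration solves the problem: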

  # Create a job for Docker daemons.
  - job_name: 'docker'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: nodes
    relabel_configs:
      # Fetch metrics on port 9323.
      - source_labels: [__meta_dockerswarm_node_address]
        target_label: __address__
        replacement: $1:9323
      - source_labels: [__meta_dockerswarm_node_manager_address]
        regex: ([^:]+):\d+
        target_label: __address__
        replacement: $1:9323
      # Set hostname as instance label
      - source_labels: [__meta_dockerswarm_node_hostname]
        target_label: instance
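To see why the second rule fixes the leader without breaking workers, here is a shell sketch that replays the two replace rules by hand. The addresses are made-up examples and scrape_address is a hypothetical helper; it only emulates the relabeling logic and is not how Prometheus implements it.

```shell
#!/bin/sh
# Hypothetical helper that mimics the two "replace" relabel rules above.
#   $1 = __meta_dockerswarm_node_address          (0.0.0.0 on the leader, per this article)
#   $2 = __meta_dockerswarm_node_manager_address  (empty on workers)
scrape_address() {
  node_addr="$1"
  manager_addr="$2"
  # Rule 1: the default regex (.*) matches anything, so __address__ = <node_address>:9323
  addr="${node_addr}:9323"
  # Rule 2: Prometheus anchors regexes, so only a full "host:port" match replaces __address__.
  # [0-9] stands in for \d, which POSIX grep/sed do not support.
  if printf '%s\n' "$manager_addr" | grep -Eq '^[^:]+:[0-9]+$'; then
    ip=$(printf '%s\n' "$manager_addr" | sed -E 's/^([^:]+):[0-9]+$/\1/')
    addr="${ip}:9323"
  fi
  printf '%s\n' "$addr"
}

scrape_address 0.0.0.0 10.0.0.11:2377    # leader: manager IP wins
scrape_address 10.0.0.21 ""              # worker: rule 2 does not match, rule 1 stands
```

On a worker the manager address label is empty, the regex fails to match and the first rule's result is kept; on managers the second rule always yields the manager IP, which is correct for the leader as well as the other managers.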

The complete Prometheus configuration:

global:
  scrape_interval:     15s
  evaluation_interval: 15s

  external_labels:
    monitor: 'swarm-monitoring'

rule_files:
  - "swarm_node.rules.yml"
  - "swarm_task.rules.yml"

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Docker daemon monitoring
  - job_name: 'docker'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock # You can also use http/https to connect to the Docker daemon.
        role: nodes
    relabel_configs:
      # Fetch metrics on port 9323.
      - source_labels: [__meta_dockerswarm_node_address]
        target_label: __address__
        replacement: $1:9323
      - source_labels: [__meta_dockerswarm_node_manager_address]
        regex: ([^:]+):\d+
        target_label: __address__
        replacement: $1:9323
      # Set hostname as instance label
      - source_labels: [__meta_dockerswarm_node_hostname]
        target_label: instance

  # Docker Swarm container monitoring
  # Use this config when cadvisor is deployed directly on the hosts
  #- job_name: 'cadvisor'
  #  dockerswarm_sd_configs:
  #    - host: unix:///var/run/docker.sock
  #      role: nodes
  #  relabel_configs:
  #    # Fetch metrics on port 18080.
  #    - source_labels: [__meta_dockerswarm_node_address]
  #      target_label: __address__
  #      replacement: $1:18080
  #    - source_labels: [__meta_dockerswarm_node_manager_address]
  #      regex: ([^:]+):\d+
  #      target_label: __address__
  #      replacement: $1:18080
  #    # Set hostname as instance label
  #    - source_labels: [__meta_dockerswarm_node_hostname]
  #      target_label: instance
  # Use this config when cadvisor is deployed as containers
  - job_name: 'cadvisor'
    dns_sd_configs:
    - names:
      - 'tasks.cadvisor'
      type: 'A'
      port: 8080

  # Docker swarm host monitoring (requires node-exporter deployed in global mode)
  - job_name: 'node-exporter'
    dns_sd_configs:
    - names:
      - 'tasks.node-exporter'
      type: 'A'
      port: 9100

2.2 Alerting rules

This example defines alerting rules along two dimensions, swarm nodes (hosts) and tasks (services):

Swarm node alerting rules

swarm_node.rules.yml

groups:
- name: swarm_node.rules
  rules:
  - alert: node_cpu_usage
    expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[1m]) * ON(instance) GROUP_LEFT(node_name)
      node_meta * 100) BY (node_name)) > 50
    for: 1m
    labels:
      severity: warning
    annotations:
      description: Swarm node {{ $labels.node_name }} CPU usage {{ humanize $value}}%
      summary: CPU alert '{{ $labels.node_name }}'
  - alert: node_memory_usage
    expr: sum(((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes)
      * ON(instance) GROUP_LEFT(node_name) node_meta * 100) BY (node_name) > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      description: Swarm node {{ $labels.node_name }} memory usage {{ humanize $value}}%
      summary: Memory alert '{{ $labels.node_name }}'
  - alert: node_disk_usage
    expr: ((node_filesystem_size_bytes{mountpoint="/rootfs"} - node_filesystem_free_bytes{mountpoint="/rootfs"})
      * 100 / node_filesystem_size_bytes{mountpoint="/rootfs"}) * ON(instance) GROUP_LEFT(node_name)
      node_meta > 85
    for: 1m
    labels:
      severity: warning
    annotations:
      description: Swarm node {{ $labels.node_name }} disk usage {{ humanize $value}}%
      summary: Disk alert '{{ $labels.node_name }}'
  - alert: node_disk_fill_rate_6h
    expr: predict_linear(node_filesystem_free_bytes{mountpoint="/rootfs"}[1h], 6 * 3600) * ON(instance)
      GROUP_LEFT(node_name) node_meta < 0
    for: 1h
    labels:
      severity: critical
    annotations:
      description: Swarm node {{ $labels.node_name }} disk expected to be full within 6 hours
      summary: Low disk space alert '{{ $labels.node_name }}'
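Every node rule above joins its metric with node_meta through * ON(instance) GROUP_LEFT(node_name), which copies the node_name label onto each result, but stock node-exporter does not export node_meta. The compose file in section 3.1 passes NODE_ID={{.Node.ID}}, mounts /etc/hostname as /etc/nodename and points the textfile collector at /etc/node-exporter/, which suggests (an assumption, not something the article states) that the cjie001/node-exporter image writes node_meta at startup roughly like this. write_node_meta and the file name node-meta.prom are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: write a node_meta series for node-exporter's textfile collector.
#   $1 = textfile directory (the compose file uses /etc/node-exporter/)
#   $2 = swarm node ID      (NODE_ID={{.Node.ID}} in the compose file)
#   $3 = node name          (e.g. the contents of /etc/nodename)
write_node_meta() {
  dir="$1"; node_id="$2"; node_name="$3"
  mkdir -p "$dir"
  # Write to a temp name (not *.prom, so the collector ignores it), then rename atomically
  printf 'node_meta{node_id="%s",container_label_com_docker_swarm_node_id="%s",node_name="%s"} 1\n' \
    "$node_id" "$node_id" "$node_name" > "$dir/node-meta.prom.tmp"
  mv "$dir/node-meta.prom.tmp" "$dir/node-meta.prom"
}

# Demo run into a throwaway directory
demo_dir=$(mktemp -d)
write_node_meta "$demo_dir" abc123 worker-01
cat "$demo_dir/node-meta.prom"
```

In an entrypoint this would run as write_node_meta /etc/node-exporter "$NODE_ID" "$(cat /etc/nodename)" before starting node_exporter. Because node_meta is scraped from the same node-exporter target as node_cpu_seconds_total, both series carry the same instance label, which is what the ON(instance) join relies on.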

Swarm task alerting rules

swarm_task.rules.yml

groups:
- name: swarm_task.rules
  rules:
  - alert: task_high_cpu_usage_50 # container CPU usage above 50%
    expr: sum(rate(container_cpu_usage_seconds_total{container_label_com_docker_swarm_task_name=~".+"}[1m]))
      BY (container_label_com_docker_swarm_task_name, container_label_com_docker_swarm_node_id)
      * 100 > 50
    for: 1m
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.container_label_com_docker_swarm_task_name }} on ''{{
        $labels.container_label_com_docker_swarm_node_id }}'' CPU usage {{ humanize
        $value}}%.'
      summary: CPU alert '{{ $labels.container_label_com_docker_swarm_task_name
        }}' on '{{ $labels.container_label_com_docker_swarm_node_id }}'
  - alert: task_high_memory_usage_1g # container memory (RSS) above 1e9 bytes (~1 GB)
    expr: sum(container_memory_rss{container_label_com_docker_swarm_task_name=~".+"})
      BY (container_label_com_docker_swarm_task_name, container_label_com_docker_swarm_node_id) > 1e+09
    for: 1m
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.container_label_com_docker_swarm_task_name }} on ''{{
        $labels.container_label_com_docker_swarm_node_id }}'' memory usage {{ humanize
        $value}}.'
      summary: Memory alert '{{ $labels.container_label_com_docker_swarm_task_name
        }}' on '{{ $labels.container_label_com_docker_swarm_node_id }}'

  - alert: service_group_warnning # number of containers in a given stack (known minimum, 8 in this example)
    expr: count(rate(container_memory_usage_bytes{container_label_com_docker_stack_namespace=~"my_stack"}[1m])) < 8
    for: 1m
    labels:
      severity: critical
    annotations:
      description: 'my_stack has too few containers: {{ $value }} / 8.'
      summary: 'Service count below expectation'

  - alert: services_down # watch a specific service by name
    expr: absent(count(container_memory_usage_bytes{container_label_com_docker_swarm_service_name=~"my_service"}))
    for: 30s
    labels:
      severity: critical
      team: dev
    annotations:
      description: 'Service my_service has been down for more than 30 seconds'
      summary: 'No running instances of the service'
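container_cpu_usage_seconds_total is a counter of CPU-seconds, so rate(...) * 100 in task_high_cpu_usage_50 is a percentage of one core, and a task busy on two cores reads 200. The arithmetic behind it, as a simplified shell sketch (cpu_percent is a hypothetical helper; the real rate() also extrapolates to the window edges and handles counter resets):

```shell
#!/bin/sh
# Hypothetical helper: percent of one CPU core used between two counter samples.
#   $1 $2 = timestamp (s) and counter value (CPU-seconds) of the first sample
#   $3 $4 = timestamp (s) and counter value of the second sample
cpu_percent() {
  awk -v t0="$1" -v v0="$2" -v t1="$3" -v v1="$4" \
    'BEGIN { printf "%.1f\n", (v1 - v0) / (t1 - t0) * 100 }'
}

cpu_percent 0 100 60 130   # 30 CPU-seconds in 60 s: half a core
cpu_percent 0 0 60 120     # two full cores
```

The first call lands exactly on 50, which does not yet fire because the rule uses > 50. In the same spirit, 1e+09 in task_high_memory_usage_1g is 10^9 bytes, about 0.93 GiB rather than 1 GiB (1073741824 bytes).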

3. Deploying Prometheus and related tools

This example deploys prometheus, alertmanager, node-exporter, cadvisor, grafana, karma, webhook-dingtalk, traefik and other tools as a docker stack.

3.1 docker-compose.yml

version: "3.8"

services:
  cadvisor:
    image: docker:19.03.13
    volumes:
        - /var/run/docker.sock:/var/run/docker.sock
    entrypoint: ["/bin/sh","-c"]
    networks:
      - cnet
    deploy:
      mode: global
    environment:
      - TZ=Asia/Shanghai
      - PARENT={{.Task.Name}}
      - CHILDNAME={{.Service.Name}}_sidecar.{{.Node.ID}}.{{.Task.ID}}
      - CADVISOR_VERSION=v0.37.5
    command:
    - |
      exec docker run -i --rm --network="container:$${PARENT}" \
            --env=TZ=Asia/Shanghai \
            --volume=/:/rootfs:ro \
            --volume=/var/run:/var/run:ro  \
            --volume=/var/run/docker.sock:/var/run/docker.sock:ro \
            --volume=/sys:/sys:ro  \
            --volume=/var/lib/docker/:/var/lib/docker:ro \
            --volume=/dev/disk/:/dev/disk:ro \
            --name $${CHILDNAME} \
            --privileged \
            --device=/dev/kmsg \
            cjie001/cadvisor:0.37.5 -docker_only

  grafana:
    image: cjie001/grafana:7.5.5
    networks:
      - cnet
    environment:
      - TZ=Asia/Shanghai
      - GF_SECURITY_ADMIN_USER=${ADMIN_USER:-admin}
      - GF_SECURITY_ADMIN_PASSWORD=${ADMIN_PASSWORD:-admin}
      - GF_USERS_ALLOW_SIGN_UP=false
      #- GF_SERVER_ROOT_URL=${GF_SERVER_ROOT_URL:-localhost}
      #- GF_SMTP_ENABLED=${GF_SMTP_ENABLED:-false}
      #- GF_SMTP_FROM_ADDRESS=${GF_SMTP_FROM_ADDRESS:-grafana@test.com}
      #- GF_SMTP_FROM_NAME=${GF_SMTP_FROM_NAME:-Grafana}
      #- GF_SMTP_HOST=${GF_SMTP_HOST:-smtp:25}
      #- GF_SMTP_USER=${GF_SMTP_USER}
      #- GF_SMTP_PASSWORD=${GF_SMTP_PASSWORD}
    volumes:
      - grafana:/var/lib/grafana
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M
      labels:
        - traefik.enable=true
        - traefik.docker.network=cnet
        - traefik.http.routers.grafana-http.rule=Host(`grafana.${DOMAIN?Variable DOMAIN not set}`)
        - traefik.http.routers.grafana-http.entrypoints=http
        - traefik.http.services.grafana.loadbalancer.server.port=3000

  dingtalk:
    image: cjie001/prometheus-webhook-dingtalk:latest
    #ports:
    #  - "8060:8060"
    environment:
      TZ: 'Asia/Shanghai'
    command:
      - '--config.file=/etc/prometheus-webhook-dingtalk/config.yml'
      - '--web.enable-ui'
      - '--web.enable-lifecycle'
    networks:
      - cnet
    volumes:
      - dingtalk_etc:/etc/prometheus-webhook-dingtalk
    deploy:
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M
      labels:
        - traefik.enable=true
        - traefik.docker.network=cnet
        - traefik.http.routers.webhook-dingtalk-http.rule=Host(`dingtalk.${DOMAIN?Variable DOMAIN not set}`)
        - traefik.http.routers.webhook-dingtalk-http.entrypoints=http
        - traefik.http.middlewares.webhook-dingtalk-auth.basicauth.users=${ADMIN_USER?Variable ADMIN_USER not set}:${HASHED_PASSWORD?Variable HASHED_PASSWORD not set}
        - traefik.http.routers.webhook-dingtalk-http.middlewares=webhook-dingtalk-auth
        - traefik.http.services.webhook-dingtalk.loadbalancer.server.port=8060

  alertmanager:
    image: cjie001/alertmanager:v0.21.0
    networks:
      - cnet
    environment:
      TZ: 'Asia/Shanghai'
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    deploy:
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M
      labels:
        - traefik.enable=true
        - traefik.docker.network=cnet
        - traefik.http.routers.alertmanager-http.rule=Host(`alertmanager.${DOMAIN?Variable DOMAIN not set}`)
        - traefik.http.routers.alertmanager-http.entrypoints=http
        - traefik.http.routers.alertmanager-http.middlewares=alertmanager-auth
        - traefik.http.middlewares.alertmanager-auth.basicauth.users=${ADMIN_USER?Variable ADMIN_USER not set}:${HASHED_PASSWORD?Variable HASHED_PASSWORD not set}
        - traefik.http.services.alertmanager.loadbalancer.server.port=9093

  karma:
    image: lmierzwa/karma:v0.85
    networks:
      - cnet
    #ports:
    #- "8061:8080"
    environment:
      - TZ=Asia/Shanghai
      - ALERTMANAGER_URI=http://alertmanager:9093
    deploy:
      mode: replicated
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M
      replicas: 1
      labels:
        - traefik.enable=true
        - traefik.docker.network=cnet
        - traefik.http.routers.karma-http.rule=Host(`karma.${DOMAIN?Variable DOMAIN not set}`)
        - traefik.http.routers.karma-http.entrypoints=http
        - traefik.http.middlewares.karma-auth.basicauth.users=${ADMIN_USER?Variable ADMIN_USER not set}:${HASHED_PASSWORD?Variable HASHED_PASSWORD not set}
        - traefik.http.routers.karma-http.middlewares=karma-auth
        - traefik.http.services.karma.loadbalancer.server.port=8080

  node-exporter:
    image: cjie001/node-exporter:v1.1.2
    networks:
      - cnet
    environment:
      - TZ=Asia/Shanghai
      - NODE_ID={{.Node.ID}}
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
      - /etc/hostname:/etc/nodename
    command:
      - '--path.sysfs=/host/sys'
      - '--path.procfs=/host/proc'
      - '--collector.textfile.directory=/etc/node-exporter/'
      - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
      - '--no-collector.ipvs'
    deploy:
      mode: global
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M

  prometheus:
    image: cjie001/prometheus:v2.26.0
    #ports:
    #- "9090:9090"
    networks:
      - cnet
    environment:
      - TZ=Asia/Shanghai
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=${PROMETHEUS_RETENTION:-48h}'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    volumes:
      - prometheus_data:/prometheus
      - prometheus_etc:/etc/prometheus
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          memory: 2048M
        reservations:
          memory: 128M
      labels:
        - traefik.enable=true
        - traefik.docker.network=cnet
        - traefik.http.routers.prometheus-http.rule=Host(`prometheus.${DOMAIN?Variable DOMAIN not set}`)
        - traefik.http.routers.prometheus-http.entrypoints=http
        - traefik.http.middlewares.prometheus-auth.basicauth.users=${ADMIN_USER?Variable ADMIN_USER not set}:${HASHED_PASSWORD?Variable HASHED_PASSWORD not set}
        - traefik.http.routers.prometheus-http.middlewares=prometheus-auth
        - traefik.http.services.prometheus.loadbalancer.server.port=9090

  traefik:
    image: traefik:latest
    ports:
      - 80:80
      - 8080:8080
      #- 443:443
    networks:
      - cnet
    command:
      - "--global.checknewversion=false"
      - "--entrypoints.http.address=:80"
      - "--entrypoints.testapi.address=:3308"
      - "--api=true"
      - "--api.insecure=true"
      - "--api.dashboard=true"
      - "--api.debug=false"
      - "--ping=true"
      - "--log.level=info"
      - "--log.format=common"
      - "--accesslog=true"
      - "--providers.docker=true"
      - "--providers.docker.watch=true"
      - "--providers.docker.exposedbydefault=false"
      - "--providers.docker.endpoint=unix:///var/run/docker.sock"
      - "--providers.docker.swarmMode=true"
      - "--providers.docker.network=cnet"
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
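    deploy:
      placement:
        constraints:
          # swarm mode provider needs a manager's Docker API
          - node.role == manager
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=cnet"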
    healthcheck:
      test: ["CMD-SHELL", "wget -q --spider --proxy off localhost:8080/ping || exit 1"]
      interval: 10s
      retries: 3

volumes:
    prometheus_data:
    prometheus_etc:
    dingtalk_etc:
    grafana:

networks:
  cnet:

3.2 Starting with docker stack

Because docker-compose.yml uses several custom environment variables, start the services with a shell script like this:

export ADMIN_USER=admin
export ADMIN_PASSWORD=admin
export HASHED_PASSWORD=$(openssl passwd -apr1 $ADMIN_PASSWORD)
export DOMAIN=example.com
docker stack deploy -c docker-compose.yml mon

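This starts all the related services as a stack in the swarm mode cluster. After mapping the domain names to a swarm node IP (swarm_ip) in /etc/hosts (C:\Windows\System32\drivers\etc\hosts on Windows), open the following URLs to reach each tool's web UI:

http://grafana.example.com, http://alertmanager.example.com, http://karma.example.com, http://prometheus.example.com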
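Because traefik publishes port 80 through the swarm routing mesh, any node's IP should work as swarm_ip. A small sketch that prints the hosts entries for every host name routed by traefik in the compose file (hosts_entries is a hypothetical helper and 192.168.1.10 a placeholder IP):

```shell
#!/bin/sh
# Hypothetical helper: print hosts-file lines for the traefik-routed UIs.
#   $1 = IP of any swarm node, $2 = the DOMAIN used at deploy time
hosts_entries() {
  swarm_ip="$1"
  domain="$2"
  for name in grafana alertmanager karma prometheus dingtalk; do
    printf '%s %s.%s\n' "$swarm_ip" "$name" "$domain"
  done
}

hosts_entries 192.168.1.10 example.com
```

On Linux, hosts_entries 192.168.1.10 example.com | sudo tee -a /etc/hosts would append the lines.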

3.3 Configuring webhook-dingtalk

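The dingtalk configuration must be edited before it works. Its directory is persisted on the host in a volume, so you can enter the container and edit /etc/prometheus-webhook-dingtalk/config.yml to add the access_token and secret of the DingTalk robot you created: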

targets:
  ops:
    url: https://oapi.dingtalk.com/robot/send?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    # secret for signature
    secret: SEC0000000000000000000000000000000000000000000000000000000000000000
  dev:
    url: https://oapi.dingtalk.com/robot/send?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    secret: SEC0000000000000000000000000000000000000000000000000000000000000000

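Then reload the configuration by sending SIGHUP inside the container with kill -HUP <pid> (the pid is usually 1). When the process is PID 1, docker kill --signal=HUP <container> from the host does the same.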

Closing remarks

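If all goes well, the whole swarm cluster monitoring stack can be set up in a few minutes. I also verified it on a ThinkPad X1 Carbon laptop running a CentOS 7.9 VM under Hyper-V with a single-manager swarm (Docker 20.10.6), which says a lot about the strengths of swarm mode.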
