【prometheus系列】13、Prometheus监控docker容器

## Prometheus监控docker容器 cAdvisor 将容器统计信息公开为 Prometheus 指标。默认情况下, 这些指标在/metrics HTTP 端点下提供。可以通过设置-prometheus_endpoint 命令行标志来自定义此端点。 要使用 Prometheus 监控 cAdvisor, 只需在 Prometheus 中配置一个或多个作业, 这些作业会 在该指标端点处刮取相关的 cAdvisor 流程. ### 1 启动cAdvisor容器 ```bash docker run \ --volume=/:/rootfs:ro \ --volume=/var/run:/var/run:ro \--volume=/sys:/sys:ro \ --volume=/var/lib/docker/:/var/lib/docker:ro \ --volume=/dev/disk/:/dev/disk:ro \ --publish=8080:8080 \ --detach=true \ --name=cadvisor \ google/cadvisor:latest ``` ![image.png](http://www.icode9.com/i/li/?n=2&i=images/20210711/1626013955937011.png?,size_14,text_QDUxQ1RP5Y2a5a6i,color_FFFFFF,t_100,g_se,x_10,y_10,shadow_20,type_ZmFuZ3poZW5naGVpdGk=) 验证采集的指标 ```bash curl http://172.16.0.8:8080/metrics ``` 常用的一些容器监控项(根据cadvisor获取的指标promQL计算): 容器 CPU 使用率: sum(irate(container_cpu_usage_seconds_total{image!=""}[1m])) without (cpu) 查询容器内存使用量(单位: 字节) : container_memory_usage_bytes{image!=""} 查询容器网络接收量速率(单位: 字节/秒) :sum(rate(container_network_receive_bytes_total{image!=""}[1m])) without (interface) 查询容器网络传输量速率(单位: 字节/秒) : sum(rate(container_network_transmit_bytes_total{image!=""}[1m])) without (interface) 查询容器文件系统读取速率(单位: 字节/秒) : sum(rate(container_fs_reads_bytes_total{image!=""}[1m])) without (device) 查询容器文件系统写入速率(单位: 字节/秒) : sum(rate(container_fs_writes_bytes_total{image!=""}[1m])) without (device) ### 2 prometheus增加docker监控 ```yml - job_name: 'docker' static_configs: - targets: ['172.16.0.8:8080'] ``` 重启prometheus服务 ### 3 Grafana dashboard 导入docker监控模板,dashboard Id:`193` ### 4 添加告警规则 ```yml - alert: ContainerKilled expr: time() - container_last_seen > 60 for: 0m labels: severity: warning annotations: summary: Container killed (instance {{ $labels.instance }}) description: "A container has disappeared\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" # cAdvisor can sometimes consume a lot of CPU, so this alert will fire constantly. # If you want to exclude it from this alert, exclude the serie having an empty name: container_cpu_usage_seconds_total{name!=""} - alert: ContainerCpuUsage expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80 for: 2m labels: severity: warning annotations: summary: Container CPU usage (instance {{ $labels.instance }}) description: "Container CPU usage is above 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" # See https://medium.com/faun/how-much-is-too-much-the-linux-oomkiller-and-used-memory-d32186f29c9d - alert: ContainerMemoryUsage expr: (sum(container_memory_working_set_bytes) BY (instance, name) / sum(container_spec_memory_limit_bytes > 0) BY (instance, name) * 100) > 80 for: 2m labels: severity: warning annotations: summary: Container Memory usage (instance {{ $labels.instance }}) description: "Container Memory usage is above 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: ContainerVolumeUsage expr: (1 - (sum(container_fs_inodes_free) BY (instance) / sum(container_fs_inodes_total) BY (instance))) * 100 > 80 for: 2m labels: severity: warning annotations: summary: Container Volume usage (instance {{ $labels.instance }}) description: "Container Volume usage is above 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: ContainerVolumeIoUsage expr: (sum(container_fs_io_current) BY (instance, name) * 100) > 80 for: 2m labels: severity: warning annotations: summary: Container Volume IO usage (instance {{ $labels.instance }}) description: "Container Volume IO usage is above 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: ContainerHighThrottleRate expr: rate(container_cpu_cfs_throttled_seconds_total[3m]) > 1 for: 2m labels: severity: warning annotations: summary: Container high throttle rate (instance {{ $labels.instance }}) description: "Container is being throttled\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" ```
上一篇:C#dataGridView添加自增列


下一篇:Android技术分享| 【Android 自定义View】多人视频通话控件