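Preface
Both our business services and the ELK stack use Kafka as a message queue, so to protect the stability and availability of the business we monitor the Kafka cluster with Prometheus. The monitoring setup is kafka_exporter + prometheus.
Notes
- To monitor a Kafka cluster, kafka_exporter only needs to be deployed on one node of the cluster
- prometheus is deployed on k8s
Project page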
https://github.com/danielqsj/kafka_exporter
Download
https://github.com/danielqsj/kafka_exporter/releases/download/v1.4.2/kafka_exporter-1.4.2.linux-amd64.tar.gz
1. Deploy kafka_exporter
[root@kafka ~]# tar xf kafka_exporter-1.4.2.linux-amd64.tar.gz -C /usr/local
[root@kafka ~]# mv /usr/local/kafka_exporter-1.4.2.linux-amd64/ /usr/local/kafka_exporter
[root@kafka ~]# useradd -s /sbin/nologin kafka
[root@kafka ~]# chown -R kafka:kafka /usr/local/kafka_exporter
[root@kafka ~]# vim /usr/lib/systemd/system/kafka_exporter.service
[Unit]
Description=kafka_exporter
After=network.target
[Service]
User=kafka
Group=kafka
Type=simple
ExecStart=/usr/local/kafka_exporter/kafka_exporter --kafka.server=172.18.244.164:9092 --web.listen-address=:9308 --zookeeper.server=172.18.244.164:2181
[Install]
WantedBy=multi-user.target
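--kafka.server=172.18.244.164:9092 # address of the Kafka broker to monitor
--web.listen-address=:9308 # address kafka_exporter listens on
--zookeeper.server=172.18.244.164:2181 # address of the ZooKeeper to connect to (as far as I know, only used when --use.consumelag.zookeeper is enabled)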
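Listing a single broker works because the client discovers the rest of the cluster from it, but the exporter then depends on that one broker being reachable at startup. The kafka_exporter README shows that --kafka.server may be repeated, so a small sketch for building the flag list could look like this. The helper name kafka_server_flags is hypothetical, and the extra broker addresses in the usage comment are placeholders rather than hosts from this article.

```shell
# Build repeated --kafka.server flags from a space-separated broker list.
# kafka_server_flags is a hypothetical helper name, not part of kafka_exporter.
kafka_server_flags() {
    flags=""
    for broker in $1; do
        # Append a separating space only when flags is already non-empty.
        flags="${flags}${flags:+ }--kafka.server=${broker}"
    done
    printf '%s\n' "$flags"
}

# Usage (broker2 and broker3 are placeholder host names):
#   kafka_server_flags "172.18.244.164:9092 broker2:9092 broker3:9092"
# The printed flags can then be pasted into the ExecStart line of the unit file.
```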
[root@kafka ~]# systemctl daemon-reload
[root@kafka ~]# systemctl start kafka_exporter
[root@kafka ~]# netstat -lntup | grep 9308
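A listening port only shows that the process is running; it does not show that the exporter can actually reach Kafka. One way to check is to pull /metrics and look at a Kafka metric such as kafka_brokers. The following is a minimal sketch, assuming curl and awk are available on the host; metric_value is a hypothetical helper name, and it assumes the exporter emits samples without timestamps, so the value is the last field on the line.

```shell
# Print the value of the first sample of a given metric, reading
# Prometheus text-format output on stdin.
# metric_value is a hypothetical helper name.
metric_value() {
    # $1: metric name, e.g. kafka_brokers
    awk -v name="$1" '
        /^#/ { next }                      # skip HELP/TYPE comment lines
        {
            metric = $1
            sub(/[{].*/, "", metric)       # strip the label set, if any
            if (metric == name) { print $NF; exit }
        }'
}

# Usage against the exporter deployed above:
#   curl -s http://172.18.244.164:9308/metrics | metric_value kafka_brokers
```

If this prints nothing, the exporter is up but has not collected broker data, which usually points at the --kafka.server address.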
2. Prometheus configuration
[root@k8s-master ~]# vim prometh_configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-configmap
  namespace: monitoring
data:
  prometheus.yml: |
    # my global config
    global:
      scrape_interval: 5s # Scrape targets every 5 seconds. The default is every 1 minute.
      evaluation_interval: 5s # Evaluate rules every 5 seconds. The default is every 1 minute.
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
    scrape_configs:
    # Monitor the business Kafka cluster
    - job_name: 'kafka'
      static_configs:
      - targets:
        - 172.18.244.164:9308
[root@k8s-master ~]# kubectl apply -f prometh_configmap.yaml
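3. Grafana dashboards
The official dashboard (ID 7589) can be imported and used as-is.
However, I modified it to show the data our business needs. Below is my own dashboard configuration.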
Variable        Query (PromQL)
job             label_values(kafka_consumergroup_current_offset, job)
instance        label_values(kafka_consumergroup_current_offset{job=~"$job"}, instance)
consumergroup   label_values(kafka_consumergroup_current_offset{instance="$instance"},consumergroup)
topic           label_values(kafka_consumergroup_current_offset{instance="$instance",consumergroup=~"$consumergroup"}, topic)
time            1m,2m,3m,5m,10m,30m,1h,6h,12h,1d,7d,14d,30d (custom value list, not a query)
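Panels
Exporter status (up is 1 when the last scrape succeeded and 0 when it failed; it is not an uptime duration):
up{instance="$instance"}
Number of brokers:
kafka_brokers{instance="$instance"}
Partitions per topic:
sum by(topic) (kafka_topic_partitions{instance="$instance",topic=~"$topic"})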
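Messages per second, per topic (this metric is the topic's latest offset, so the rate reflects messages written; for the consumer-side CURRENT-OFFSET rate, use kafka_consumergroup_current_offset instead):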
sum(rate(kafka_topic_partition_current_offset{instance="$instance", topic=~"$topic"}[$time])) by (topic)
Current consumer backlog (LAG):
sum(kafka_consumergroup_lag{instance="$instance",topic=~"$topic"}) by (consumergroup, topic)
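To sanity-check what the LAG panel shows, the same aggregation can be reproduced outside Grafana from the exporter's raw output. The sketch below sums kafka_consumergroup_lag samples per consumer group and topic, mirroring the sum ... by (consumergroup, topic) query above. It is an illustration under assumptions, not part of kafka_exporter: lag_by_group_topic and label_of are hypothetical names, and the parser assumes label values contain no escaped quotes.

```shell
# Sum kafka_consumergroup_lag samples per (consumergroup, topic), reading
# Prometheus text-format output on stdin. Other metric names are ignored.
# lag_by_group_topic and label_of are hypothetical names.
lag_by_group_topic() {
    awk '
        # Return the value of label `key` from an exposition-format line.
        function label_of(line, key,    pat) {
            pat = key "=\"[^\"]*\""
            if (!match(line, pat)) return ""
            # Skip the key and the two characters after it, drop the closing quote.
            return substr(line, RSTART + length(key) + 2, RLENGTH - length(key) - 3)
        }
        /^kafka_consumergroup_lag[{]/ {
            k = label_of($0, "consumergroup") " " label_of($0, "topic")
            sum[k] += $NF                  # the sample value is the last field
        }
        END { for (k in sum) printf "%s %.0f\n", k, sum[k] }
    ' | sort
}

# Usage against the exporter deployed above:
#   curl -s http://172.18.244.164:9308/metrics | lag_by_group_topic
# Each output line has the form: <consumergroup> <topic> <total lag>
```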
If you are interested in the YAML files and the Grafana dashboard JSON from this article, see my GitHub project: https://github.com/shaxiaozz/prometheus