Reference: https://blog.51cto.com/flyfish225/2554294
Reference: https://blog.csdn.net/qq_31555951/article/details/110666480
Prometheus deployment
wget https://github.com/prometheus/prometheus/releases/download/v2.23.0/prometheus-2.23.0.linux-amd64.tar.gz
tar xf prometheus-2.23.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local
mv prometheus-2.23.0.linux-amd64/ prometheus
cd prometheus/
vim prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['192.168.31.10:9090']
  - job_name: 'node'
    static_configs:
    - targets: ['192.168.31.10:9100']
      labels:
        app: master01
        nodename: k8s-master01
        role: master
    - targets: ['192.168.31.5:9100']
      labels:
        app: master01
        nodename: test1
        role: master
Start:
nohup ./prometheus &
node_exporter deployment
wget https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz
tar xf node_exporter-1.0.1.linux-amd64.tar.gz -C /usr/local/
cd /usr/local
mv node_exporter-1.0.1.linux-amd64 node_exporter
cd node_exporter/
Start:
nohup ./node_exporter &
Check whether a node is up (1 = up, 0 = down)
up{nodename="test1"}
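The same check can also be run from a script through the Prometheus HTTP API (instant query endpoint /api/v1/query). The following is only a minimal sketch: it assumes Prometheus is reachable at 192.168.31.10:9090 (the address of the 'prometheus' job in the config above), and the helper names build_query_url, parse_up and node_is_up are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus listens on the address used by the 'prometheus' scrape job.
PROMETHEUS_URL = "http://192.168.31.10:9090"


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Build the instant-query URL for a PromQL expression."""
    return base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_up(payload):
    """Map each instance label to True (up) or False (down)."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError("query failed: %r" % body.get("error"))
    # Each sample looks like {"metric": {...}, "value": [timestamp, "1"]}.
    return {
        sample["metric"].get("instance", ""): sample["value"][1] == "1"
        for sample in body["data"]["result"]
    }


def node_is_up(nodename, base_url=PROMETHEUS_URL):
    """Run up{nodename="..."} against a live server (needs network access)."""
    url = build_query_url('up{nodename="%s"}' % nodename, base_url)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_up(resp.read().decode("utf-8"))
```

Calling node_is_up("test1") should return a dict keyed by instance; an empty dict means no series matched the nodename label.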
Disk usage formulas
Usage (%) of the / mount point
round((1 - (node_filesystem_avail_bytes{fstype=~"ext3|ext4|xfs|nfs", nodename="test1",mountpoint="/"} / node_filesystem_size_bytes{fstype=~"ext3|ext4|xfs|nfs", nodename="test1",mountpoint="/"})) * 100)
Usage (%) of all ext4, xfs and nfs filesystems
round((1 - (node_filesystem_avail_bytes{fstype=~"ext4|xfs|nfs", nodename="test1"} / node_filesystem_size_bytes{fstype=~"ext4|xfs|nfs", nodename="test1"})) * 100)
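As a worked example of the arithmetic in the two queries above, the sketch below computes the same percentage from an available/size pair. The function name and the sample numbers are hypothetical. PromQL's round() resolves ties by rounding up, so the sketch uses floor(x + 0.5) rather than Python's built-in round(), which rounds ties to even.

```python
import math


def disk_usage_percent(avail_bytes, size_bytes):
    """Mirror round((1 - avail / size) * 100) from the queries above."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    used_percent = (1 - avail_bytes / size_bytes) * 100
    # floor(x + 0.5) rounds ties up, like PromQL's round().
    return math.floor(used_percent + 0.5)


# Hypothetical sample: a 40 GiB filesystem with 10 GiB still available.
print(disk_usage_percent(10 * 2**30, 40 * 2**30))  # 75
```

Note that node_filesystem_avail_bytes is the space available to unprivileged users, so the result can be slightly higher than a figure based on free space that includes root-reserved blocks.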
CPU load average
node_load1{nodename="test1"} # 1-minute load average
node_load5{nodename="test1"} # 5-minute load average
node_load15{nodename="test1"} # 15-minute load average
Memory usage (%)
ceil((1 - (node_memory_MemAvailable_bytes{nodename="test1"} / (node_memory_MemTotal_bytes{nodename="test1"})))* 100 )
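node_exporter takes these two metrics from the MemAvailable and MemTotal fields of /proc/meminfo (converted from kB to bytes). A rough sketch of the same calculation on a node, with hypothetical helper names and a made-up sample:

```python
import math


def parse_meminfo(text):
    """Return {field: bytes} for the kB-valued lines of /proc/meminfo."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB":
            values[name.strip()] = int(parts[0]) * 1024
    return values


def memory_usage_percent(meminfo):
    """Mirror ceil((1 - MemAvailable / MemTotal) * 100) from the query above."""
    return math.ceil((1 - meminfo["MemAvailable"] / meminfo["MemTotal"]) * 100)


# Made-up sample in /proc/meminfo format.
SAMPLE = """MemTotal:        8000000 kB
MemFree:         1000000 kB
MemAvailable:    2000000 kB
"""
print(memory_usage_percent(parse_meminfo(SAMPLE)))  # 75
```

On a real host, replace SAMPLE with the contents of /proc/meminfo, for example open("/proc/meminfo").read().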
CPU usage (%)
ceil(100 - sum(increase(node_cpu_seconds_total{nodename="test1",mode="idle"}[5m])) by(instance) / sum(increase(node_cpu_seconds_total{nodename="test1"}[5m])) by(instance)*100)
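The query above divides the growth of the idle counter by the growth of all mode counters over 5 minutes and subtracts that share from 100. The sketch below reproduces the arithmetic from two hypothetical counter snapshots (summed over all cores). It is a simplification: PromQL's increase() also extrapolates to the window edges and handles counter resets, which this example ignores.

```python
import math


def cpu_usage_percent(before, after):
    """before/after: dict of mode -> cumulative CPU seconds, summed over cores."""
    deltas = {mode: after[mode] - before[mode] for mode in after}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("counters did not advance")
    # 100 minus the idle share of all CPU time spent in the window.
    return math.ceil(100 - deltas["idle"] / total * 100)


# Hypothetical snapshots taken 300 s apart on a 4-core node (1200 core-seconds).
before = {"idle": 10000.0, "user": 2000.0, "system": 500.0}
after = {"idle": 10900.0, "user": 2240.0, "system": 560.0}
print(cpu_usage_percent(before, after))  # 25
```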
Number of allocated (open) file descriptors
node_filefd_allocated{nodename="test1"}
Number of TCP connections in the TIME_WAIT state
node_sockstat_TCP_tw
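This metric corresponds to the tw field on the TCP line of /proc/net/sockstat. To cross-check the value directly on a node, something like the following sketch could be used; the function name and the sample text are made up for illustration.

```python
def tcp_time_wait(sockstat_text):
    """Return the 'tw' counter from the 'TCP:' line of /proc/net/sockstat."""
    for line in sockstat_text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]
            # Fields alternate between a key and its value: inuse 12 orphan 0 ...
            pairs = dict(zip(fields[0::2], fields[1::2]))
            return int(pairs["tw"])
    raise ValueError("no TCP line found")


# Made-up sample in /proc/net/sockstat format.
SAMPLE = """sockets: used 150
TCP: inuse 12 orphan 0 tw 7 alloc 15 mem 3
UDP: inuse 2 mem 1
"""
print(tcp_time_wait(SAMPLE))  # 7
```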