Deploying Kubernetes Monitoring with kube-prometheus (Latest Release)


1. Introduction

To make Prometheus easier to run on Kubernetes, CoreOS released the Prometheus Operator. Built on top of it, the kube-prometheus project provides a one-stop monitoring solution: written mainly in jsonnet, it renders a set of YAML manifests from templates plus parameters, giving you an out-of-the-box monitoring stack for both the Kubernetes cluster itself and the applications running on it. The project bundles the following components:

  • The Prometheus Operator: creates the CRDs and manages the stack's custom resources
  • Highly available Prometheus: deploys a highly available Prometheus
  • Highly available Alertmanager: deploys a highly available alerting component
  • Prometheus node-exporter: collects host-level metrics on every node
  • Prometheus Adapter for Kubernetes Metrics APIs: serves custom metrics through the Kubernetes metrics APIs (for example, autoscaling an application on its nginx request rate; see the sketch after this list)
  • kube-state-metrics: exposes state metrics for Kubernetes resource objects
  • Grafana: provides the dashboards
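
As a hedged illustration of the Prometheus Adapter use case above: an HPA that scales a Deployment on a custom metric served through the custom metrics API. The Deployment name nginx and the metric name nginx_http_requests_per_second are hypothetical; the metric only becomes available if a matching rule is added to the adapter's ConfigMap.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx                                  # hypothetical target Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: nginx_http_requests_per_second   # hypothetical metric; requires an adapter ConfigMap rule
        target:
          type: AverageValue
          averageValue: "100"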

Architecture diagram of the Prometheus Operator:


Image source:
https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/master/Documentation/user-guides/images/architecture.png

Per the kube-prometheus compatibility matrix (https://github.com/prometheus-operator/kube-prometheus#kubernetes-compatibility-matrix), this guide deploys the release-0.8 branch.

kube-prometheus stack   Kubernetes 1.18   Kubernetes 1.19   Kubernetes 1.20   Kubernetes 1.21   Kubernetes 1.22
release-0.6             ✔                 ✔                 ✗                 ✗                 ✗
release-0.7             ✗                 ✔                 ✔                 ✗                 ✗
release-0.8             ✗                 ✗                 ✔                 ✔                 ✗
release-0.9             ✗                 ✗                 ✗                 ✔                 ✔
HEAD                    ✗                 ✗                 ✗                 ✗                 ✔

2. Prepare the Manifest Files

Fetch the latest release-0.8 branch from the official repository, or download the packaged release directly:

git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
git checkout release-0.8

# Or download the packaged tarball directly
wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.8.0.tar.gz
tar -xvf v0.8.0.tar.gz
mv kube-prometheus-0.8.0 kube-prometheus

The download contains a large number of manifests by default, so it is worth sorting them: move the related YAML files into per-component directories, as sketched below.

cd kube-prometheus/manifests
mkdir -p serviceMonitor prometheus adapter node-exporter blackbox kube-state-metrics grafana alertmanager operator other/{nfs-storage,ingress}
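
A hedged sketch of the moves that produce the tree below; the order matters (move the ServiceMonitor manifests first so the later wildcards do not pick them up), and the file list may differ slightly in other releases:

# Run from kube-prometheus/manifests
mv *serviceMonitor*.yaml serviceMonitor/
mv prometheus-adapter-*.yaml adapter/
mv blackbox-exporter-*.yaml blackbox/
mv grafana-*.yaml grafana/
mv kube-state-metrics-*.yaml kube-state-metrics/
mv node-exporter-*.yaml node-exporter/
mv alertmanager-*.yaml kube-prometheus-prometheusRule.yaml kubernetes-prometheusRule.yaml \
   prometheus-operator-prometheusRule.yaml prometheus-prometheusRule.yaml \
   prometheus-podDisruptionBudget.yaml alertmanager/
mv prometheus-*.yaml prometheus/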

The final structure looks like this:

manifests$ tree .
.
├── adapter
│   ├── prometheus-adapter-apiService.yaml
│   ├── prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
│   ├── prometheus-adapter-clusterRoleBindingDelegator.yaml
│   ├── prometheus-adapter-clusterRoleBinding.yaml
│   ├── prometheus-adapter-clusterRoleServerResources.yaml
│   ├── prometheus-adapter-clusterRole.yaml
│   ├── prometheus-adapter-configMap.yaml
│   ├── prometheus-adapter-deployment.yaml
│   ├── prometheus-adapter-roleBindingAuthReader.yaml
│   ├── prometheus-adapter-serviceAccount.yaml
│   └── prometheus-adapter-service.yaml
├── alertmanager
│   ├── alertmanager-alertmanager.yaml
│   ├── alertmanager-podDisruptionBudget.yaml
│   ├── alertmanager-prometheusRule.yaml
│   ├── alertmanager-secret.yaml
│   ├── alertmanager-serviceAccount.yaml
│   ├── alertmanager-service.yaml
│   ├── kube-prometheus-prometheusRule.yaml
│   ├── kubernetes-prometheusRule.yaml
│   ├── prometheus-operator-prometheusRule.yaml
│   ├── prometheus-podDisruptionBudget.yaml
│   └── prometheus-prometheusRule.yaml
├── blackbox
│   ├── blackbox-exporter-clusterRoleBinding.yaml
│   ├── blackbox-exporter-clusterRole.yaml
│   ├── blackbox-exporter-configuration.yaml
│   ├── blackbox-exporter-deployment.yaml
│   ├── blackbox-exporter-serviceAccount.yaml
│   └── blackbox-exporter-service.yaml
├── grafana
│   ├── grafana-dashboardDatasources.yaml
│   ├── grafana-dashboardDefinitions.yaml
│   ├── grafana-dashboardSources.yaml
│   ├── grafana-deployment.yaml
│   ├── grafana-serviceAccount.yaml
│   └── grafana-service.yaml
├── kube-state-metrics
│   ├── kube-state-metrics-clusterRoleBinding.yaml
│   ├── kube-state-metrics-clusterRole.yaml
│   ├── kube-state-metrics-deployment.yaml
│   ├── kube-state-metrics-prometheusRule.yaml
│   ├── kube-state-metrics-serviceAccount.yaml
│   └── kube-state-metrics-service.yaml
├── node-exporter
│   ├── node-exporter-clusterRoleBinding.yaml
│   ├── node-exporter-clusterRole.yaml
│   ├── node-exporter-daemonset.yaml
│   ├── node-exporter-prometheusRule.yaml
│   ├── node-exporter-serviceAccount.yaml
│   └── node-exporter-service.yaml
├── other
│   ├── ingress
│   │   ├── ingress-deploy.yaml
│   │   └── prom-ingress.yaml
│   └── nfs-storage
│       ├── grafana-pvc.yaml
│       ├── nfs-provisioner.yaml
│       ├── nfs-rbac.yaml
│       └── nfs-storageclass.yaml
├── prometheus
│   ├── prometheus-clusterRoleBinding.yaml
│   ├── prometheus-clusterRole.yaml
│   ├── prometheus-prometheus.yaml
│   ├── prometheus-roleBindingConfig.yaml
│   ├── prometheus-roleBindingSpecificNamespaces.yaml
│   ├── prometheus-roleConfig.yaml
│   ├── prometheus-roleSpecificNamespaces.yaml
│   ├── prometheus-serviceAccount.yaml
│   └── prometheus-service.yaml
├── serviceMonitor
│   ├── alertmanager-serviceMonitor.yaml
│   ├── blackbox-exporter-serviceMonitor.yaml
│   ├── grafana-serviceMonitor.yaml
│   ├── kubernetes-serviceMonitorApiserver.yaml
│   ├── kubernetes-serviceMonitorCoreDNS.yaml
│   ├── kubernetes-serviceMonitorKubeControllerManager.yaml
│   ├── kubernetes-serviceMonitorKubelet.yaml
│   ├── kubernetes-serviceMonitorKubeScheduler.yaml
│   ├── kube-state-metrics-serviceMonitor.yaml
│   ├── node-exporter-serviceMonitor.yaml
│   ├── prometheus-adapter-serviceMonitor.yaml
│   ├── prometheus-operator-serviceMonitor.yaml
│   └── prometheus-serviceMonitor.yaml
└── setup
    ├── 0namespace-namespace.yaml
    ├── prometheus-operator-0alertmanagerConfigCustomResourceDefinition.yaml
    ├── prometheus-operator-0alertmanagerCustomResourceDefinition.yaml
    ├── prometheus-operator-0podmonitorCustomResourceDefinition.yaml
    ├── prometheus-operator-0probeCustomResourceDefinition.yaml
    ├── prometheus-operator-0prometheusCustomResourceDefinition.yaml
    ├── prometheus-operator-0prometheusruleCustomResourceDefinition.yaml
    ├── prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
    ├── prometheus-operator-0thanosrulerCustomResourceDefinition.yaml
    ├── prometheus-operator-clusterRoleBinding.yaml
    ├── prometheus-operator-clusterRole.yaml
    ├── prometheus-operator-deployment.yaml
    ├── prometheus-operator-serviceAccount.yaml
    └── prometheus-operator-service.yaml

12 directories, 88 files

2.1 Modify the YAML to add persistent storage

manifests/prometheus/prometheus-prometheus.yaml

...
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: 2.26.0
  # Add persistent storage and retention; append at the end of the file
  retention: 3d
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: prometheus-nfs
        resources:
          requests:
            storage: 5Gi
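
Once the stack is running (section 3), the Operator renders this volumeClaimTemplate into one PVC per Prometheus replica; a quick, hedged way to confirm the volumes bound:

kubectl -n monitoring get pvc | grep prometheus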

manifests/grafana/grafana-deployment.yaml

...
      volumes:
      # - emptyDir: {}            # comment out these two lines
      #   name: grafana-storage
      - name: grafana-storage     # add this volume, backed by the grafana-pvc from section 2.2
        persistentVolumeClaim:
          claimName: grafana-pvc
      - name: grafana-datasources # unchanged from here on
        secret:
          secretName: grafana-datasources

2.2 Prepare the persistence configuration (nfs-storage)

An NFS server must be set up on the storage host beforehand.
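
A minimal, hedged sketch of exporting the directory on the NFS host (assuming the 192.168.50.134 host used throughout this guide; the export options shown are one common choice, and the service is named nfs-server on CentOS but nfs-kernel-server on Debian/Ubuntu):

# On 192.168.50.134
mkdir -p /data/nfs/prometheus
echo '/data/nfs/prometheus *(rw,sync,no_root_squash)' >> /etc/exports
exportfs -r                      # re-export everything in /etc/exports
systemctl enable --now nfs-server
showmount -e localhost           # verify the export is listed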

manifests/other/nfs-storage/nfs-rbac.yaml

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-provisioner
  namespace: monitoring
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-provisioner-runner   # ClusterRole is cluster-scoped and takes no namespace
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get", "create", "list", "watch", "update"]
  - apiGroups: ["extensions"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["nfs-provisioner"]
    verbs: ["use"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-provisioner
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: nfs-provisioner-runner
  apiGroup: rbac.authorization.k8s.io

manifests/other/nfs-storage/nfs-provisioner.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-client-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-provisioner   # serviceAccount is deprecated; use serviceAccountName
      containers:
        - name: nfs-client-provisioner
          #image: gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.0
          image: registry.cn-beijing.aliyuncs.com/mydlq/nfs-subdir-external-provisioner:v4.0.0
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: nfs-client-root
              mountPath:  /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: nfs-client
            - name: NFS_SERVER
              value: 192.168.50.134
            - name: NFS_PATH
              value: /data/nfs/prometheus
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.50.134
            path: /data/nfs/prometheus

manifests/other/nfs-storage/nfs-storageclass.yaml

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-nfs   # StorageClass is cluster-scoped and takes no namespace
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"  ## whether to make this the default StorageClass
provisioner: nfs-client                                   ## must match the PROVISIONER_NAME env var set in the provisioner Deployment above
reclaimPolicy: Retain                                     ## reclaim policy; the default is Delete

Now create the PVC for Grafana and point it at the StorageClass we just defined; the provisioner will generate the PV automatically.

manifests/other/nfs-storage/grafana-pvc.yaml

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  storageClassName: "prometheus-nfs"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi

3. Start the Services

# Install the CRDs and the Prometheus Operator first (wait for the CRDs to be established before continuing)
kubectl apply -f setup/
# Create the nfs-storage resources and the grafana PVC
kubectl apply -f other/nfs-storage/
# Start the services
kubectl apply -f adapter/ -f alertmanager/ -f blackbox/ -f grafana/ -f kube-state-metrics/ -f node-exporter/ -f prometheus/ -f serviceMonitor/

When startup completes, every pod in the monitoring namespace should be Running.
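
A hedged set of checks to confirm the stack came up:

# All pods should reach Running (jobs may show Completed)
kubectl -n monitoring get pods
# The PVCs created by the NFS provisioner should be Bound
kubectl -n monitoring get pvc
# The Operator's custom resources should exist
kubectl -n monitoring get prometheus,alertmanager,servicemonitors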

3.1 Install the ingress-controller

Prepare the ingress YAML:

wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.0.1/deploy/static/provider/cloud/deploy.yaml -O ingress-deploy.yaml

manifests/other/ingress/ingress-deploy.yaml

Modify the ingress YAML; only the sections that need changes are shown below.

---
apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx

---
# Add a PVC to persist the ingress controller's logs
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ingress-logs-pvc
  namespace: ingress-nginx
spec:
  # point at the nfs-storage StorageClass created earlier
  storageClassName: prometheus-nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
---
...
---
# Source: ingress-nginx/templates/controller-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    helm.sh/chart: ingress-nginx-3.30.0
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.46.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # ConfigMap keys defining the log output paths
  access-log-path: "/var/log/nginx/access.log"
  error-log-path: "/var/log/nginx/error.log"

...
          args:
            - /nginx-ingress-controller
            - --election-id=ingress-controller-leader
            - --ingress-class=nginx
            - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
            - --validating-webhook=:8443
            - --validating-webhook-certificate=/usr/local/certificates/cert
            - --validating-webhook-key=/usr/local/certificates/key
            - --v=2
            # add the following two lines to write logs to the mounted path instead of stderr
            - --log_dir=/var/log/nginx/
            - --logtostderr=false
...
          volumeMounts:
            - name: webhook-cert
              mountPath: /usr/local/certificates/
              readOnly: true
            # add the mount for the log directory
            - name: ingress-logs
              mountPath: /var/log/nginx/
          resources:
            requests:
              cpu: 100m
              memory: 90Mi
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: ingress-nginx
      terminationGracePeriodSeconds: 300
      volumes:
        - name: webhook-cert
          secret:
            secretName: ingress-nginx-admission
        # persist the logs
        - name: ingress-logs
          persistentVolumeClaim:
            claimName: ingress-logs-pvc
...
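
With the edits in place, apply the controller manifests and wait for the controller pod to become Ready (a hedged sketch, run from the manifests directory):

kubectl apply -f other/ingress/ingress-deploy.yaml
kubectl -n ingress-nginx get pods -w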

3.2 Write the Ingress routing rules

manifests/other/ingress/prom-ingress.yaml

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prom-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
    prometheus.io/http_probe: "true"
spec:
  rules:
  - host: alert.k8s.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alertmanager-main
            port:
              number: 9093
  - host: grafana.k8s.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
  - host: prom.k8s.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
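
Apply the rules and confirm the three hosts are registered (a sketch):

kubectl apply -f other/ingress/prom-ingress.yaml
kubectl -n monitoring get ingress prom-ingress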

Add local name resolution in /etc/hosts:

cat >> /etc/hosts <<EOF
192.168.50.134 alert.k8s.com grafana.k8s.com prom.k8s.com
EOF
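
A quick, hedged smoke test through the ingress (this assumes the controller is reachable at 192.168.50.134, the address used in the hosts entries above; expect a 200 or a 30x redirect):

curl -s -o /dev/null -w '%{http_code}\n' http://prom.k8s.com
curl -s -o /dev/null -w '%{http_code}\n' http://grafana.k8s.com
curl -s -o /dev/null -w '%{http_code}\n' http://alert.k8s.com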

4. Access from a Browser

4.1 Prometheus

http://prom.k8s.com


4.2 Import a Kubernetes dashboard into Grafana

Dashboard ID: 13105 (https://grafana.com/grafana/dashboards/13105). To import it, open Grafana, go to Dashboards → Import, enter the ID 13105, and select the Prometheus data source (the exact menu wording varies slightly between Grafana versions).

http://grafana.k8s.com

That completes the Prometheus monitoring deployment on Kubernetes!

References

kube-prometheus official repository: https://github.com/prometheus-operator/kube-prometheus
