1. Introduction
To make Prometheus easier to run on Kubernetes, CoreOS created the Prometheus Operator. Building on it, the kube-prometheus project provides a one-stop monitoring solution: it is written mainly in jsonnet and works as templates plus parameters that render out a set of YAML manifests, giving you an out-of-the-box monitoring stack for both the Kubernetes cluster itself and the applications running on it. The project bundles the following components:
- The Prometheus Operator: installs the CRDs and manages the custom resource objects
- Highly available Prometheus: deploys Prometheus in a highly available setup
- Highly available Alertmanager: deploys the alerting component in a highly available setup
- Prometheus node-exporter: host-level metrics for each node
- Prometheus Adapter for Kubernetes Metrics APIs: exposes custom metrics (for example, autoscaling an application based on nginx request rate)
- kube-state-metrics: state metrics for Kubernetes resource objects
- Grafana: dashboards and visualization
Prometheus Operator architecture diagram
Image source:
https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/master/Documentation/user-guides/images/architecture.png
See the kube-prometheus compatibility matrix (https://github.com/prometheus-operator/kube-prometheus#kubernetes-compatibility-matrix); following it, this guide deploys the release-0.8 branch.
| kube-prometheus stack | Kubernetes 1.18 | Kubernetes 1.19 | Kubernetes 1.20 | Kubernetes 1.21 | Kubernetes 1.22 |
|---|---|---|---|---|---|
| release-0.6 | ✗ | ✔ | ✗ | ✗ | ✗ |
| release-0.7 | ✗ | ✔ | ✔ | ✗ | ✗ |
| release-0.8 | ✗ | ✗ | ✔ | ✔ | ✗ |
| release-0.9 | ✗ | ✗ | ✗ | ✔ | ✔ |
| HEAD | ✗ | ✗ | ✗ | ✔ | ✔ |
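As a quick sanity check before downloading, the matrix above can be encoded as a small shell helper (a hypothetical convenience, not part of kube-prometheus) that maps your cluster's minor version to the newest compatible tagged release:

```shell
#!/bin/sh
# pick_release: map a Kubernetes minor version (e.g. "1.20") to the newest
# tagged kube-prometheus release marked compatible in the matrix above.
pick_release() {
  case "$1" in
    1.19)      echo "release-0.7" ;;
    1.20)      echo "release-0.8" ;;
    1.21|1.22) echo "release-0.9" ;;
    *)         echo "unsupported" ;;
  esac
}

# compare against the server version reported by: kubectl version
pick_release 1.20   # prints: release-0.8
```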
2. Prepare the manifest files
Fetch the release-0.8 branch from the official repository, or download the release-0.8 tarball directly:
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
git checkout release-0.8
# or download the prepackaged release tarball
wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.8.0.tar.gz
tar -xvf v0.8.0.tar.gz
mv kube-prometheus-0.8.0 kube-prometheus
The download contains a large number of files, so it is easier to work with them grouped by component; move the related YAML files into per-component directories:
cd kube-prometheus/manifests
mkdir -p serviceMonitor prometheus adapter node-exporter blackbox kube-state-metrics grafana alertmanager operator other/{nfs-storage,ingress}
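The actual moves can be scripted. The sketch below (filename-pattern heuristics I chose for illustration; verify the result against the tree that follows) is meant to be run from `kube-prometheus/manifests`:

```shell
#!/bin/sh
# sort_manifests: move the flat manifest files into the per-component
# directories created above, keyed on filename patterns (approximate).
# Pattern order matters: serviceMonitor and per-exporter prefixes must be
# checked before the generic prometheus-* catch-all.
sort_manifests() {
  for f in *.yaml; do
    [ -e "$f" ] || continue          # no matches -> "$f" is the literal "*.yaml"
    case "$f" in
      *serviceMonitor*)        mv "$f" serviceMonitor/ ;;
      prometheus-adapter-*)    mv "$f" adapter/ ;;
      blackbox-exporter-*)     mv "$f" blackbox/ ;;
      kube-state-metrics-*)    mv "$f" kube-state-metrics/ ;;
      node-exporter-*)         mv "$f" node-exporter/ ;;
      grafana-*)               mv "$f" grafana/ ;;
      alertmanager-*|*prometheusRule*|prometheus-podDisruptionBudget.yaml)
                               mv "$f" alertmanager/ ;;
      prometheus-*)            mv "$f" prometheus/ ;;
    esac
  done
}

# run it inside kube-prometheus/manifests:
# sort_manifests
```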
The final structure looks like this:
manifests$ tree .
.
├── adapter
│ ├── prometheus-adapter-apiService.yaml
│ ├── prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
│ ├── prometheus-adapter-clusterRoleBindingDelegator.yaml
│ ├── prometheus-adapter-clusterRoleBinding.yaml
│ ├── prometheus-adapter-clusterRoleServerResources.yaml
│ ├── prometheus-adapter-clusterRole.yaml
│ ├── prometheus-adapter-configMap.yaml
│ ├── prometheus-adapter-deployment.yaml
│ ├── prometheus-adapter-roleBindingAuthReader.yaml
│ ├── prometheus-adapter-serviceAccount.yaml
│ └── prometheus-adapter-service.yaml
├── alertmanager
│ ├── alertmanager-alertmanager.yaml
│ ├── alertmanager-podDisruptionBudget.yaml
│ ├── alertmanager-prometheusRule.yaml
│ ├── alertmanager-secret.yaml
│ ├── alertmanager-serviceAccount.yaml
│ ├── alertmanager-service.yaml
│ ├── kube-prometheus-prometheusRule.yaml
│ ├── kubernetes-prometheusRule.yaml
│ ├── prometheus-operator-prometheusRule.yaml
│ ├── prometheus-podDisruptionBudget.yaml
│ └── prometheus-prometheusRule.yaml
├── blackbox
│ ├── blackbox-exporter-clusterRoleBinding.yaml
│ ├── blackbox-exporter-clusterRole.yaml
│ ├── blackbox-exporter-configuration.yaml
│ ├── blackbox-exporter-deployment.yaml
│ ├── blackbox-exporter-serviceAccount.yaml
│ └── blackbox-exporter-service.yaml
├── grafana
│ ├── grafana-dashboardDatasources.yaml
│ ├── grafana-dashboardDefinitions.yaml
│ ├── grafana-dashboardSources.yaml
│ ├── grafana-deployment.yaml
│ ├── grafana-serviceAccount.yaml
│ └── grafana-service.yaml
├── kube-state-metrics
│ ├── kube-state-metrics-clusterRoleBinding.yaml
│ ├── kube-state-metrics-clusterRole.yaml
│ ├── kube-state-metrics-deployment.yaml
│ ├── kube-state-metrics-prometheusRule.yaml
│ ├── kube-state-metrics-serviceAccount.yaml
│ └── kube-state-metrics-service.yaml
├── node-exporter
│ ├── node-exporter-clusterRoleBinding.yaml
│ ├── node-exporter-clusterRole.yaml
│ ├── node-exporter-daemonset.yaml
│ ├── node-exporter-prometheusRule.yaml
│ ├── node-exporter-serviceAccount.yaml
│ └── node-exporter-service.yaml
├── other
│ ├── ingress
│ │ ├── ingress-deploy.yaml
│ │ └── prom-ingress.yaml
│ └── nfs-storage
│ ├── grafana-pvc.yaml
│ ├── nfs-provisioner.yaml
│ ├── nfs-rbac.yaml
│ └── nfs-storageclass.yaml
├── prometheus
│ ├── prometheus-clusterRoleBinding.yaml
│ ├── prometheus-clusterRole.yaml
│ ├── prometheus-prometheus.yaml
│ ├── prometheus-roleBindingConfig.yaml
│ ├── prometheus-roleBindingSpecificNamespaces.yaml
│ ├── prometheus-roleConfig.yaml
│ ├── prometheus-roleSpecificNamespaces.yaml
│ ├── prometheus-serviceAccount.yaml
│ └── prometheus-service.yaml
├── serviceMonitor
│ ├── alertmanager-serviceMonitor.yaml
│ ├── blackbox-exporter-serviceMonitor.yaml
│ ├── grafana-serviceMonitor.yaml
│ ├── kubernetes-serviceMonitorApiserver.yaml
│ ├── kubernetes-serviceMonitorCoreDNS.yaml
│ ├── kubernetes-serviceMonitorKubeControllerManager.yaml
│ ├── kubernetes-serviceMonitorKubelet.yaml
│ ├── kubernetes-serviceMonitorKubeScheduler.yaml
│ ├── kube-state-metrics-serviceMonitor.yaml
│ ├── node-exporter-serviceMonitor.yaml
│ ├── prometheus-adapter-serviceMonitor.yaml
│ ├── prometheus-operator-serviceMonitor.yaml
│ └── prometheus-serviceMonitor.yaml
└── setup
├── 0namespace-namespace.yaml
├── prometheus-operator-0alertmanagerConfigCustomResourceDefinition.yaml
├── prometheus-operator-0alertmanagerCustomResourceDefinition.yaml
├── prometheus-operator-0podmonitorCustomResourceDefinition.yaml
├── prometheus-operator-0probeCustomResourceDefinition.yaml
├── prometheus-operator-0prometheusCustomResourceDefinition.yaml
├── prometheus-operator-0prometheusruleCustomResourceDefinition.yaml
├── prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
├── prometheus-operator-0thanosrulerCustomResourceDefinition.yaml
├── prometheus-operator-clusterRoleBinding.yaml
├── prometheus-operator-clusterRole.yaml
├── prometheus-operator-deployment.yaml
├── prometheus-operator-serviceAccount.yaml
└── prometheus-operator-service.yaml
12 directories, 88 files
2.1 Modify the YAML to add persistent storage
manifests/prometheus/prometheus-prometheus.yaml
...
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: 2.26.0
  # add persistent storage at the end of the spec
  retention: 3d
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: prometheus-nfs
        resources:
          requests:
            storage: 5Gi
manifests/grafana/grafana-deployment.yaml
...
132       volumes:
133 #      - emptyDir: {}                 # comment out these two lines
134 #        name: grafana-storage
135       - name: grafana-storage         # add this block instead
136         persistentVolumeClaim:
137           claimName: grafana-pvc
138       - name: grafana-datasources     # everything from here on stays unchanged
139         secret:
140           secretName: grafana-datasources
2.2 Prepare the persistent-storage configuration (nfs-storage)
An NFS server must already be running (install nfs-server beforehand).
manifests/other/nfs-storage/nfs-rbac.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-provisioner
  namespace: monitoring
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get", "create", "list", "watch", "update"]
  - apiGroups: ["extensions"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["nfs-provisioner"]
    verbs: ["use"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-provisioner
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: nfs-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
manifests/other/nfs-storage/nfs-provisioner.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-client-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-provisioner
      containers:
        - name: nfs-client-provisioner
          #image: gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.0
          image: registry.cn-beijing.aliyuncs.com/mydlq/nfs-subdir-external-provisioner:v4.0.0
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: nfs-client
            - name: NFS_SERVER
              value: 192.168.50.134
            - name: NFS_PATH
              value: /data/nfs/prometheus
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.50.134
            path: /data/nfs/prometheus
manifests/other/nfs-storage/nfs-storageclass.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-nfs
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"  # whether to make this the default StorageClass
provisioner: nfs-client  # must match the PROVISIONER_NAME value set in the provisioner Deployment above
reclaimPolicy: Retain    # reclaim policy; the default is Delete
Next, create the Grafana PVC bound to the StorageClass we just created; the provisioner will generate the PV automatically.
manifests/other/nfs-storage/grafana-pvc.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  storageClassName: "prometheus-nfs"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
3. Start the services
# install the CRDs and the operator first
kubectl apply -f setup/
# create the nfs-storage resources and grafana-pvc
kubectl apply -f other/nfs-storage/
# start the remaining services
kubectl apply -f adapter/ -f alertmanager/ -f blackbox/ -f grafana/ -f kube-state-metrics/ -f node-exporter/ -f prometheus/ -f serviceMonitor/
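If `setup/` and the workload manifests are applied back-to-back, the custom resources can race the CRD registration. One way to make the ordering explicit is a small wrapper like the following (a hypothetical helper built from standard kubectl subcommands, not part of kube-prometheus):

```shell
#!/bin/sh
# apply_all: apply the manifests in dependency order -- CRDs and the
# operator first, then storage, then the monitoring workloads.
apply_all() {
  kubectl apply -f setup/ &&
  # block until the CRDs the later manifests rely on are registered
  kubectl wait --for=condition=established --timeout=60s \
    crd/prometheuses.monitoring.coreos.com \
    crd/servicemonitors.monitoring.coreos.com &&
  kubectl apply -f other/nfs-storage/ &&
  kubectl apply -f adapter/ -f alertmanager/ -f blackbox/ -f grafana/ \
    -f kube-state-metrics/ -f node-exporter/ -f prometheus/ -f serviceMonitor/
}

# run it from kube-prometheus/manifests:
# apply_all
```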
Once everything has started, the pods should look like the figure below:
3.1 Install the ingress-controller
Prepare the ingress YAML:
wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.46.0/deploy/static/provider/cloud/deploy.yaml -O ingress-deploy.yaml
manifests/other/ingress/ingress-deploy.yaml
Modify the ingress YAML; only the sections that need changes are shown below:
---
apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
---
# add a PVC to persist the ingress access logs
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ingress-logs-pvc
  namespace: ingress-nginx
spec:
  # points at the nfs-storage StorageClass created earlier
  storageClassName: prometheus-nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
---
...
---
# Source: ingress-nginx/templates/controller-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    helm.sh/chart: ingress-nginx-3.30.0
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.46.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # point the nginx logs at files so they can be persisted
  access-log-path: "/var/log/nginx/access.log"
  error-log-path: "/var/log/nginx/error.log"
...
        args:
          - /nginx-ingress-controller
          - --election-id=ingress-controller-leader
          - --ingress-class=nginx
          - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
          - --validating-webhook=:8443
          - --validating-webhook-certificate=/usr/local/certificates/cert
          - --validating-webhook-key=/usr/local/certificates/key
          - --v=2
          # add the next two lines: write controller logs to files instead of stderr
          - --log_dir=/var/log/nginx/
          - --logtostderr=false
...
        volumeMounts:
          - name: webhook-cert
            mountPath: /usr/local/certificates/
            readOnly: true
          # add the log mount
          - name: ingress-logs
            mountPath: /var/log/nginx/
        resources:
          requests:
            cpu: 100m
            memory: 90Mi
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: ingress-nginx
      terminationGracePeriodSeconds: 300
      volumes:
        - name: webhook-cert
          secret:
            secretName: ingress-nginx-admission
        # persist the logs
        - name: ingress-logs
          persistentVolumeClaim:
            claimName: ingress-logs-pvc
...
3.2 Write the Ingress proxy rules
manifests/other/ingress/prom-ingress.yaml
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prom-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
    prometheus.io/http_probe: "true"
spec:
  rules:
    - host: alert.k8s.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: alertmanager-main
                port:
                  number: 9093
    - host: grafana.k8s.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
    - host: prom.k8s.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-k8s
                port:
                  number: 9090
Add the hostname entries to the local hosts file:
cat >> /etc/hosts <<EOF
192.168.50.134 alert.k8s.com grafana.k8s.com prom.k8s.com
EOF
4. Access from a browser
4.1 Prometheus
http://prom.k8s.com
4.2 Add a Kubernetes dashboard to Grafana
Dashboard ID: 13105 (https://grafana.com/grafana/dashboards/13105)
http://grafana.k8s.com
The result looks like the figure below:
That completes the deployment of Prometheus monitoring on Kubernetes!
References
kube-prometheus official repository: https://github.com/prometheus-operator/kube-prometheus