I won't go into VictoriaMetrics' features and usage here; the official documentation covers them thoroughly.
Here is my topology and architecture:
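- Prometheus is deployed the operator way, using kube-prometheus.
- VictoriaMetrics is deployed as a StatefulSet (sts).
- Prometheus writes its data to VictoriaMetrics via remote_write. VictoriaMetrics has a high compression ratio, so it can easily hold several months of historical data.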
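For reference, outside the Operator the same remote_write path is only a few lines in a plain `prometheus.yml`. The Operator renders an equivalent block from the CRD's `remoteWrite` field shown further down. This is a minimal sketch that assumes the in-cluster Service name used later in this post.

```yaml
# prometheus.yml (plain Prometheus, no Operator): forward every ingested
# sample to VictoriaMetrics' Prometheus-compatible write endpoint.
remote_write:
  - url: http://victoriametrics.monitoring.svc.cluster.local:8428/api/v1/write
```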
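Why not adopt the full VictoriaMetrics stack?
- Our existing Prometheus + Alertmanager setup is already deployed and integrated with our internal alerting system, and I didn't want to rework it.
- Here VictoriaMetrics serves as historical data storage, not as the core monitoring database.
- The full VictoriaMetrics stack has many components. Bringing them all in would be more than our limited staff can maintain.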
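Modifying the Prometheus CRD
View the current resource with `kubectl get Prometheus -n monitoring k8s -oyaml`. Below is my modified configuration.
Note that I added a `remoteWrite` entry and switched the storage to an NFS volume (the `alicloud-nas-prometheus` storage class).
Configure the storage to suit your own needs.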
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  generation: 27
  labels:
    app: prometheus
    prometheus: k8s
  name: k8s
  namespace: monitoring
  resourceVersion: "3757019465"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheuses/k8s
  uid: 8c7be613-1a60-11ea-a1d8-72c40774f54f
spec:
  additionalScrapeConfigs:
    key: prometheus-additional.yaml
    name: additional-scrape-configs
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  # Added: ship all samples to VictoriaMetrics through the in-cluster Service
  remoteWrite:
  - url: http://victoriametrics.monitoring.svc.cluster.local:8428/api/v1/write
  replicas: 2
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  # Changed: NFS-backed storage class
  storage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 500Gi
        storageClassName: alicloud-nas-prometheus
  version: v2.25.0
```
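You may want to tune the remote write path, for example if the write queue falls behind or you only want a subset of series in long-term storage. The Operator's `remoteWrite` entry also accepts `queueConfig` and `writeRelabelConfigs` for this. The sketch below is illustrative only. The queue numbers and the `go_.*` drop rule are hypothetical values, not what I run, so check the field names against the API reference for your Operator version.

```yaml
# Illustrative remoteWrite entry for the Prometheus CRD spec.
# Queue sizes and the relabel rule are hypothetical example values.
remoteWrite:
- url: http://victoriametrics.monitoring.svc.cluster.local:8428/api/v1/write
  queueConfig:
    capacity: 20000           # samples buffered per shard
    maxShards: 30             # upper bound on parallel senders
    maxSamplesPerSend: 10000  # max samples per write request
  writeRelabelConfigs:
  - sourceLabels: [__name__]
    regex: go_.*              # hypothetical: don't store Go runtime metrics long-term
    action: drop
```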
The two VictoriaMetrics manifests
victoriametrics.svc.yaml:
```yaml
# In-cluster endpoint, used by Prometheus remoteWrite and in-cluster Grafana
apiVersion: v1
kind: Service
metadata:
  labels:
    app: victoriametrics
  name: victoriametrics
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 8428
    protocol: TCP
    targetPort: 8428
  selector:
    app: victoriametrics
  sessionAffinity: None
  type: ClusterIP
---
# Exposes VictoriaMetrics on every node's IP for access from outside the cluster
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    app: victoriametrics
  name: victoriametrics-nodeport
  namespace: monitoring
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    port: 8428
    protocol: TCP
    targetPort: 8428
  selector:
    app: victoriametrics
  sessionAffinity: None
  type: NodePort
```
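Without an explicit `nodePort`, Kubernetes picks a random port from the node-port range (30000-32767 by default). The external address can therefore change if the Service is recreated. If you want a stable port, you can pin it. `30428` below is a hypothetical choice: it must fall inside your cluster's node-port range and must not already be in use.

```yaml
# Fragment of the victoriametrics-nodeport Service spec with a pinned port.
# 30428 is a hypothetical value within the default 30000-32767 range.
spec:
  type: NodePort
  ports:
  - name: http
    port: 8428
    protocol: TCP
    targetPort: 8428
    nodePort: 30428
```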
victoriametrics.sts.yaml:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  creationTimestamp: null
  generation: 1
  labels:
    app: victoriametrics
  name: victoriametrics
  namespace: monitoring   # must match the Services' namespace so their selectors find the pod
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: victoriametrics
  serviceName: victoriametrics
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: victoriametrics
    spec:
      containers:
      - args:
        - --storageDataPath=/storage
        - --httpListenAddr=:8428
        - --retentionPeriod=1   # no unit suffix means months: keep 1 month of data
        image: victoriametrics/victoria-metrics   # no tag resolves to :latest
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8428
            scheme: HTTP
          initialDelaySeconds: 120
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        name: victoriametrics
        ports:
        - containerPort: 8428
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8428
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        resources:
          limits:
            cpu: "4"
            memory: 8000Mi
          requests:
            cpu: "4"
            memory: 8000Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /storage
          name: victormetrics-storage
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: victormetrics-storage
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 300Gi
      storageClassName: alicloud-nas-prometheus
      volumeMode: Filesystem
```
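VictoriaMetrics counts `--retentionPeriod` in months when no unit suffix is given, so the manifest above keeps roughly one month of data. To actually keep several months, raise the value and size the PVC to match. The value 6 below is a hypothetical example. Recent VictoriaMetrics releases also accept unit suffixes such as `d` or `y`, but verify this against the version you run.

```yaml
# Fragment of the victoriametrics container spec; only retentionPeriod changes.
args:
- --storageDataPath=/storage
- --httpListenAddr=:8428
- --retentionPeriod=6   # hypothetical: no suffix means months, so about 6 months
```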
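Apply these files with `kubectl apply -f`. Then add a VictoriaMetrics data source in Grafana and start building dashboards. VictoriaMetrics implements the Prometheus query API, so the Prometheus data source type works.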
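If your Grafana data sources are managed through file provisioning, you can declare the data source in a file instead of clicking through the UI. This is a minimal sketch that assumes Grafana runs in the same cluster. The file name and the data source name `VictoriaMetrics` are hypothetical. The built-in `prometheus` type is used because VictoriaMetrics speaks the Prometheus query API.

```yaml
# e.g. provisioning/datasources/victoriametrics.yaml (hypothetical file name)
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus
    access: proxy
    url: http://victoriametrics.monitoring.svc.cluster.local:8428
    isDefault: false
```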
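Disk usage comparison: over the same time window, VictoriaMetrics takes only 25% of the disk space Prometheus does. In our setup VictoriaMetrics only provides historical storage and disaster recovery, so its performance requirements are modest. The CPU and memory in its StatefulSet are therefore not very generous; adjust the CPU and memory quotas to fit your own situation.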
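With that ratio, capacity planning for VictoriaMetrics becomes simple arithmetic. In this rough sketch, $S_P$ is Prometheus's daily disk growth and $R$ is the retention in days. The 10 GiB/day and 90-day figures are hypothetical, not measurements from our environment, and the 25% ratio itself will vary with your series count and scrape intervals.

$$
S_{VM} \approx 0.25 \cdot S_P \cdot R = 0.25 \times 10\,\text{GiB/day} \times 90\,\text{days} = 225\,\text{GiB}
$$

Under those assumptions, the result would fit within the 300Gi PVC requested in the StatefulSet.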