The new generation of resource metrics is collected with the metrics-server component, which replaces the older Heapster. Project address: https://github.com/kubernetes-sigs/metrics-server
Custom metrics are collected with Prometheus.
New-generation architecture:
Core metrics pipeline: made up of the kubelet, metrics-server, and the API exposed through the API Server; it mainly collects cumulative CPU usage, real-time memory usage, Pod resource usage, and container disk usage.
Monitoring pipeline: collects all kinds of metrics from the system and serves them to users, storage systems, and the HPA; it includes the core metrics plus many non-core metrics, and the non-core metrics cannot be interpreted by Kubernetes itself.
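The two pipelines surface as different API groups once the corresponding components are installed (as shown in the sections below). A minimal sketch of how to reach each one, assuming the aggregated APIs are already registered:
# core metrics pipeline, served by metrics-server
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
# monitoring pipeline, served by prometheus + k8s-prometheus-adapter
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/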
metrics-server
Project address: https://github.com/kubernetes-sigs/metrics-server/tree/master
The Kubernetes source tree also carries a copy of the corresponding version under kubernetes/cluster/addons/metrics-server/. The latest release is currently v0.3.7, while the Kubernetes tree ships v0.3.6; the installation below uses v0.3.7 as the example.
k8s@node01:~/install_k8s/metrics-server$ wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.7/components.yaml
# NOTE: the yaml file needs a few modifications; if it is applied as-is, the metrics-server pod logs show no errors, but resource metrics are never collected
k8s@node01:~/install_k8s/metrics-server$ vim components.yaml
...
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      hostNetwork: true            # added: run the pod in the node's network namespace
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server/metrics-server:v0.3.7
        imagePullPolicy: IfNotPresent
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --metric-resolution=30s   # added
        - --kubelet-insecure-tls    # added
        - --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname   # added
        ports:
        - name: main-port
          containerPort: 4443
...
# Apply
k8s@node01:~/install_k8s/metrics-server$ kubectl apply -f components.yaml
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
# A corresponding pod now runs in the kube-system namespace
k8s@node01:~/install_k8s/dashboard$ kubectl get pods -n kube-system
...
metrics-server-67b9947bdc-f54xv 1/1 Running 0 12m
...
# api-versions now lists an additional group
k8s@node01:~/install_k8s/dashboard$ kubectl api-versions
...
metrics.k8s.io/v1beta1
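The registration of the aggregated API can also be verified directly; a quick check (the APIService object is created by the components.yaml above):
k8s@node01:~/install_k8s/metrics-server$ kubectl get apiservice v1beta1.metrics.k8s.io
# the AVAILABLE column should turn True once metrics-server is serving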
# Start a proxy for testing
k8s@node01:~/install_k8s/metrics-server$ kubectl proxy --port 8080
Starting to serve on 127.0.0.1:8080
# Query node metrics
k8s@node01:~/install_k8s/dashboard$ curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/nodes
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  "items": [
    {
      "metadata": {
        "name": "node01",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node01",
        "creationTimestamp": "2020-08-04T14:14:15Z"
      },
      "timestamp": "2020-08-04T14:13:50Z",
      "window": "30s",
      "usage": {
        "cpu": "227194873n",
        "memory": "1067084Ki"
      }
    },
...
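A single node can be queried the same way by appending the node name to the path; a sketch reusing node01 from the output above:
k8s@node01:~/install_k8s/dashboard$ curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/nodes/node01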
# Query pod metrics
k8s@node01:~/install_k8s/dashboard$ curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/pods
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/pods"
  },
  "items": [
    {
      "metadata": {
        "name": "kube-proxy-2ck9j",
        "namespace": "kube-system",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube-proxy-2ck9j",
        "creationTimestamp": "2020-08-04T14:15:18Z"
      },
      "timestamp": "2020-08-04T14:14:47Z",
      "window": "30s",
      "containers": [
        {
          "name": "kube-proxy",
          "usage": {
            "cpu": "346717n",
            "memory": "17388Ki"
          }
        }
      ]
    },
...
# The top commands now work
k8s@node01:~/install_k8s/dashboard$ kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node01   244m         12%    1029Mi          55%
node02   80m          8%     425Mi           49%
node03   86m          8%     353Mi           40%
k8s@node01:~/install_k8s/dashboard$ kubectl top pods
NAME    CPU(cores)   MEMORY(bytes)
myapp   0m           2Mi
k8s@node01:~/install_k8s/dashboard$ kubectl top pods -n kube-system
NAME                                       CPU(cores)   MEMORY(bytes)
calico-kube-controllers-578894d4cd-k4ljg   2m           14Mi
canal-fcpmq                                18m          60Mi
canal-jknl6                                20m          57Mi
canal-xsg99                                24m          68Mi
coredns-66bff467f8-9tfh6                   3m           10Mi
coredns-66bff467f8-sxpb7                   3m           10Mi
...
Prometheus
For more information, see:
https://github.com/kubernetes/kubernetes/tree/release-1.16/cluster/addons/prometheus
https://github.com/coreos/kube-prometheus/tree/master/manifests/setup
https://github.com/DirectXMan12/k8s-prometheus-adapter/tree/master/deploy/manifests
Prometheus is a monitoring system. An agent called node_exporter is deployed on every monitored node, and Prometheus scrapes node-level metrics through it; metrics for the pods on each node are scraped through their metrics URLs. The collected data is persisted and can then be queried through the PromQL query interface. The Kubernetes API Server cannot interpret the data stored in Prometheus, so a third-party component, k8s-prometheus-adapter, is used to translate Prometheus data into a form Kubernetes understands; k8s-prometheus-adapter plugs an interface called the custom metrics API into the Kubernetes API Server, which lets Kubernetes consume Prometheus metrics conveniently.
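Once the adapter is running (deployed below), custom metrics are queried through the same aggregation layer. A sketch of the query path format; the metric name http_requests is only an example and depends on the adapter's rules and on what the application actually exposes:
k8s@node01:~$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests"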
Before Kubernetes 1.16 the source tree included resource manifests for Prometheus; they were removed after 1.16. The manifest location is: https://github.com/kubernetes/kubernetes/tree/release-1.16/cluster/addons/prometheus. In those manifests the Prometheus service runs as a StatefulSet and persists its data through a PV and PVC, which this lab environment does not have, so the manifests need some changes. The manifests from https://github.com/zhaochj/k8s-prometheus are used instead: that project sorts the manifests into simple categories, deploys the Prometheus service as a Deployment, and uses an emptyDir volume, so data is not persisted and the setup is only suitable for test environments.
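For reference, the storage change in that project boils down to backing Prometheus with an emptyDir volume instead of a PVC. A sketch of the relevant Deployment fragment; the volume and mount names here are illustrative, not copied from the repository:
...
        volumeMounts:
        - name: prometheus-data
          mountPath: /data          # where prometheus writes its TSDB
      volumes:
      - name: prometheus-data
        emptyDir: {}                # test only: metrics are lost when the pod is rescheduled
...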
k8s@node01:~/install_k8s$ git clone https://github.com/zhaochj/k8s-prometheus.git
k8s@node01:~/install_k8s$ cd k8s-prometheus/
k8s@node01:~/install_k8s/k8s-prometheus$ ls
grafana.yaml k8s-prometheus-adapter kube-state-metrics namespace.yaml node_exporter podinfo prometheus README.md
# Create the namespace; all prometheus-related services get their own namespace
k8s@node01:~/install_k8s/k8s-prometheus$ kubectl apply -f namespace.yaml
# Deploy node_exporter
k8s@node01:~/install_k8s/k8s-prometheus$ cd node_exporter/
k8s@node01:~/install_k8s/k8s-prometheus/node_exporter$ kubectl apply -f ./
# Deploy kube-state-metrics
k8s@node01:~/install_k8s/k8s-prometheus/node_exporter$ cd ../kube-state-metrics/
k8s@node01:~/install_k8s/k8s-prometheus/kube-state-metrics$ kubectl apply -f ./
# Deploy the prometheus service
k8s@node01:~/install_k8s/k8s-prometheus/prometheus$ pwd
/home/k8s/install_k8s/k8s-prometheus/prometheus
k8s@node01:~/install_k8s/k8s-prometheus/prometheus$ kubectl apply -f ./
# Deploy the custom metrics adapter. It has to interact with the API server, so it needs a certificate; the secret it expects is named cm-adapter-serving-certs, as referenced in custom-metrics-apiserver-deployment.yaml
k8s@node01:/etc/kubernetes/pki$ (sudo umask 077; sudo openssl genrsa -out serving.key 2048)
k8s@node01:/etc/kubernetes/pki$ sudo openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"
k8s@node01:/etc/kubernetes/pki$ sudo openssl x509 -req -in serving.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out serving.crt -days 3650
# Create the secret
k8s@node01:/etc/kubernetes/pki$ sudo kubectl create secret generic cm-adapter-serving-certs --from-file=serving.key=./serving.key --from-file=serving.crt=./serving.crt -n prom
secret/cm-adapter-serving-certs created
k8s@node01:/etc/kubernetes/pki$ kubectl get secret -n prom
NAME                             TYPE                                  DATA   AGE
cm-adapter-serving-certs         Opaque                                2      10s
default-token-5qjqt              kubernetes.io/service-account-token   3      102m
kube-state-metrics-token-hzhs8   kubernetes.io/service-account-token   3      100m
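Before applying the adapter manifests it is worth confirming that they actually reference the secret created above; an illustrative check from the adapter manifest directory (the file name comes from the comment earlier):
k8s@node01:~/install_k8s/k8s-prometheus/k8s-prometheus-adapter$ grep -n cm-adapter-serving-certs custom-metrics-apiserver-deployment.yaml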
# Deploy the custom metrics adapter resources
k8s@node01:~/install_k8s/k8s-prometheus/node_exporter$ cd ../k8s-prometheus-adapter/
k8s@node01:~/install_k8s/k8s-prometheus/k8s-prometheus-adapter$ kubectl apply -f ./
# Deploy grafana
k8s@node01:~/install_k8s/k8s-prometheus$ pwd
/home/k8s/install_k8s/k8s-prometheus
k8s@node01:~/install_k8s/k8s-prometheus$ kubectl apply -f grafana.yaml
# After deployment, the API server exposes an additional group
k8s@node01:~/install_k8s$ kubectl api-versions
...
custom.metrics.k8s.io/v1beta1
...
# With a proxy started via kubectl proxy, this API can be queried for a large number of metrics
k8s@node01:~/install_k8s$ kubectl proxy --port 8080
Starting to serve on 127.0.0.1:8080
k8s@node01:~$ curl http://localhost:8080/apis/custom.metrics.k8s.io/v1beta1/
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "persistentvolumeclaims/kube_persistentvolumeclaim_status_phase",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
...
# Check the pod status after everything is deployed
k8s@node01:~/install_k8s/k8s-prometheus$ kubectl get pods -n prom
NAME                                        READY   STATUS    RESTARTS   AGE
custom-metrics-apiserver-7b45d8c74c-fw722   1/1     Running   0          6m29s
kube-state-metrics-86745c7b9-n4dhd          1/1     Running   0          108m
monitoring-grafana-54958d65c9-78kpz         1/1     Running   0          21s
prometheus-node-exporter-5qkgw              1/1     Running   0          109m
prometheus-node-exporter-d4npk              1/1     Running   0          109m
prometheus-node-exporter-hrbbh              1/1     Running   0          109m
prometheus-node-exporter-zz5vp              1/1     Running   0          109m
prometheus-server-66fc64694-9nqnm           1/1     Running   0          2m55s
# Check the services
k8s@node01:~/install_k8s/k8s-prometheus$ kubectl get svc -n prom
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
custom-metrics-apiserver   ClusterIP   10.101.213.142   <none>        443/TCP          7m7s
kube-state-metrics         ClusterIP   10.100.132.67    <none>        8080/TCP         109m
monitoring-grafana         NodePort    10.108.174.205   <none>        80:30591/TCP     59s
prometheus                 NodePort    10.108.76.168    <none>        9090:30090/TCP   3m34s
prometheus-node-exporter   ClusterIP   None             <none>        9100/TCP         110m
Open the Prometheus UI.
Open the Grafana UI.
In Grafana, add a data source of type Prometheus and point it at the in-cluster Prometheus address.
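Based on the NodePort services listed above, both UIs are reachable on any node's IP (shown here as a placeholder):
# Prometheus UI
http://<node-ip>:30090
# Grafana UI
http://<node-ip>:30591
Inside the cluster, the Grafana data source URL can point at the Prometheus service itself, e.g. http://prometheus.prom.svc.cluster.local:9090 (service name, port, and namespace as shown by kubectl get svc -n prom above).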
HPA
Short for Horizontal Pod Autoscaler; it scales pods horizontally in and out automatically.
k8s@node01:~/my_manifests/hpa$ cat deploy-resource-limit-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 80
        resources:
          requests:
            cpu: "50m"
            memory: "128Mi"
          limits:
            cpu: "50m"
            memory: "128Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
  namespace: default
spec:
  type: ClusterIP
  selector:
    app: myapp
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
k8s@node01:~/my_manifests/hpa$ kubectl apply -f deploy-resource-limit-demo.yaml
k8s@node01:~/my_manifests/hpa$ kubectl get pods
NAME READY STATUS RESTARTS AGE
myapp-deployment-58f8c68c56-wm6sw 1/1 Running 0 7m40s
k8s@node01:~/my_manifests/hpa$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 16d
myapp-svc ClusterIP 10.100.109.34 <none> 80/TCP 2m2s
# Create the hpa resource
k8s@node01:~/my_manifests/hpa$ kubectl autoscale deployment myapp-deployment --min=1 --max=4 --cpu-percent=50
k8s@node01:~/my_manifests/hpa$ kubectl get hpa
NAME               REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myapp-deployment   Deployment/myapp-deployment   0%/50%    1         4         1          75s
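The same autoscaler could equally be declared as a manifest; a minimal autoscaling/v1 sketch equivalent to the kubectl autoscale command above (the file name is illustrative):
# hpa-v1-demo.yaml (illustrative)
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-deployment
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  minReplicas: 1
  maxReplicas: 4
  targetCPUUtilizationPercentage: 50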
Run a stress test from node02 and watch how the hpa resource changes.
root@node02:~# ab -c 1000 -n 500000 http://10.100.109.34/index.html
...
# Watch the hpa
k8s@node01:~/my_manifests/hpa$ kubectl describe hpa
Name: myapp-deployment
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Sat, 08 Aug 2020 13:36:37 +0800
Reference: Deployment/myapp-deployment
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 58% (29m) / 50% # current pod CPU utilization is 58%
Min replicas: 1
Max replicas: 4
Deployment pods: 1 current / 2 desired # the controller has calculated that 2 pods are needed
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 2
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 6s horizontal-pod-autoscaler New size: 2; reason: cpu resource utilization (percentage of request) above target
# Automatically scaled out to 2 pods; once the load drops, the extra pod is reclaimed automatically after a while
k8s@node01:~/my_manifests/hpa$ kubectl get pods
NAME READY STATUS RESTARTS AGE
myapp-deployment-58f8c68c56-7d8r5 1/1 Running 0 112s
myapp-deployment-58f8c68c56-wm6sw 1/1 Running 0 12m
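Once the ab run finishes and the load drops, the scale-down can be followed with the watch flag of kubectl get:
k8s@node01:~/my_manifests/hpa$ kubectl get hpa myapp-deployment -w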
An HPA created with the kubectl autoscale command uses the v1 API, which can only scale pods horizontally based on CPU. Newer releases of the API server also provide v2 versions of the API:
k8s@node01:~$ kubectl api-versions
...
autoscaling/v1
autoscaling/v2beta1
autoscaling/v2beta2
...
The v2 API supports more metrics for horizontal pod autoscaling, and a v2 HPA must be created from a resource manifest:
k8s@node01:~/my_manifests/hpa$ cat autoscale-v2.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa-v2
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  minReplicas: 1   # minimum number of replicas
  maxReplicas: 5   # maximum number of replicas
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
  - type: Resource
    resource:
      name: memory   # memory is supported as a metric
      targetAverageValue: 50Mi
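A brief usage sketch: apply the manifest and confirm that both the cpu and memory targets appear in the hpa listing (the reported values depend on actual load):
k8s@node01:~/my_manifests/hpa$ kubectl apply -f autoscale-v2.yaml
k8s@node01:~/my_manifests/hpa$ kubectl get hpa myapp-hpa-v2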