k8s container resource limits
Memory limit example
If a container exceeds its memory limit, it is terminated. If it can be restarted, the kubelet will restart it, just as with any other kind of runtime failure.
If a container exceeds its memory request, its Pod may be evicted when the node runs low on memory.
vim memory.yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
spec:
  containers:
  - name: memory-demo
    image: stress
    args:
    - --vm
    - "1"
    - --vm-bytes
    - 200M
    resources:
      requests:
        memory: 50Mi
      limits:
        memory: 100Mi
Apply this YAML:
[root@server2 limit]# kubectl apply -f memory.yaml
pod/memory-demo created
The limit caps memory at 100Mi while the stress workload tries to allocate 200M, so the container is killed and the Pod status shows OOMKilled.
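To confirm, check the Pod (a quick verification sketch; the exact RESTARTS count will vary):
kubectl get pod memory-demo
# STATUS shows OOMKilled; with the default restartPolicy: Always the kubelet
# keeps restarting the container, which eventually backs off to CrashLoopBackOff
kubectl delete -f memory.yaml   # clean up before the next example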
CPU limit example
vim cpu.yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo
spec:
  containers:
  - name: cpu-demo
    image: stress
    resources:
      limits:
        cpu: "10"
      requests:
        cpu: "5"
    args:
    - -c
    - "2"
Apply the YAML:
kubectl apply -f cpu.yaml
cpu-demo stays in the Pending state: it requests 5 CPUs, more than any node can allocate, so the scheduler cannot place it.
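To see why it is unschedulable, describe the Pod (a sketch; the exact event text depends on the cluster size):
kubectl describe pod cpu-demo
# Events typically show a FailedScheduling warning such as:
#   0/3 nodes are available: 3 Insufficient cpu.
kubectl delete -f cpu.yaml   # clean up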
Kubernetes resource monitoring
Metrics-Server deployment
Metrics-Server is the aggregator for core cluster monitoring data; it replaces the older Heapster.
Container metrics come mainly from the cAdvisor service built into the kubelet; with Metrics-Server in place, users can access this monitoring data through the standard Kubernetes API.
The Metrics API only serves current measurements; it does not store historical data.
The Metrics API URI is /apis/metrics.k8s.io/, maintained under k8s.io/metrics.
metrics-server must be deployed for this API to be available; it gathers data by calling the Kubelet Summary API.
Metrics Server is not part of kube-apiserver; it is deployed independently and served alongside kube-apiserver through the Aggregator plugin mechanism.
kube-aggregator is essentially a proxy server that picks a concrete API backend based on the request URL.
Metrics-Server belongs to Core metrics: it provides the metrics.k8s.io API and exposes only CPU and memory usage for Nodes and Pods.
First, download the Metrics-Server manifest:
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Edit the downloaded YAML file before applying it.
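A typical edit is pointing the image at the local harbor registry mentioned later (the registry address and tag below are placeholders for illustration; use your own):
    spec:
      containers:
      - name: metrics-server
        # assumption: a local harbor registry mirroring the upstream image
        image: reg.westos.org/library/metrics-server:v0.5.0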
Deploy Metrics-Server:
[root@server2 metrics]# kubectl apply -f components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
[root@server2 metrics]# kubectl get pod -n kube-system | grep metrics
metrics-server-86d6b8bbcc-lrdfh 0/1 Running 0 79s
As shown above, metrics-server-86d6b8bbcc-lrdfh is Running but never becomes Ready.
Check the Metrics-Server Pod logs after deployment:
kubectl -n kube-system logs metrics-server-86d6b8bbcc-lrdfh
Cause: the logs contain an x509 certificate error, because metrics-server cannot verify the kubelets' self-signed serving certificates.
Metrics Server supports a --kubelet-insecure-tls flag that skips this verification, but upstream explicitly states that this is not recommended for production.
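For reference only, the flag would go into the container args of the metrics-server Deployment inside components.yaml (the surrounding args follow a stock manifest of this era and may differ):
      containers:
      - name: metrics-server
        args:
        - --cert-dir=/tmp
        - --secure-port=443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --kubelet-insecure-tls   # skips kubelet serving-certificate verification; not for production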
Instead, we solve the problem by enabling TLS bootstrapping for the kubelet serving certificates.
On every host in the k8s cluster (all of them, not only the control plane):
vim /var/lib/kubelet/config.yaml
serverTLSBootstrap: true   # append this as the last line
systemctl restart kubelet
kubectl get csr   # list the certificate signing requests
All the CSRs are in the Pending state:
[root@server2 metrics]# kubectl get csr
NAME AGE SIGNERNAME REQUESTOR CONDITION
csr-59zz9 3s kubernetes.io/kubelet-serving system:node:server3 Pending
csr-8d2rt 3s kubernetes.io/kubelet-serving system:node:server4 Pending
csr-chz2d 7s kubernetes.io/kubelet-serving system:node:server2 Pending
kubectl certificate approve   # approve the certificates
[root@server2 metrics]# kubectl certificate approve csr-59zz9 csr-8d2rt csr-chz2d
certificatesigningrequest.certificates.k8s.io/csr-59zz9 approved
certificatesigningrequest.certificates.k8s.io/csr-8d2rt approved
certificatesigningrequest.certificates.k8s.io/csr-chz2d approved
[root@server2 metrics]# kubectl get csr
NAME AGE SIGNERNAME REQUESTOR CONDITION
csr-59zz9 118s kubernetes.io/kubelet-serving system:node:server3 Approved,Issued
csr-8d2rt 118s kubernetes.io/kubelet-serving system:node:server4 Approved,Issued
csr-chz2d 2m2s kubernetes.io/kubelet-serving system:node:server2 Approved,Issued
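With many nodes, approving requests one by one gets tedious; all listed CSRs can be approved in one shot (this assumes every pending request is trusted):
kubectl get csr -o name | xargs kubectl certificate approve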
Check the metrics Pod again:
kubectl get pod -n kube-system
It is now Ready:
[root@server2 metrics]# kubectl get pod -n kube-system | grep metrics
metrics-server-86d6b8bbcc-lrdfh 1/1 Running 0 35m
Once the deployment succeeds, the metrics can be queried:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes/server2"
kubectl top node
[root@server2 metrics]# kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes/server2"
{"kind":"NodeMetrics","apiVersion":"metrics.k8s.io/v1beta1","metadata":{"name":"server2","creationTimestamp":"2021-08-03T16:42:07Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"server2","kubernetes.io/os":"linux","node-role.kubernetes.io/control-plane":"","node-role.kubernetes.io/master":"","node.kubernetes.io/exclude-from-external-load-balancers":""}},"timestamp":"2021-08-03T16:41:48Z","window":"10s","usage":{"cpu":"167373877n","memory":"1410456Ki"}}
[root@server2 metrics]# kubectl top node
W0804 00:42:13.983722 24949 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
server2 172m 8% 1378Mi 72%
server3 66m 6% 570Mi 64%
server4 79m 7% 599Mi 67%
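Pod-level metrics are served the same way:
kubectl top pod -n kube-system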
Additional notes:
Error 1: dial tcp: lookup server2 on 10.96.0.10:53: no such host
This happens when there is no internal DNS server, so metrics-server cannot resolve the node names. You can edit the coredns ConfigMap and add each node's hostname to a hosts block, so that every Pod can resolve the node names through CoreDNS.
kubectl edit configmap coredns -n kube-system
apiVersion: v1
data:
  Corefile: |
    ...
    ready
    hosts {
        ip nodename
        ip nodename
        ip nodename
        fallthrough
    }
    kubernetes cluster.local in-addr.arpa ip6.arpa {
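Filled in, the hosts block might look like this (the addresses are placeholders for this lab's nodes; substitute your real node IPs):
    hosts {
        172.25.21.2 server2
        172.25.21.3 server3
        172.25.21.4 server4
        fallthrough
    }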
Error 2: Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
If metrics-server started normally and its logs show no errors, this is usually a network problem between the apiserver and the metrics-server Pod. Switch the metrics-server Pod to host networking:
hostNetwork: true
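hostNetwork sits at the Pod-spec level of the metrics-server Deployment (a placement sketch; the other fields stay unchanged):
spec:
  template:
    spec:
      hostNetwork: true   # share the node's network namespace
      containers:
      - name: metrics-server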
Dashboard deployment
Dashboard provides a visual web interface for viewing information about the cluster. With Kubernetes Dashboard you can deploy containerized applications, monitor application state, troubleshoot, and manage the cluster's resources.
Project: https://github.com/kubernetes/dashboard
Download the deployment file: https://raw.githubusercontent.com/kubernetes/dashboard/v2.3.1/aio/deploy/recommended.yaml
Upload the required images to the harbor registry in advance.
Apply the downloaded deployment file to deploy the Dashboard:
[root@server2 dashboard]# kubectl apply -f recommended.yaml
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created
[root@server2 dashboard]# kubectl -n kubernetes-dashboard get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.105.154.129 <none> 8000/TCP 6s
kubernetes-dashboard ClusterIP 10.102.109.205 <none> 443/TCP 7s
The Service that was created is of type ClusterIP; change it to LoadBalancer so the Dashboard can be reached from outside the cluster:
[root@server2 dashboard]# kubectl -n kubernetes-dashboard edit svc kubernetes-dashboard
service/kubernetes-dashboard edited
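The same change can be made non-interactively with a patch:
kubectl -n kubernetes-dashboard patch svc kubernetes-dashboard -p '{"spec":{"type":"LoadBalancer"}}'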
The Service has been assigned an external IP, 172.25.21.11.
If the status stays pending, see the LoadBalancer section of https://blog.csdn.net/Puuwuuchao/article/details/119172011#t5.
Logging in to the Dashboard requires authentication; fetch the token from the dashboard ServiceAccount's secret:
kubectl -n kubernetes-dashboard get secrets
kubectl -n kubernetes-dashboard describe secrets kubernetes-dashboard-token-k27nb
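The random suffix in the secret name differs per cluster; on clusters of this vintage, where a token secret is auto-created for each ServiceAccount, the token can be extracted in one line (a convenience sketch):
kubectl -n kubernetes-dashboard get secret \
  $(kubectl -n kubernetes-dashboard get sa kubernetes-dashboard -o jsonpath='{.secrets[0].name}') \
  -o jsonpath='{.data.token}' | base64 -d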
After logging in with the token, RBAC permission errors appear.
By default the dashboard has no rights to operate on the cluster, so it needs to be authorized:
vim rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
[root@server2 dashboard]# kubectl apply -f rbac.yaml
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard-admin created
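To verify the binding took effect, impersonate the ServiceAccount:
kubectl auth can-i delete pods --as=system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard
# yes
Note that cluster-admin is convenient for a lab but far broader than a production Dashboard should be granted.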
The Dashboard is now fully deployed.