Two days ago I deployed a highly-available Kubernetes cluster with 3 master nodes behind haproxy and keepalived, but accidentally joined the 2 worker nodes as master nodes as well. I ran kubeadm reset on them, and by the time I tried to join again, more than 24 hours had passed, so the original bootstrap token had expired!
Official documentation
-
Generate a new token:
kubeadm token create [token]
List all tokens:
kubeadm token list [flags]
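In practice there is a handy shortcut that skips assembling the token and hash by hand: `kubeadm token create --print-join-command` creates a fresh token and prints a complete, ready-to-run worker join command in one step. A minimal sketch, guarded so the snippet is safe to paste on any machine (the `admin.conf` check is my own addition to make sure it only runs where a control plane actually exists):

```shell
# Shortcut: create a fresh token and print the full worker join command
# (token + CA cert hash) in one step. Guarded so kubeadm is only invoked
# on a machine that actually has a control-plane config.
if command -v kubeadm >/dev/null 2>&1 && [ -f /etc/kubernetes/admin.conf ]; then
  kubeadm token create --print-join-command
else
  echo "kubeadm control plane not found; run this on a control-plane node"
fi
```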
-
Use the OpenSSL CLI to generate the CA key hash:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
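To see what that pipeline actually computes, here is a self-contained sketch. It uses a throwaway self-signed CA generated on the spot (my stand-in for the real /etc/kubernetes/pki/ca.crt) and shows that the result is the SHA-256 digest of the certificate's DER-encoded public key, a 64-character hex string:

```shell
# Generate a throwaway self-signed CA (stand-in for /etc/kubernetes/pki/ca.crt).
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" \
    -keyout /tmp/demo-ca.key -out /tmp/demo-ca.crt 2>/dev/null

# Compute the discovery hash exactly as the kubeadm docs do:
# extract the public key, DER-encode it, SHA-256 it, strip the label.
HASH=$(openssl x509 -pubkey -in /tmp/demo-ca.crt \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex | sed 's/^.* //')

echo "$HASH"
echo "$HASH" | grep -Eq '^[0-9a-f]{64}$' && echo "format OK"
```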
The docs also give two example kubeadm join commands.
For worker nodes:
kubeadm join --discovery-token abcdef.1234567890abcdef --discovery-token-ca-cert-hash sha256:1234..cdef 1.2.3.4:6443
For control-plane nodes:
kubeadm join --discovery-token abcdef.1234567890abcdef --discovery-token-ca-cert-hash sha256:1234..cdef --control-plane 1.2.3.4:6443
Here is my own operation log.
Delete the two mis-joined nodes from master1-141
Deleting them (let alone rebooting) probably isn't strictly necessary! But this was the first time I had run into this situation, so I wanted to simplify the environment as much as possible.
[root@master1-141 working]# kubectl get nodes
NAME          STATUS     ROLES                  AGE   VERSION
master0-140   Ready      control-plane,master   46h   v1.22.2
master1-141   Ready      control-plane,master   46h   v1.22.2
master2-142   Ready      control-plane,master   46h   v1.22.2
node3-143     NotReady   control-plane,master   46h   v1.22.2
node4-144     NotReady   control-plane,master   46h   v1.22.2
[root@master1-141 working]# kubectl delete node node3-143
node "node3-143" deleted
[root@master1-141 working]# kubectl delete node node4-144
node "node4-144" deleted
[root@master1-141 working]# kubectl get nodes
NAME          STATUS   ROLES                  AGE   VERSION
master0-140   Ready    control-plane,master   46h   v1.22.2
master1-141   Ready    control-plane,master   46h   v1.22.2
master2-142   Ready    control-plane,master   46h   v1.22.2
Now only the 3 master nodes are left!
Regenerate the token and CA key hash on the master node
-
Regenerate the token:
[root@master1-141 working]# kubeadm token create
ce2dbj.qx3k5py7auj0hcit
[root@master1-141 working]# kubeadm token list
TOKEN                     TTL   EXPIRES                USAGES                   DESCRIPTION   EXTRA GROUPS
ce2dbj.qx3k5py7auj0hcit   23h   2021-11-14T06:17:23Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
ewodnv.g64hsqgs49pqkw0b   23h   2021-11-14T06:18:51Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
-
Regenerate the CA key hash:
[root@master1-141 working]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
0b905eab6e0725c7716b7192320c2da5e08bc1b53ec8f95595e863dfe2db1eb5
Rebuild the join command
-
This is the original worker-node join command, recovered from the logs:
kubeadm join 192.168.0.149:6444 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:0e3deff56a511ffa3fcb66ba0ecd378438ca6332aaa7fb187808118a5640f6f0
After substituting the new token and CA key hash, and running kubeadm reset on the worker node,
it kept failing!
[root@node3-143 ~]# kubeadm join --discovery-token ewodnv.g64hsqgs49pqkw0b --discovery-token-ca-cert-hash 0b905eab6e0725c7716b7192320c2da5e08bc1b53ec8f95595e863dfe2db1eb5 192.168.0.149:6444
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: invalid discovery token CA certificate hash: invalid hash, expected "format:hex-value". Known format(s) are: sha256
To see the stack trace of this error execute with --v=5 or higher
Adding --v=5 shows the following error:
...
/usr/local/go/src/runtime/asm_amd64.s:1371
cluster CA found in cluster-info ConfigMap is invalid
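Independent of the cluster-info ConfigMap error, the first preflight failure points at a concrete formatting problem: the hash was passed bare, but kubeadm expects it as `sha256:<hex>`, exactly as the message 'expected "format:hex-value". Known format(s) are: sha256' says. A sketch of assembling the corrected command (token, hash, and endpoint are the values from this walkthrough; substitute your own):

```shell
# Rebuild the join command with the mandatory "sha256:" prefix on the
# CA cert hash (its absence is what the preflight check rejects above).
# These values are from this article; substitute your cluster's own.
TOKEN="ewodnv.g64hsqgs49pqkw0b"
HASH="0b905eab6e0725c7716b7192320c2da5e08bc1b53ec8f95595e863dfe2db1eb5"
ENDPOINT="192.168.0.149:6444"

JOIN_CMD="kubeadm join ${ENDPOINT} --token ${TOKEN} --discovery-token-ca-cert-hash sha256:${HASH}"
echo "${JOIN_CMD}"
```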
Regenerating the token and CA key hash again didn't help, so I just rebooted!
After the reboot, the join worked!
The exact cause is unknown! It may be related to my earlier mistake of joining this node with the master-node join command.
-
The other worker node joined successfully on the first try after kubeadm reset and a reboot:
[root@node4-144 ~]# kubeadm join 192.168.0.149:6444 --token ce2dbj.qx3k5py7auj0hcit \
> --discovery-token-ca-cert-hash sha256:0b905eab6e0725c7716b7192320c2da5e08bc1b53ec8f95595e863dfe2db1eb5
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
-
Back on the master node to confirm:
[root@master1-141 working]# kubectl get nodes
NAME          STATUS   ROLES                  AGE   VERSION
master0-140   Ready    control-plane,master   47h   v1.22.2
master1-141   Ready    control-plane,master   47h   v1.22.2
master2-142   Ready    control-plane,master   47h   v1.22.2
node3-143     Ready    <none>                 24m   v1.22.2
node4-144     Ready    <none>                 22m   v1.22.2