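Adding a new node (k8s-node) to a cluster that has been running for a while

Preface: adding a node to a freshly deployed k8s cluster only takes a kubeadm join. Once a cluster has been running for a while, though, the token and the sha256 hash from the original deployment usually were not written down, so they have to be looked up again before a node can be added.

1 Check the nodes in the existing cluster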

root@gz-gpu101:~# kubectl get node
NAME                 STATUS   ROLES    AGE   VERSION
gz-cpu031   Ready    node     24d   v1.14.1-2
gz-cpu032   Ready    node     24d   v1.14.1-2
gz-cpu033   Ready    node     24d   v1.14.1-2
gz-gpu101   Ready    master   24d   v1.14.1-2
root@gz-gpu101:~#

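2 Check the token

By default a token is valid for 24 hours. Once it has expired it can no longer be used, and you simply run kubeadm token create on the master node to create a new one.

In our cluster, however, the token turns out to never expire, so the create step can be skipped.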

root@gz-gpu101:~# kubeadm token list
TOKEN                     TTL         EXPIRES   USAGES                   DESCRIPTION   EXTRA GROUPS
hwfep2.iw7q7ltqdwxbati5   <forever>   <never>   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
root@gz-gpu101:~#
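
If the list comes back empty (for example because the default 24-hour token has expired), a new token has to be created on the master. Below is a minimal sketch of scripting that check. The helper name first_token is made up for illustration, and it assumes the column layout of the kubeadm token list output shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first token ID from `kubeadm token list`
# output read on stdin (skips the header row, takes column 1).
first_token() {
  awk 'NR > 1 && NF > 0 { print $1; exit }'
}

# Usage on the master (not run here):
#   token=$(kubeadm token list | first_token)
#   # Nothing listed? Create a token and print the matching join command:
#   [ -n "$token" ] || kubeadm token create --print-join-command
```

With --print-join-command, kubeadm prints a complete kubeadm join line, CA cert hash included, so the hash does not have to be computed by hand in that case.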

3 Get the sha256 hash of the CA certificate

root@gz-gpu101:~# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | awk '{print $2}'
72acd5b1545b4488ab6385bc9511854557d3460cc013b6595b1e307d8c88f060
root@gz-gpu101:~#
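
The same pipeline can be wrapped in a small function so the hash is easy to reuse in scripts. This is only a sketch: the name ca_cert_hash is hypothetical, and like the command above it assumes the CA uses an RSA key. It prints the last field of the digest line rather than the second, so the label openssl puts in front of the digest does not matter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sha256 hash of the public key in a CA
# certificate, i.e. the value that goes after "sha256:" in
# --discovery-token-ca-cert-hash. Assumes an RSA key, as in the command above.
#   $1 = path to the CA certificate (defaults to the kubeadm location)
ca_cert_hash() {
  local cert="${1:-/etc/kubernetes/pki/ca.crt}"
  openssl x509 -pubkey -noout -in "$cert" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | awk '{ print $NF }'
}

# Usage on the master (not run here):
#   hash=$(ca_cert_hash)
```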

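4 Join the new node to the cluster

Every node that is going to join the k8s cluster has to be initialized first (firewall disabled, swap turned off, and so on), with docker, kubeadm and kubelet installed and the kubelet service started.

For details, see: installing and deploying k8s with kubeadm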
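
As a rough illustration of the swap part of that preparation, here is a sketch. The function name disable_swap_in_fstab is made up, and it assumes swap is configured through an fstab-style file. Disabling the firewall is left out because the command depends on the distribution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out active swap entries in an fstab-style file
# so that swap stays off after a reboot. A .bak copy of the file is kept.
#   $1 = path to the fstab file (defaults to /etc/fstab)
disable_swap_in_fstab() {
  local fstab="${1:-/etc/fstab}"
  sed -i.bak -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}

# Usage on the new node, as root (not run here):
#   swapoff -a                      # turn swap off for the running system
#   disable_swap_in_fstab           # keep it off across reboots
#   systemctl enable --now kubelet  # start kubelet now and on every boot
```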

On the new node (qa-gpu082), run the kubeadm join command

kubeadm join 172.18.12.23:6443 --token hwfep2.iw7q7ltqdwxbati5 --discovery-token-ca-cert-hash sha256:72acd5b1545b4488ab6385bc9511854557d3460cc013b6595b1e307d8c88f060
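
The join command is just the API server address plus the token and hash collected in steps 2 and 3. A small sketch of assembling it follows. The function name build_join_command is hypothetical, and the values are the ones from this cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble the join command from its three parts.
#   $1 = API server endpoint (host:port)
#   $2 = bootstrap token
#   $3 = CA cert hash (hex digest, without the "sha256:" prefix)
build_join_command() {
  printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' \
    "$1" "$2" "$3"
}

# Print the command for this cluster; run the printed line on the new node.
build_join_command 172.18.12.23:6443 hwfep2.iw7q7ltqdwxbati5 \
  72acd5b1545b4488ab6385bc9511854557d3460cc013b6595b1e307d8c88f060
```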

5 Check again

root@gz-gpu101:~# kubectl get node
NAME                 STATUS   ROLES    AGE     VERSION
gz-cpu031   Ready    node     24d     v1.14.1-2
gz-cpu032   Ready    node     24d     v1.14.1-2
gz-cpu033   Ready    node     24d     v1.14.1-2
gz-gpu101   Ready    master   24d     v1.14.1-2
qa-gpu082   Ready    <none>   2m18s   v1.14.1
root@gz-gpu101:~#

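6 Miscellaneous

6.1 Change ROLES

The newly joined node shows <none> under ROLES, which does not look great, so let's change it. All of the following is run on the k8s master.

# View the labels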
root@gz-gpu101:~# kubectl get node --show-labels|grep role

# Add a label to the new node (choose the value based on the output of the previous command)
root@gz-gpu101:~# kubectl label node qa-gpu082 kubernetes.io/role=node
node/qa-gpu082 labeled

# View the node info
root@gz-gpu101:~# kubectl get node
NAME                 STATUS   ROLES    AGE   VERSION
gz-cpu031   Ready    node     24d   v1.14.1-2
gz-cpu032   Ready    node     24d   v1.14.1-2
gz-cpu033   Ready    node     24d   v1.14.1-2
gz-gpu101   Ready    master   24d   v1.14.1-2
qa-gpu082   Ready    node     14m   v1.14.1
root@gz-gpu101:~# 
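
When several nodes are added at once, the ones still missing a role can be picked out of the node list and labeled in one go. The sketch below assumes ROLES is the third column, as in the output above. The helper name nodes_without_role is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get node --no-headers` output on stdin
# and print the names of nodes whose ROLES column (column 3) is <none>.
nodes_without_role() {
  awk '$3 == "<none>" { print $1 }'
}

# Usage on the master (not run here; xargs -r is the GNU flag that skips
# the command when there is no input):
#   kubectl get node --no-headers | nodes_without_role \
#     | xargs -r -I{} kubectl label node {} kubernetes.io/role=node
```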

6.2 Error handling

Error message

root@qa-gpu082:~# kubeadm join 172.18.12.23:6443 --token hwfep2.iw7q7ltqdwxbati5 --discovery-token-ca-cert-hash sha256:72acd5b1545b4488ab6385bc9511854557d3460cc013b6595b1e307d8c88f060
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.7. Latest validated version: 18.09
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
	[ERROR FileAvailable--etc-kubernetes-bootstrap-kubelet.conf]: /etc/kubernetes/bootstrap-kubelet.conf already exists
	[ERROR Port-10250]: Port 10250 is in use
	[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
root@qa-gpu082:~#

This machine has probably been used in a cluster before, so it needs a reset first
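
To tell ahead of time whether a machine carries such leftovers, the files named in the preflight errors can be checked directly. This is only a sketch: the function name leftover_kubeadm_files is hypothetical, and the file list is taken from the error output above, not from kubeadm's full set of preflight checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files left behind by an earlier kubeadm init/join
# that the preflight check above complained about.
#   $1 = directory to treat as the filesystem root (defaults to the real
#        root; pass a temp dir when trying the function out)
leftover_kubeadm_files() {
  local root="${1:-}"
  local f
  for f in /etc/kubernetes/kubelet.conf \
           /etc/kubernetes/bootstrap-kubelet.conf \
           /etc/kubernetes/pki/ca.crt; do
    [ -e "${root}${f}" ] && printf '%s\n' "$f"
  done
  return 0
}

# Usage on the new node (not run here):
#   [ -n "$(leftover_kubeadm_files)" ] && echo "leftovers found, run kubeadm reset first"
```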

Solution

root@qa-gpu082:~# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0120 10:00:56.461284  684201 reset.go:234] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

root@qa-gpu082:~#


# Run join again
root@qa-gpu082:~# kubeadm join 172.18.12.23:6443 --token hwfep2.iw7q7ltqdwxbati5 --discovery-token-ca-cert-hash sha256:72acd5b1545b4488ab6385bc9511854557d3460cc013b6595b1e307d8c88f060
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.7. Latest validated version: 18.09
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
	[WARNING RequiredIPVSKernelModulesAvailable]:

The IPVS proxier may not be used because the following required kernel modules are not loaded: [ip_vs_rr ip_vs_wrr ip_vs_sh ip_vs]
or no builtin kernel IPVS support was found: map[ip_vs:{} ip_vs_rr:{} ip_vs_sh:{} ip_vs_wrr:{} nf_conntrack:{}].
However, these modules may be loaded automatically by kube-proxy if they are available on your system.
To verify IPVS support:

   Run "lsmod | grep 'ip_vs|nf_conntrack'" and verify each of the above modules are listed.

If they are not listed, you can use the following methods to load them:

1. For each missing module run 'modprobe $modulename' (e.g., 'modprobe ip_vs', 'modprobe ip_vs_rr', ...)
2. If 'modprobe $modulename' returns an error, you will need to install the missing module support for your kernel.

[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

root@qa-gpu082:~#
