Installing a Kubernetes (k8s) Cluster on Ubuntu 20.04.3 with kubeadm
1. Initialize the VM environment
- Install the ubuntu-20.04.3-live-server-amd64.iso image with VirtualBox and create three VMs:
  - abcMaster: 192.168.0.100
  - abcNode1: 192.168.0.115
  - abcNode2: 192.168.0.135
- Set the same root password on all nodes
sudo passwd root
- Edit the sshd configuration so that Xshell can log in as root
# As root, edit /etc/ssh/sshd_config and set PermitRootLogin to yes
# Restart the sshd service
service ssh restart
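The same edit can be scripted instead of done in an editor; a minimal sketch, assuming the stock sshd_config where the directive is still commented out:
# Force PermitRootLogin to yes, keeping a backup of the original file
sed -i.bak 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
service ssh restart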
- Disable the firewall, swap, and SELinux
# Disable the firewall
ufw disable
# Disable swap (comment out the swap entry in fstab)
vim /etc/fstab
# Disable SELinux: nothing to do here, Ubuntu does not ship SELinux by default
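kubeadm refuses to initialize while swap is active, and the fstab edit only takes effect after a reboot. To turn swap off for the current session as well:
# Disable swap immediately
swapoff -a
# Verify: the Swap line should show 0
free -h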
- Set up host entries in /etc/hosts
cat >> /etc/hosts << EOF
192.168.0.100 abcmaster
192.168.0.115 abcnode1
192.168.0.135 abcnode2
EOF
- Forward bridged IPv4 traffic to the iptables chains
# Write the sysctl config
cat >> /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
# Apply the settings
sysctl --system
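Note that the two bridge sysctls only exist once the br_netfilter kernel module is loaded; a small sketch to load it now and on every boot:
# Load the bridge netfilter module now
modprobe br_netfilter
# Load it automatically at boot
echo br_netfilter > /etc/modules-load.d/k8s.conf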
2. Install Docker
- Switch apt to a domestic (Aliyun) mirror
# Back up /etc/apt/sources.list
cp /etc/apt/sources.list /etc/apt/sources.list.bak
# Using vim, replace the entries in /etc/apt/sources.list with the following
deb http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
## Update the package index
apt-get update
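Instead of pasting the list by hand, the stock hosts can also be rewritten in place; a sketch, assuming the default focal sources.list:
# Point both the archive and security hosts at the Aliyun mirror (keeps a .bak copy)
sed -i.bak -e 's|http://archive.ubuntu.com|http://mirrors.aliyun.com|g' \
    -e 's|http://security.ubuntu.com|http://mirrors.aliyun.com|g' /etc/apt/sources.list
apt-get update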
- Install Docker
apt-get install -y docker.io
- Configure a domestic Docker registry mirror
# Write the daemon config
tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://wi175f8g.mirror.aliyuncs.com"]
}
EOF
# Reload and restart Docker
systemctl daemon-reload
systemctl restart docker
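To confirm the mirror is actually in effect:
# Should list https://wi175f8g.mirror.aliyuncs.com under Registry Mirrors
docker info | grep -A1 'Registry Mirrors'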
3. Install k8s
- Configure the k8s apt source, then install kubelet, kubeadm, and kubectl
apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat >> /etc/apt/sources.list.d/kubernetes.list << EOF
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
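Optionally, pin the three packages so that a routine apt-get upgrade cannot move the cluster to an incompatible version:
# Freeze kubelet/kubeadm/kubectl at the installed version
apt-mark hold kubelet kubeadm kubectl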
- Run the kubeadm init command on the master node
kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --apiserver-advertise-address=192.168.0.100 --image-repository registry.aliyuncs.com/google_containers
Error 1:
[ERROR ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.4: output: Error response from daemon: manifest for registry.aliyuncs.com/google_containers/coredns:v1.8.4 not found: manifest unknown: manifest unknown
, error: exit status 1
Cause: the coredns image cannot be pulled from the aliyuncs registry (it publishes no coredns:v1.8.4 manifest).
Fix: pull the coredns/coredns image directly with docker, then tag it as the image kubeadm expects
docker pull coredns/coredns
docker tag coredns/coredns:latest registry.aliyuncs.com/google_containers/coredns:v1.8.4
docker rmi coredns/coredns
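Pulling coredns/coredns without a tag fetches whatever latest currently points at, which may not stay at 1.8.4. A safer variant, assuming Docker Hub still publishes the 1.8.4 tag, pins the version explicitly:
docker pull coredns/coredns:1.8.4
docker tag coredns/coredns:1.8.4 registry.aliyuncs.com/google_containers/coredns:v1.8.4
docker rmi coredns/coredns:1.8.4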
Error 2: the kubelet.service unit fails to start
Jun 8 09:45:35 kubelet: F0608 09:45:35.392302 24268 server.go:266] failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
Jun 8 09:45:35 systemd: kubelet.service: main process exited, code=exited, status=255/n/a
Jun 8 09:45:35 systemd: Unit kubelet.service entered failed state.
Jun 8 09:45:35 systemd: kubelet.service failed.
Fix: change how the Docker daemon starts so that its cgroup driver matches the kubelet's (systemd)
# vim /usr/lib/systemd/system/docker.service
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd
# Restart the Docker service
systemctl daemon-reload && systemctl restart docker
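The same cgroup-driver switch can also be made in /etc/docker/daemon.json instead of the unit file, which survives docker.io package upgrades; a sketch that merges it with the mirror configured earlier:
tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://wi175f8g.mirror.aliyuncs.com"],
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl daemon-reload && systemctl restart docker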
Error 3:
[ERROR Port-6443]: Port 6443 is in use
.....
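To see what is actually holding the port (typically a kube-apiserver left over from the failed first attempt):
ss -lntp | grep 6443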
Fix: reset kubeadm
kubeadm reset
Then re-run the init command; this time initialization completes:
Your Kubernetes control-plane has initialized successfully!
Follow the printed follow-up commands:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
export KUBECONFIG=/etc/kubernetes/admin.conf
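A quick sanity check that kubectl can now reach the API server:
# Prints the control plane and CoreDNS endpoints
kubectl cluster-info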
Check node status
kubectl get nodes
# Output
NAME STATUS ROLES AGE VERSION
abcmaster NotReady control-plane,master 3m59s v1.22.1
- Run kubeadm join on abcnode1/2
Note: error 2 came up here as well, so apply the same cgroup-driver fix on each abcnode first, then run the kubeadm join command printed by kubeadm init:
kubeadm join 192.168.0.100:6443 --token 7ni2ey.qkjhtp3ygsn0lswk \
--discovery-token-ca-cert-hash sha256:2ed9136ae664f9c74083f174e748be747c7e2926bdcf05877da003bd44f7fcc1
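The token printed by kubeadm init expires after 24 hours. If it has lapsed by the time a node joins, generate a fresh join command on the master:
# Prints a complete kubeadm join command with a new token
kubeadm token create --print-join-command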
Output on success:
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Result:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
abcmaster NotReady control-plane,master 20m v1.22.1
abcnode1 NotReady <none> 4m6s v1.22.1
abcnode2 NotReady <none> 13s v1.22.1
All nodes are NotReady at this point, because no CNI network plugin has been installed yet.
- Install the flannel plugin (kube-flannel.yml) on abcmaster
# Pull and apply directly
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
This failed because the raw.githubusercontent.com address was unreachable, so download the kube-flannel.yml file from GitHub manually instead:
# 1. Download kube-flannel.yml and copy it onto abcmaster
# 2. Apply it with kubectl
kubectl apply -f ./kube-flannel.yml
Check pod status:
root@abcmaster:~# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-7f6cbbb7b8-lx9m7 0/1 ContainerCreating 0 35m
coredns-7f6cbbb7b8-r6ctb 0/1 ContainerCreating 0 35m
etcd-abcmaster 1/1 Running 1 35m
kube-apiserver-abcmaster 1/1 Running 1 35m
kube-controller-manager-abcmaster 1/1 Running 1 35m
kube-flannel-ds-amd64-m5w5w 0/1 CrashLoopBackOff 4 (79s ago) 3m39s
kube-flannel-ds-amd64-rmvj4 0/1 CrashLoopBackOff 5 (13s ago) 3m39s
kube-flannel-ds-amd64-wjw74 0/1 CrashLoopBackOff 4 (82s ago) 3m39s
kube-proxy-djxs6 1/1 Running 1 19m
kube-proxy-q9c8h 1/1 Running 0 15m
kube-proxy-s7cfq 1/1 Running 0 35m
kube-scheduler-abcmaster 1/1 Running 1 35m
The kube-flannel-ds-amd64 pods are failing to start. The pod logs show the following error:
# kubectl logs kube-flannel-ds-amd64-m5w5w -n kube-system
ERROR: Job failed (system failure): pods is forbidden: User "system:serviceaccount:dev:default" cannot create resource "pods" in API group "" in the namespace "dev"
Fix: run the following command:
kubectl create clusterrolebinding gitlab-cluster-admin --clusterrole=cluster-admin --group=system:serviceaccounts --namespace=dev
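To check that the binding took effect, impersonation can be used as a quick probe (a sketch; the service account name comes from the error message above):
# Should print "yes" once the clusterrolebinding exists
kubectl auth can-i create pods --as=system:serviceaccount:dev:default -n dev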
Check pod status again:
root@abcmaster:~# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-7f6cbbb7b8-lx9m7 0/1 ImagePullBackOff 0 41m
coredns-7f6cbbb7b8-r6ctb 0/1 ErrImagePull 0 41m
etcd-abcmaster 1/1 Running 1 42m
kube-apiserver-abcmaster 1/1 Running 1 42m
kube-controller-manager-abcmaster 1/1 Running 1 42m
kube-flannel-ds-amd64-75hbh 1/1 Running 0 4s
kube-flannel-ds-amd64-m5w5w 1/1 Running 6 (6m19s ago) 10m
kube-flannel-ds-amd64-wjw74 1/1 Running 6 (6m21s ago) 10m
kube-proxy-djxs6 1/1 Running 1 26m
kube-proxy-q9c8h 1/1 Running 0 22m
kube-proxy-s7cfq 1/1 Running 0 41m
kube-scheduler-abcmaster 1/1 Running 1 42m
The coredns pods still fail to run.
Inspect the pod:
root@abcmaster:~# kubectl get po coredns-7f6cbbb7b8-n9hnr -n kube-system -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2021-09-15T13:52:13Z"
  generateName: coredns-7f6cbbb7b8-
  labels:
    k8s-app: kube-dns
    pod-template-hash: 7f6cbbb7b8
  name: coredns-7f6cbbb7b8-n9hnr
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: coredns-7f6cbbb7b8
    uid: a66cd6e2-629a-4250-9732-01cf6331acb9
  resourceVersion: "6860"
  uid: 40b52d81-54c2-4882-87c4-a08ea5c66814
spec:
  containers:
  - args:
    - -conf
    - /etc/coredns/Corefile
    image: registry.aliyuncs.com/google_containers/coredns:v1.8.4
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 5
      httpGet:
        path: /health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    name: coredns
    ports:
    - containerPort: 53
      name: dns
      protocol: UDP
    - containerPort: 53
      name: dns-tcp
      protocol: TCP
    - containerPort: 9153
      name: metrics
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ready
        port: 8181
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        memory: 170Mi
      requests:
        cpu: 100m
        memory: 70Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - NET_BIND_SERVICE
        drop:
        - all
      readOnlyRootFilesystem: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/coredns
      name: config-volume
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-2kjjw
      readOnly: true
  dnsPolicy: Default
  enableServiceLinks: true
  nodeName: abcnode2
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: coredns
  serviceAccountName: coredns
  terminationGracePeriodSeconds: 30
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      items:
      - key: Corefile
        path: Corefile
      name: coredns
    name: config-volume
  - name: kube-api-access-2kjjw
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-09-15T13:52:13Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-09-15T13:52:13Z"
    message: 'containers with unready status: [coredns]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-09-15T13:52:13Z"
    message: 'containers with unready status: [coredns]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-09-15T13:52:13Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: registry.aliyuncs.com/google_containers/coredns:v1.8.4
    imageID: ""
    lastState: {}
    name: coredns
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        message: 'rpc error: code = Unknown desc = Error response from daemon: manifest
          for registry.aliyuncs.com/google_containers/coredns:v1.8.4 not found: manifest
          unknown: manifest unknown'
        reason: ErrImagePull
  hostIP: 192.168.0.135
  phase: Pending
  podIP: 10.244.2.3
  podIPs:
  - ip: 10.244.2.3
  qosClass: Burstable
  startTime: "2021-09-15T13:52:13Z"
The output shows the pull is failing, even though the image exists on abcmaster. Looking closer, the pod is scheduled not on abcmaster but on 192.168.0.135, i.e. abcnode2, so the coredns image has to be pulled and tagged manually on abcnode1/2 as well.
Run the following commands on abcnode1 and abcnode2 to pull and tag the image:
docker pull coredns/coredns
docker tag coredns/coredns:latest registry.aliyuncs.com/google_containers/coredns:v1.8.4
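Confirm the tagged image is present on each node:
# Should list registry.aliyuncs.com/google_containers/coredns with tag v1.8.4
docker images | grep coredns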
Check pod status on the master again: all pods are Running. With that, the k8s environment is up.
root@abcmaster:~# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-7f6cbbb7b8-n9hnr 1/1 Running 0 9m31s
coredns-7f6cbbb7b8-nc46c 1/1 Running 0 20m
etcd-abcmaster 1/1 Running 1 80m
kube-apiserver-abcmaster 1/1 Running 1 80m
kube-controller-manager-abcmaster 1/1 Running 1 80m
kube-flannel-ds-amd64-75hbh 1/1 Running 0 38m
kube-flannel-ds-amd64-m5w5w 1/1 Running 6 (44m ago) 48m
kube-flannel-ds-amd64-wjw74 1/1 Running 6 (44m ago) 48m
kube-proxy-djxs6 1/1 Running 1 64m
kube-proxy-q9c8h 1/1 Running 0 60m
kube-proxy-s7cfq 1/1 Running 0 80m
kube-scheduler-abcmaster 1/1 Running 1 80m
root@abcmaster:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
abcmaster Ready control-plane,master 81m v1.22.1
abcnode1 Ready <none> 65m v1.22.1
abcnode2 Ready <none> 61m v1.22.1