1. etcd fails to start after the install script finishes

Fix: rebooting all nodes resolved it.

2. kube-apiserver reports: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input

Symptoms:

[root@test1 ssl]# systemctl status kube-apiserver -l
● kube-apiserver.service - Kubernetes API Server
   Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-02-06 18:14:58 EST; 1h 3min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 1684 (kube-apiserver)
    Tasks: 16
   Memory: 11.4M
   CGroup: /system.slice/kube-apiserver.service
           └─1684 /opt/k8s/bin/kube-apiserver
             --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota
             --anonymous-auth=false
             # --experimental-encryption-provider-config=/etc/kubernetes/encryption-config.yaml
             --advertise-address=192.168.0.91
             --bind-address=192.168.0.91
             --insecure-port=8080
             --authorization-mode=Node,RBAC
             # --runtime-config=api/all
             --enable-bootstrap-token-auth
             --token-auth-file=/etc/kubernetes/token.csv
             --service-cluster-ip-range=10.254.0.0/16
             --service-node-port-range=8000-30000
             --tls-cert-file=/etc/kubernetes/cert/kubernetes.pem
             --tls-private-key-file=/etc/kubernetes/cert/kubernetes-key.pem
             --client-ca-file=/etc/kubernetes/cert/ca.pem
             --kubelet-client-certificate=/etc/kubernetes/cert/kubernetes.pem
             --kubelet-client-key=/etc/kubernetes/cert/kubernetes-key.pem
             --etcd-cafile=/etc/kubernetes/cert/ca.pem
             --etcd-certfile=/etc/kubernetes/cert/kubernetes.pem
             --etcd-keyfile=/etc/kubernetes/cert/kubernetes-key.pem
             --service-account-key-file=/etc/kubernetes/cert/sa.pub
             --etcd-servers=https://192.168.0.91:2379,https://192.168.0.92:2379,https://192.168.0.93:2379
             --enable-swagger-ui=true
             --secure-port=6443
             --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
             --allow-privileged=true
             --apiserver-count=3
             --audit-log-maxage=30
             --audit-log-maxbackup=3
             --audit-log-maxsize=100
             --audit-log-path=/var/log/kube-apiserver-audit.log
             --event-ttl=1h
             --alsologtostderr=true
             --logtostderr=false
             --log-dir=/var/log/kubernetes
             --v=2

Feb 06 19:18:24 test1 kube-apiserver[1684]: E0206 19:18:24.055401 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:24 test1 kube-apiserver[1684]: E0206 19:18:24.650493 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:25 test1 kube-apiserver[1684]: E0206 19:18:25.074728 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:25 test1 kube-apiserver[1684]: E0206 19:18:25.666053 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:26 test1 kube-apiserver[1684]: E0206 19:18:26.103077 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:26 test1 kube-apiserver[1684]: E0206 19:18:26.689155 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:27 test1 kube-apiserver[1684]: E0206 19:18:27.123484 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:27 test1 kube-apiserver[1684]: E0206 19:18:27.707282 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:28 test1 kube-apiserver[1684]: E0206 19:18:28.246831 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:28 test1 kube-apiserver[1684]: E0206 19:18:28.729613 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input

Fix: reinstalling with the script resolved it.

3. kube-apiserver fails to start: external host was not specified, using 192.168.0.91

Fix: delete all the comments from the kube-apiserver startup (unit) file.

4. kubelet log shows: No valid private key and/or certificate found, reusing existing private key or creating a new one

This message on its own is normal, but checking anyway turned up two fatal errors.

[root@test4 kubernetes]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2019-02-07 07:24:53 EST; 5s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 73646 (kubelet)
    Tasks: 12
   Memory: 15.2M
   CGroup: /system.slice/kubelet.service
           └─73646 /opt/k8s/bin/kubelet
             --bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig
             --cert-dir=/etc/kubernetes/cert
             --kubeconfig=/etc/kubernetes/kubelet.kubeconfig
             --config=/etc/kubernetes/kubelet.config.json
             --hostname-override=test4
             --pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest
             --allow-privileged=true
             --alsologtostderr=true
             --logtostderr=false
             --log-dir=/var/log/kubernetes
             --v=2

Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.021451 73646 server.go:407] Version: v1.13.0
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.024450 73646 feature_gate.go:206] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.024837 73646 feature_gate.go:206] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.025195 73646 plugins.go:103] No cloud provider specified.
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.025304 73646 server.go:523] No cloud provider specified: "" from the config file: ""
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.025410 73646 bootstrap.go:65] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.043219 73646 bootstrap.go:96] No valid private key and/or certificate found, reusing existing private key or creating a new one
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.176716 73646 bootstrap.go:239] Failed to connect to apiserver: the server has asked for the client to provide credentials
Feb 07 07:24:56 test4 kubelet[73646]: I0207 07:24:56.347469 73646 bootstrap.go:239] Failed to connect to apiserver: the server has asked for the client to provide credentials
Feb 07 07:24:58 test4 kubelet[73646]: I0207 07:24:58.451741 73646 bootstrap.go:239] Failed to connect to apiserver: the server has asked for the client to provide credentials

Error 1: the script that generates the bootstrap kubeconfig was wrong. BOOTSTRAP_TOKEN=(kubeadm ...) was written without the $; it must be $(...). Without the $, bash treats the parentheses as an array assignment, so kubeadm never runs and --token receives the literal word "kubeadm" instead of a real token. This was the main error; the second one follows below.

The line as originally written:

BOOTSTRAP_TOKEN=(kubeadm token create --description kubelet-bootstrap-token --groups system:bootstrappers:test1 --kubeconfig ~/.kube/config)

The script, shown here with the $ added:

[root@test1 profile]# cat bootstrap-kubeconfig.sh
#!/bin/bash
# Define variables
export MASTER_VIP="192.168.0.235"
export KUBE_APISERVER="https://192.168.0.235:8443"
export NODE_NAMES=(test1 test2 test3 test4)

cd $HOME/ssl/

for node_name in ${NODE_NAMES[*]}
do
  # Create the token (fixed: the original had "=(" here instead of "=$(")
  export BOOTSTRAP_TOKEN=$(kubeadm token create \
    --description kubelet-bootstrap-token \
    --groups system:bootstrappers:${node_name} \
    --kubeconfig ~/.kube/config)

  # Set cluster parameters
  kubectl config set-cluster kubernetes \
    --certificate-authority=/etc/kubernetes/cert/ca.pem \
    --embed-certs=true \
    --server=${KUBE_APISERVER} \
    --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig

  # Set client authentication parameters
  kubectl config set-credentials kubelet-bootstrap \
    --token=${BOOTSTRAP_TOKEN} \
    --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig

  # Set context parameters
  kubectl config set-context default \
    --cluster=kubernetes \
    --user=kubelet-bootstrap \
    --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig

  # Set the default context
  kubectl config use-context default --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
done

Error 2: the kubelet parameter file contains a mistake.

[root@test4 ~]# cat /etc/kubernetes/kubelet.config.json
{
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "authentication": {
    "x509": {
      "clientCAFile": "/etc/kubernetes/cert/ca.pem"
    },
    "webhook": {
      "enabled": true,
      "cacheTTL": "2m0s"
    },
    "anonymous": {
      "enabled": false
    }
  },
  "authorization": {
    "mode": "Webhook",
    "webhook": {
      "cacheAuthorizedTTL": "5m0s",
      "cacheUnauthorizedTTL": "30s"
    }
  },
  "address": "0.0.0.0",
  "port": 10250,
  "readOnlyPort": 0,
  "cgroupDriver": "cgroupfs",
  "hairpinMode": "promiscuous-bridge",
  "serializeImagePulls": false,
  "featureGates": {
    "RotateKubeletClientCertificate": true,
    "RotateKubeletServerCertificate": true
  },
  "clusterDomain": "cluster.local.",
  "clusterDNS": ["10.254.0.2"]
}

The address field is 0.0.0.0, which is not the node's real IP. Running hostname -i on test4 also returned 0.0.0.0. Changing address to the worker node's real IP fixed it.

5. No node appears after the CSR is approved

Fix: kubelet had stopped. I had added the cAdvisor parameter to the config file and restarted kubelet without checking its status, and only later noticed it had died. Restarting it fixed the problem.

6. kubectl cannot query pod resources: Error attaching, falling back to logs: error dialing backend: dial tcp 0.0.0.0:10250: connect: connection refused

Symptoms:

[root@test4 profile]# kubectl run -it --rm --image=infoblox/dnstools dns-client
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
Error attaching, falling back to logs: error dialing backend: dial tcp 0.0.0.0:10250: connect: connection refused
deployment.apps "dns-client" deleted
Error from server: Get https://test4:10250/containerLogs/default/dns-client-86c6d59f7-tzh5c/dns-client: dial tcp 0.0.0.0:10250: connect: connection refused

Looking inside another pod fails as well:

[root@test4 ~]# kubectl exec -it http-test-dm2-6dbd76c7dd-p2vbw sh
Error from server: error dialing backend: dial tcp 192.168.0.93:10250: connect: no route to host

Troubleshooting: first check coredns.yaml.

[root@test4 profile]# cat coredns.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: coredns
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "CoreDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: cluster_dns_svc_ip
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP

Found that no real IP is configured: "address": "0.0.0.0",

[root@test4 profile]# cat /etc/kubernetes/kubelet.config.json
{
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "authentication": {
    "x509": {
      "clientCAFile": "/etc/kubernetes/cert/ca.pem"
    },
    "webhook": {
      "enabled": true,
      "cacheTTL": "2m0s"
    },
    "anonymous": {
      "enabled": false
    }
  },
  "authorization": {
    "mode": "Webhook",
    "webhook": {
      "cacheAuthorizedTTL": "5m0s",
      "cacheUnauthorizedTTL": "30s"
    }
  },
  "address": "0.0.0.0",
  "port": 10250,
  "readOnlyPort": 0,
  "cgroupDriver": "cgroupfs",
  "hairpinMode": "promiscuous-bridge",
  "serializeImagePulls": false,
  "featureGates": {
    "RotateKubeletClientCertificate": true,
    "RotateKubeletServerCertificate": true
  },
  "clusterDomain": "cluster.local.",
  "clusterDNS": ["10.254.0.2"]
}

Fixing the above still did not help. Suspecting the apiserver, I finally followed the apiserver startup configuration in this document: https://www.cnblogs.com/effortsing/p/10312081.html

The following flag has to be added to the kube-apiserver startup parameters on every master node:

--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname

After restarting kube-apiserver on all master nodes, the "dial tcp 192.168.0.93:10250: connect: no route to host" error no longer appears, but a new one does.

Looking inside a pod now fails with: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)

[root@test4 ~]# kubectl exec -it http-test-dm2-6dbd76c7dd-cv9qf sh
error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)

Analysis: the user kubernetes (user=kubernetes) has no RBAC permission for this operation and needs to be granted it.

Fix: create the apiserver-to-kubelet permissions, that is, grant RBAC authorization to the kubernetes user, as follows.

Note: the user name in the YAML below must be the user shown in the error (here user=kubernetes).

cat > apiserver-to-kubelet.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kubernetes-to-kubelet
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/proxy
      - nodes/stats
      - nodes/log
      - nodes/spec
      - nodes/metrics
    verbs:
      - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kubernetes
  namespace: ""
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kubernetes-to-kubelet
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: kubernetes
EOF

Create the authorization:

[root@test4 ~]# kubectl create -f apiserver-to-kubelet.yaml
clusterrole.rbac.authorization.k8s.io/system:kubernetes-to-kubelet created
clusterrolebinding.rbac.authorization.k8s.io/system:kubernetes created

Enter the container again:

[root@test4 ~]# kubectl exec -it http-test-dm2-6dbd76c7dd-cv9qf sh
/ # exit

It is now possible to get into the container and look around.

Reference: https://www.jianshu.com/p/b3d8e8b8fd7e

7. flannel and coredns cannot be created. Problem: Unable to
mount volumes for pod "coredns-69d58bd968-f9tn4_kube-system

Symptoms: none of the pods come up.

[root@test4 profile]# kubectl get pods -n kube-system
NAME                       READY   STATUS              RESTARTS   AGE
coredns-69d58bd968-mdskk   0/1     ContainerCreating   0          4s
coredns-69d58bd968-xjqpj   0/1     ContainerCreating   0          3m6s
kube-flannel-ds-4bgqb      0/1     Init:0/1            0          94s

Describing the pod shows the error: Unable to mount volumes for pod "coredns-69d58bd968-f9tn4_kube-system

[root@test4 profile]# kubectl describe pod coredns-69d58bd968-f9tn4 --namespace kube-system
Name:               coredns-69d58bd968-f9tn4
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               test4/192.168.0.94
Start Time:         Fri, 08 Feb 2019 23:50:28 -0500
Labels:             k8s-app=kube-dns
                    pod-template-hash=69d58bd968
Annotations:        <none>
Status:             Pending
IP:
Controlled By:      ReplicaSet/coredns-69d58bd968
Containers:
  coredns:
    Container ID:
    Image:          coredns/coredns:1.2.0
    Image ID:
    Ports:          53/UDP, 53/TCP, 9153/TCP
    Host Ports:     0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-29dbl (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-29dbl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-29dbl
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                 From               Message
  ----     ------       ----                ----               -------
  Normal   Scheduled    16m                 default-scheduler  Successfully assigned kube-system/coredns-69d58bd968-f9tn4 to test4
  Warning  FailedMount  68s (x7 over 14m)   kubelet, test4     Unable to mount volumes for pod "coredns-69d58bd968-f9tn4_kube-system(38cb8d7e-2c26-11e9-8db2-000c2935f634)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"coredns-69d58bd968-f9tn4". list of unmounted volumes=[coredns-token-29dbl]. list of unattached volumes=[config-volume coredns-token-29dbl]
  Warning  FailedMount  7s (x16 over 16m)   kubelet, test4     MountVolume.SetUp failed for volume "coredns-token-29dbl" : couldn't propagate object cache: timed out waiting for the condition

The docker log shows mount errors as well: Failed to load container mount ebb0891f650ea9643caf4ec8f164a54e8c6dc9d54842ea1ea4bacc72ff4addff: mount does not exist"

[root@test4 profile]# systemctl status docker -l
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-02-08 23:23:56 EST; 50min ago
     Docs: https://docs.docker.com
 Main PID: 956 (dockerd)
   CGroup: /system.slice/docker.service
           ├─ 956 /usr/bin/dockerd
           └─1152 docker-containerd --config /var/run/docker/containerd/containerd.toml

Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.245990170-05:00" level=error msg="Failed to load container mount ebb0891f650ea9643caf4ec8f164a54e8c6dc9d54842ea1ea4bacc72ff4addff: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.248503580-05:00" level=error msg="Failed to load container mount f4e32003f4c0fc39d292b2dd76dd0a0016a0b1e72028c7d4910749fc7836efde: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.250961209-05:00" level=error msg="Failed to load container mount fb5ca71237d38e0bb413ac95a858ee3e41c209a936a1f41081bf2b6a57f10a45: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.253042348-05:00" level=error msg="Failed to load container mount fb8dfb7d9813b638ac24dc9b0cde97ed095c222b22f8d44f082f5130e2f233e4: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.666363859-05:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.760913207-05:00" level=info msg="Loading containers: done."
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.864408002-05:00" level=info msg="Docker daemon" commit=0520e24 graphdriver(s)=overlay2 version=18.03.0-ce
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.867069598-05:00" level=info msg="Daemon has completed initialization"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.883546083-05:00" level=info msg="API listen on /var/run/docker.sock"
Feb 08 23:23:56 test4 systemd[1]: Started Docker Application Container Engine.

Fix: restart docker.

systemctl restart docker

Checking the pods again, they recover right away:

[root@test4 profile]# kubectl get pods -n kube-system
NAME                       READY   STATUS    RESTARTS   AGE
coredns-69d58bd968-mdskk   1/1     Running   0          3m26s
coredns-69d58bd968-xjqpj   1/1     Running   0          6m28s
kube-flannel-ds-4bgqb      1/1     Running   0          4m56s

8. While testing coredns, kubectl run -it --rm --image=infoblox/dnstools dns-client hangs

Symptoms:

[root@test4 ~]# kubectl run -it --rm --image=infoblox/dnstools dns-client
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.

Cause: possibly flannel and coredns were unhealthy (the docker log later turned out to contain errors), or CPU usage had spiked too high (86% at the time). It was most likely one of these two.

Fix:
- Shut down one master node to bring CPU usage down.
- Restart docker.
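
Going back to error 1 under item 4 (the missing $ in front of the parentheses), the difference can be reproduced without a cluster. The following is only a minimal sketch and assumes bash: fake_token_create is a hypothetical stand-in for kubeadm token create, and the token value it prints is made up for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for `kubeadm token create ...`; the token is invented.
fake_token_create() {
  echo "abcdef.0123456789abcdef"
}

# Without `$`: bash parses the parentheses as an array literal, so the
# command never runs and the variable holds the literal words.
WRONG=(fake_token_create --description kubelet-bootstrap-token)

# With `$`: command substitution runs the command and captures its stdout.
RIGHT=$(fake_token_create --description kubelet-bootstrap-token)

# ${WRONG} expands to the first array element, i.e. the command name itself.
echo "without \$: --token=${WRONG}"
echo "with \$:    --token=${RIGHT}"
```

In the first case the value handed to --token is just the command name, which would be consistent with the apiserver answering "the server has asked for the client to provide credentials" in the kubelet log.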