k8s troubleshooting notes

I. Repairing a failed etcd member

  1. Take a backup from an etcd member that is still alive

ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.112.110:2379" snapshot save snapshot.db
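Every command in these notes repeats the same binary path and TLS flags. A minimal sketch of a wrapper that factors them out is shown here. The function name `etcd3` and the `DRY_RUN` switch are hypothetical helpers, not part of etcdctl; the paths and endpoint are the ones used in the backup command.

```shell
# Defaults copied from the commands in these notes; override via the environment.
ETCD_BIN=${ETCD_BIN:-/opt/etcd/bin/etcdctl}
ETCD_SSL=${ETCD_SSL:-/opt/etcd/ssl}
ETCD_ENDPOINTS=${ETCD_ENDPOINTS:-https://192.168.112.110:2379}

# etcd3 (hypothetical helper): run etcdctl with the v3 API and the shared TLS flags.
# With DRY_RUN=1 it only prints the etcdctl command line instead of running it.
etcd3() {
  run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
  ETCDCTL_API=3 $run "$ETCD_BIN" \
    --cacert="$ETCD_SSL/ca.pem" \
    --cert="$ETCD_SSL/server.pem" \
    --key="$ETCD_SSL/server-key.pem" \
    --endpoints="$ETCD_ENDPOINTS" \
    "$@"
}
```

With this in place the backup would be `etcd3 snapshot save snapshot.db`, and setting `ETCD_ENDPOINTS` to the comma-separated list of all three endpoints covers the cluster-wide commands.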

  2. List the cluster members to find the ID of the faulty node that will be removed

ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.112.110:2379,https://192.168.112.111:2379,https://192.168.112.112:2379" member list
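The member ID needed for the removal can be picked out of the `member list` output by hand, or with a small filter like the sketch here. It assumes the default comma-separated output (ID, status, name, peer URLs, client URLs), which may differ between etcd versions or when `-w` is used; `member_id_by_peer_url` is a hypothetical helper name.

```shell
# member_id_by_peer_url URL (hypothetical helper)
# Reads `etcdctl member list` output on stdin and prints the ID of the member
# whose peer URL (assumed to be the 4th comma-separated field) equals URL.
member_id_by_peer_url() {
  awk -F', ' -v url="$1" '$4 == url { print $1 }'
}

# Usage: pipe the member list command into the filter, for example
#   <member list command> | member_id_by_peer_url https://192.168.112.112:2380
```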

  3. Remove the faulty member

ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.112.110:2379,https://192.168.112.111:2379,https://192.168.112.112:2379" member remove 6627a32423113ab8

  4. Edit the etcd configuration file on the faulty node

Change --initial-cluster-state from "new" to "existing", so that the node joins the existing cluster instead of bootstrapping a new one.
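The edit can also be scripted. The sketch here is only an illustration: the notes do not say where the configuration file lives or which form it uses, so the function takes the path as an argument and handles both the environment-file form (`ETCD_INITIAL_CLUSTER_STATE="new"`) and the flag form (`--initial-cluster-state=new`). `set_state_existing` is a hypothetical helper name.

```shell
# set_state_existing FILE (hypothetical helper)
# Keeps a copy of the original as FILE.bak, then rewrites the
# initial-cluster-state value from "new" to "existing" in FILE.
set_state_existing() {
  cp "$1" "$1.bak" || return 1
  sed -E 's/(initial-cluster-state|INITIAL_CLUSTER_STATE)([= ]"?)new/\1\2existing/' \
    "$1.bak" > "$1"
}

# Usage (the path is an assumption; use the file your etcd unit actually reads):
#   set_state_existing /path/to/etcd.conf
```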

  5. Before starting the new node, register it with the cluster as a member

ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.112.110:2379,https://192.168.112.111:2379,https://192.168.112.112:2379"  member add etcd-3 --peer-urls=https://192.168.112.112:2380

  6. Start the node

systemctl start etcd

  7. Check the cluster health again

ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.112.110:2379,https://192.168.112.111:2379,https://192.168.112.112:2379" endpoint health
https://192.168.112.111:2379 is healthy: successfully committed proposal: took = 17.633542ms
https://192.168.112.112:2379 is healthy: successfully committed proposal: took = 18.544015ms
https://192.168.112.110:2379 is healthy: successfully committed proposal: took = 19.061029ms
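For scripted checks, output like the above can be reduced to a single pass/fail result. The sketch here assumes the line format shown in these notes ("<endpoint> is healthy: ..."); `all_healthy` is a hypothetical helper name.

```shell
# all_healthy (hypothetical helper)
# Reads `etcdctl endpoint health` output on stdin. Succeeds only if at least
# one non-empty line was read and every such line contains " is healthy".
all_healthy() {
  awk 'NF { total++; if ($0 !~ / is healthy/) bad++ }
       END { exit !(total > 0 && bad == 0) }'
}

# Usage: some etcdctl versions appear to write these lines to stderr,
# so merging the streams is the safer choice:
#   <endpoint health command> 2>&1 | all_healthy && echo "cluster OK"
```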

II. Harbor private registry causing errors in the Rancher cluster

  1. A Harbor registry was started on the master node, and Rancher could not access it

 
