Overview
This article is based on https://segmentfault.com/a/1190000012755243, reorganized and annotated to help avoid common pitfalls.
Pitfalls encountered along the way: "Installing k8s 1.9.0 in practice: a collection of problems".
Environment
Environment details (one master node plus two worker nodes):
192.168.1.137 tensorflow0 node
192.168.1.138 tensorflow1 master
192.168.1.139 tensorflow2 node
OS version:
[root@tensorflow1 ~]# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
Kernel version:
[root@tensorflow1 ~]# cat /proc/version
Linux version 3.10.0-693.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Aug 22 21:09:27 UTC 2017
Software versions:
kubernetes: v1.9.0
docker: 17.03.2-ce
kubeadm: v1.9.0
kube-apiserver: v1.9.0
kube-controller-manager: v1.9.0
kube-scheduler: v1.9.0
k8s-dns-sidecar: 1.14.7
k8s-dns-kube-dns: 1.14.7
k8s-dns-dnsmasq-nanny: 1.14.7
kube-proxy: v1.9.0
etcd: 3.1.10
pause: 3.0
flannel: v0.9.1
kubernetes-dashboard: v1.8.1
Installing with kubeadm
kubeadm is the deployment tool officially recommended by Kubernetes. It runs the Kubernetes components as pods on the master and node machines and automates steps such as certificate generation.
kubeadm pulls its images from Google's registry by default, which is currently unreachable from mainland China, so the images have been downloaded in advance; you only need to load them from the offline package onto each node.
Installation
Download
Link: https://pan.baidu.com/s/1c2O1gIW  Password: 9s92
Verify the MD5 checksum of the offline package before extracting it
MD5 (k8s_images.tar.bz2) = b60ad6a638eda472b8ddcfa9006315ee
Extract the downloaded offline package
tar -xjvf k8s_images.tar.bz2
Steps on all nodes
Environment setup
Configure hosts entries
Write each node's IP and hostname into the hosts file:
[root@tensorflow1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.137 tensorflow0
192.168.1.138 tensorflow1
192.168.1.139 tensorflow2
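Adding these entries can be scripted so that re-running the setup never duplicates lines. A minimal sketch; HOSTS_FILE defaults to a scratch copy here, and would be /etc/hosts on the real machines:

```shell
# Append an "IP hostname" pair to the hosts file only if the hostname
# is not already present, so the script is safe to re-run.
HOSTS_FILE="${HOSTS_FILE:-./hosts.demo}"
touch "$HOSTS_FILE"

add_host_entry() {
    ip="$1"; name="$2"
    # -w matches the hostname as a whole word
    grep -qw "$name" "$HOSTS_FILE" || echo "$ip $name" >> "$HOSTS_FILE"
}

add_host_entry 192.168.1.137 tensorflow0
add_host_entry 192.168.1.138 tensorflow1
add_host_entry 192.168.1.139 tensorflow2
```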
Disable the firewall
systemctl stop firewalld && systemctl disable firewalld
Disable SELinux
Edit /etc/selinux/config (vi /etc/selinux/config) and set SELINUX to disabled:
SELINUX=disabled
setenforce 0
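The config edit can also be done non-interactively with sed. A sketch; CONF defaults to a scratch copy here, and would be /etc/selinux/config on the real machines:

```shell
# Rewrite only the SELINUX= line, leaving everything else untouched.
CONF="${CONF:-./selinux-config.demo}"
[ -f "$CONF" ] || printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$CONF"

sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$CONF"
```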
Disable swap
swapoff -a
To disable swap permanently,
edit /etc/fstab and comment out the swap line with a leading #.
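Commenting the swap line out can likewise be scripted. A sketch; FSTAB defaults to a scratch copy here, and would be /etc/fstab on the real machines:

```shell
# Prefix any not-yet-commented fstab line whose type field is "swap"
# with '#'; re-running is harmless because commented lines are skipped.
FSTAB="${FSTAB:-./fstab.demo}"
[ -f "$FSTAB" ] || printf '/dev/sda1 / xfs defaults 0 0\n/dev/sda2 swap swap defaults 0 0\n' > "$FSTAB"

sed -i '/^[^#].*[[:space:]]swap[[:space:]]/s/^/#/' "$FSTAB"
```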
Configure the bridge netfilter sysctls so that kubeadm does not raise routing warnings:
echo "
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
" >> /etc/sysctl.conf
sysctl -p
Install docker
Install docker-ce 17.03 (docker-ce 17.03 is the newest version kubeadm v1.9 supports):
rpm -ivh docker-ce-selinux-17.03.2.ce-1.el7.centos.noarch.rpm
rpm -ivh docker-ce-17.03.2.ce-1.el7.centos.x86_64.rpm
Start docker-ce
systemctl start docker && systemctl enable docker
Check the docker service
systemctl status docker
If it reports active (running), the service is healthy.
Install k8s
Load the images
docker load <etcd-amd64_v3.1.10.tar
docker load <flannel\:v0.9.1-amd64.tar
docker load <k8s-dns-dnsmasq-nanny-amd64_v1.14.7.tar
docker load <k8s-dns-kube-dns-amd64_1.14.7.tar
docker load <k8s-dns-sidecar-amd64_1.14.7.tar
docker load <kube-apiserver-amd64_v1.9.0.tar
docker load <kube-controller-manager-amd64_v1.9.0.tar
docker load <kube-scheduler-amd64_v1.9.0.tar
docker load <kube-proxy-amd64_v1.9.0.tar
docker load <pause-amd64_3.0.tar
docker load <kubernetes-dashboard_v1.8.1.tar
Note that kubernetes-dashboard_v1.8.1.tar is not in the same directory as the other tarballs; it sits one directory up.
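The docker load commands above can be collapsed into a loop. A sketch, assuming the layout of the offline package (most tarballs in one place, the dashboard tarball one level up, so both levels are scanned):

```shell
# Load every *.tar image archive found in a directory and its immediate
# subdirectories into docker. Invoke as: load_images /path/to/k8s_images
load_images() {
    dir="$1"
    for tarball in "$dir"/*.tar "$dir"/*/*.tar; do
        [ -f "$tarball" ] || continue
        echo "loading $tarball"
        docker load < "$tarball"
    done
}
```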
Install the kubelet, kubeadm and kubectl packages
rpm -ivh socat-1.7.3.2-2.el7.x86_64.rpm
rpm -ivh kubernetes-cni-0.6.0-0.x86_64.rpm kubelet-1.9.0-0.x86_64.rpm kubectl-1.9.0-0.x86_64.rpm
rpm -ivh kubeadm-1.9.0-0.x86_64.rpm
Modify the kubelet configuration file
Check docker's cgroup driver:
docker info | grep Cgroup
The driver is either systemd or cgroupfs; the kubelet service must be configured to use the same one as docker.
vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
In the line Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd", change systemd to cgroupfs (or vice versa) so that the value matches what docker reported.
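The substitution can be automated so the value always follows docker. A sketch; UNIT_FILE defaults to a scratch copy here (on the nodes it is the 10-kubeadm.conf path above), and DRIVER stands in for the value docker info reports:

```shell
# Point kubelet's --cgroup-driver at whatever docker actually uses.
UNIT_FILE="${UNIT_FILE:-./10-kubeadm.conf.demo}"
[ -f "$UNIT_FILE" ] || echo 'Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"' > "$UNIT_FILE"

# On a real node: DRIVER="$(docker info 2>/dev/null | awk '/Cgroup Driver/{print $3}')"
DRIVER="${DRIVER:-cgroupfs}"

sed -i "s/--cgroup-driver=[a-z]*/--cgroup-driver=$DRIVER/" "$UNIT_FILE"
```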
Start kubelet
systemctl enable kubelet && systemctl start kubelet
Check the kubelet service
systemctl status kubelet
After kubelet starts it keeps restarting and complains that the CA file does not exist; this is normal. The CA files are generated later by kubeadm init, after which kubelet runs normally. As the official docs put it:
The kubelet is now restarting every few seconds, as it waits in a crashloop for kubeadm to tell it what to do. This crashloop is expected and normal, please proceed with the next step and the kubelet will start running normally.
Steps on the master node
Initialize the master
kubeadm init --kubernetes-version=v1.9.0 --pod-network-cidr=10.244.0.0/16
Kubernetes supports several network plugins such as flannel, weave and calico. Since we use flannel here, the --pod-network-cidr flag must be set. 10.244.0.0/16 is the default network configured in kube-flannel.yml; to use a different subnet, pass it to kubeadm init via --pod-network-cidr and change kube-flannel.yml to the same subnet.
Save the kubeadm join ... command printed at the end of the output; the node machines will need it to join the cluster, e.g.:
kubeadm join --token 5ce44e.47b6dc4e4b66980f 192.168.1.138:6443 --discovery-token-ca-cert-hash sha256:9d7eac82d66744405c783de5403e1f2bb7191b4c1b350d721b7b8570c62ff83a
If you lose it, the tokens can be listed on the master:
kubeadm token list
Tokens expire after 24 hours by default; for machines joining the cluster later, generate a new one:
kubeadm token create
To obtain the sha256 discovery hash, run on the master:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
As the init output suggests, kubectl cannot yet be used to control the cluster; the kubeconfig environment has to be set up first.
For a non-root user:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
For the root user
the kubeconfig path can simply be exported in ~/.bash_profile:
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
Source the environment:
source ~/.bash_profile
Test with kubectl version:
kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T20:55:30Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Install the pod network. flannel, calico, weave or macvlan would all work; here we use flannel, taken directly from the offline package.
To change the subnet, keep kubeadm init's --pod-network-cidr= in sync with the value here:
vi kube-flannel.yml
Change the Network entry:
"Network": "10.244.0.0/16",
Apply it:
kubectl create -f kube-flannel.yml
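It is cheap to sanity-check that the manifest's Network matches the CIDR passed to kubeadm init before applying. A sketch; a stand-in kube-flannel.yml is created when the real one is not in the current directory:

```shell
# Compare the "Network" value inside kube-flannel.yml with the CIDR
# that was passed to kubeadm init via --pod-network-cidr.
EXPECTED_CIDR="${EXPECTED_CIDR:-10.244.0.0/16}"
FLANNEL_YML="${FLANNEL_YML:-./kube-flannel.yml}"
[ -f "$FLANNEL_YML" ] || echo '      "Network": "10.244.0.0/16",' > "$FLANNEL_YML"

actual="$(sed -n 's/.*"Network": "\([^"]*\)".*/\1/p' "$FLANNEL_YML")"
if [ "$actual" = "$EXPECTED_CIDR" ]; then
    echo "flannel Network matches: $actual"
else
    echo "MISMATCH: flannel=$actual kubeadm=$EXPECTED_CIDR" >&2
fi
```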
Steps on the node machines
Run the kubeadm join command printed by kubeadm init earlier:
kubeadm join --token 5ce44e.47b6dc4e4b66980f 192.168.1.138:6443 --discovery-token-ca-cert-hash sha256:9d7eac82d66744405c783de5403e1f2bb7191b4c1b350d721b7b8570c62ff83a
Confirm on the master:
[root@tensorflow1 hadoop]# kubectl get nodes
NAME          STATUS    ROLES     AGE       VERSION
tensorflow0   Ready     <none>    1d        v1.9.0
tensorflow1   Ready     master    1d        v1.9.0
tensorflow2   Ready     <none>    1d        v1.9.0
Kubernetes creates a flannel pod and a kube-proxy pod on every node:
[root@tensorflow1 hadoop]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                  READY     STATUS    RESTARTS   AGE
kube-system   etcd-tensorflow1                      1/1       Running   0          1d
kube-system   kube-apiserver-tensorflow1            1/1       Running   0          1d
kube-system   kube-controller-manager-tensorflow1   1/1       Running   0          1d
kube-system   kube-dns-6f4fd4bdf-59ttf              3/3       Running   0          1d
kube-system   kube-flannel-ds-fb75p                 1/1       Running   0          1d
kube-system   kube-flannel-ds-ppm2t                 1/1       Running   0          1d
kube-system   kube-flannel-ds-w54wh                 1/1       Running   0          1d
kube-system   kube-proxy-4lftj                      1/1       Running   0          1d
kube-system   kube-proxy-cj4st                      1/1       Running   0          1d
kube-system   kube-proxy-kd7vb                      1/1       Running   0          1d
kube-system   kube-scheduler-tensorflow1            1/1       Running   0          1d
At this point the basic Kubernetes cluster installation is complete.
--Dashboard deployment to be added in a follow-up--
Install the nvidia GPU components
This step lets containers use GPUs; if you do not need GPUs, skip this section.
Edit /etc/docker/daemon.json:
cat /etc/docker/daemon.json
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
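A malformed daemon.json will stop docker from starting at all, so validating it before the restart is worthwhile. A sketch using Python's stdlib json.tool (assumes python3 is available; DAEMON_JSON defaults to a scratch copy here, and would be /etc/docker/daemon.json on the real machines):

```shell
# Refuse to restart docker unless daemon.json parses as valid JSON.
DAEMON_JSON="${DAEMON_JSON:-./daemon.json.demo}"
[ -f "$DAEMON_JSON" ] || cat > "$DAEMON_JSON" <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
EOF

if python3 -m json.tool "$DAEMON_JSON" > /dev/null 2>&1; then
    echo "daemon.json OK"
    # systemctl restart docker
else
    echo "daemon.json is invalid JSON, not restarting docker" >&2
fi
```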
Restart docker
systemctl restart docker
Edit the kubelet configuration file
Add the line Environment="KUBELET_EXTRA_ARGS=--feature-gates=DevicePlugins=true" to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf; note it must be placed before the ExecStart= line.
Restart kubelet
systemctl daemon-reload && systemctl restart kubelet
Download the device-plugin image that matches your GPU setup and load it:
docker load <
# docker images
REPOSITORY                 TAG       IMAGE ID       CREATED        SIZE
nvidia/k8s-device-plugin   1.9       3325c3b04513   2 weeks ago    63 MB
Start it from the yaml file:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.9/nvidia-device-plugin.yml
The file contents are as follows:
[root@tensorflow1 tf_gpu]# cat nvidia-device-plugin.yml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  template:
    metadata:
      # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
      # reserves resources for critical add-on pods so that they can be rescheduled after
      # a failure. This annotation works in tandem with the toleration below.
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
      # This, along with the annotation above marks this pod as a critical add-on.
      - key: CriticalAddonsOnly
        operator: Exists
      containers:
      - image: nvidia/k8s-device-plugin:1.9
        name: nvidia-device-plugin-ctr
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
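With the plugin running, pods request GPUs through the nvidia.com/gpu extended resource. A minimal sketch of a test pod (the pod name and image are illustrative, not from the original article; any CUDA-capable image would do):

```shell
# Create a pod that asks the scheduler for one GPU via the resource
# advertised by the device plugin.
cat <<'EOF' | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:8.0-runtime   # illustrative CUDA image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1            # one GPU from the device plugin
EOF
```

If the pod schedules and nvidia-smi prints the GPU table in its logs, the plugin is working.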
This article was reposted from the CSDN article "Offline installation of k8s 1.9.0".