Offline installation of k8s 1.9.0

Notes
This article is based on https://segmentfault.com/a/1190000012755243, reorganizing it and adding notes to help avoid pitfalls.

Pitfalls encountered: see "Installing k8s 1.9.0 in practice: a collection of problems".

Environment

Environment information (one master node plus two node nodes):
192.168.1.137 tensorflow0 node
192.168.1.138 tensorflow1 master
192.168.1.139 tensorflow2 node

Operating system version:

[root@tensorflow1 ~]# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)

Kernel version:

[root@tensorflow1 ~]# cat /proc/version
Linux version 3.10.0-693.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Aug 22 21:09:27 UTC 2017

Software versions:

kubernetes v1.9.0
docker:17.03.2-ce
kubeadm:v1.9.0
kube-apiserver:v1.9.0
kube-controller-manager:v1.9.0
kube-scheduler:v1.9.0
k8s-dns-sidecar:1.14.7
k8s-dns-kube-dns:1.14.7
k8s-dns-dnsmasq-nanny:1.14.7
kube-proxy:v1.9.0
etcd:3.1.10
pause:3.0
flannel:v0.9.1
kubernetes-dashboard:v1.8.1

Installing with kubeadm
kubeadm is the deployment tool officially recommended by Kubernetes. It runs the Kubernetes components as pods on the master and node nodes and automatically handles certificate generation and related setup.
kubeadm pulls its images from Google's registry by default, which is currently unreachable from mainland China, so the images have been downloaded in advance; you only need to load them from the offline package onto each node.
Start the installation
Download
Link: https://pan.baidu.com/s/1c2O1gIW Password: 9s92

Verify the MD5 checksum of the offline package

MD5 (k8s_images.tar.bz2) = b60ad6a638eda472b8ddcfa9006315ee

Extract the downloaded offline package:

tar -xjvf k8s_images.tar.bz2

Operations on all nodes

Environment setup

Bind hosts
Write each node's IP and hostname into the hosts file:

[root@tensorflow1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.137 tensorflow0
192.168.1.138 tensorflow1
192.168.1.139 tensorflow2

Disable the firewall

systemctl stop firewalld && systemctl disable firewalld

Disable SELinux

Edit /etc/selinux/config and set SELINUX to disabled:
SELINUX=disabled

Then turn it off for the current session:

setenforce 0

Disable swap

swapoff -a
To disable swap permanently:

edit /etc/fstab and comment out the swap line with a leading #.
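Commenting out the swap entry can also be scripted. A minimal sed sketch, demonstrated here on sample fstab content rather than the live /etc/fstab (the pattern assumes the usual fstab column layout):

```shell
# Sample fstab content; on a real node, run the sed line against /etc/fstab instead
printf '%s\n' \
  '/dev/mapper/centos-root /    xfs  defaults 0 0' \
  '/dev/mapper/centos-swap swap swap defaults 0 0' > fstab.sample
# Comment out every uncommented line whose fields include the "swap" type
sed -ri 's@^([^#].*[[:space:]]swap[[:space:]].*)@#\1@' fstab.sample
cat fstab.sample
```

After verifying the result, the same sed line (with a backup flag such as -i.bak) can be applied to /etc/fstab.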

Configure the system routing parameters so that kubeadm does not report routing warnings:

echo "
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
" >> /etc/sysctl.conf
sysctl -p

Install docker
Install docker-ce 17.03 (17.03 is the highest docker-ce version supported by kubeadm v1.9):

rpm -ivh docker-ce-selinux-17.03.2.ce-1.el7.centos.noarch.rpm
rpm -ivh docker-ce-17.03.2.ce-1.el7.centos.x86_64.rpm

Start docker-ce:

systemctl start docker && systemctl enable docker

Check the docker service:

systemctl status docker
If it reports active (running), docker is working.

Install k8s
Load the images:

docker load <etcd-amd64_v3.1.10.tar
docker load <flannel:v0.9.1-amd64.tar
docker load <k8s-dns-dnsmasq-nanny-amd64_v1.14.7.tar
docker load <k8s-dns-kube-dns-amd64_1.14.7.tar
docker load <k8s-dns-sidecar-amd64_1.14.7.tar
docker load <kube-apiserver-amd64_v1.9.0.tar
docker load <kube-controller-manager-amd64_v1.9.0.tar
docker load <kube-scheduler-amd64_v1.9.0.tar
docker load <kube-proxy-amd64_v1.9.0.tar
docker load <pause-amd64_3.0.tar
docker load <kubernetes-dashboard_v1.8.1.tar

Note that kubernetes-dashboard_v1.8.1.tar is not in the same directory as the other tarballs; it sits one directory up.
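Rather than typing each docker load by hand, the tarballs can be loaded in a loop. A sketch, assuming the current directory is the one holding the image tarballs (the dashboard image still has to be loaded from the parent directory, as noted above):

```shell
# Load every image tarball found in the current directory
for f in *.tar; do
    [ -e "$f" ] || continue   # skip the unexpanded *.tar pattern when no tarballs exist
    docker load < "$f"
done
# then the dashboard image from the parent directory:
# docker load < ../kubernetes-dashboard_v1.8.1.tar
```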

Install the kubelet, kubeadm, and kubectl packages:

rpm -ivh socat-1.7.3.2-2.el7.x86_64.rpm
rpm -ivh kubernetes-cni-0.6.0-0.x86_64.rpm kubelet-1.9.0-0.x86_64.rpm kubectl-1.9.0-0.x86_64.rpm
rpm -ivh kubeadm-1.9.0-0.x86_64.rpm

Modify the kubelet configuration file

Check which cgroup driver docker uses:

docker info | grep Cgroup

The driver is either systemd or cgroupfs; the kubelet service configuration must be changed to match docker's:

vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

The file ships with Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"; if docker reports cgroupfs, change systemd to cgroupfs here.
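The driver swap can also be scripted. A sketch, demonstrated on sample file content rather than the live 10-kubeadm.conf; the DRIVER value is hardcoded here, while on a real node it would come from docker info:

```shell
# Sample of the relevant line from 10-kubeadm.conf
echo 'Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"' > 10-kubeadm.sample
# Driver reported by docker (hardcoded for this sketch; see: docker info | grep Cgroup)
DRIVER=cgroupfs
# Rewrite the kubelet --cgroup-driver flag to match docker's driver
sed -ri "s@(--cgroup-driver=)[a-z]+@\1${DRIVER}@" 10-kubeadm.sample
cat 10-kubeadm.sample
```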

Start kubelet:

systemctl enable kubelet && systemctl start kubelet

Check the kubelet service:

systemctl status kubelet

At this stage it is normal for kubelet to keep restarting and to report that CA files are missing; kubeadm init in a later step generates the CA files, after which kubelet runs normally. As the kubeadm documentation puts it:

The kubelet is now restarting every few seconds, as it waits in a crashloop for kubeadm to tell it what to do. This crashloop is expected and normal, please proceed with the next step and the kubelet will start running normally.

Operations on the master node
Initialize the master:
kubeadm init --kubernetes-version=v1.9.0 --pod-network-cidr=10.244.0.0/16
Kubernetes supports multiple network plugins, such as flannel, weave, and calico. flannel is used here, so the --pod-network-cidr flag must be set. 10.244.0.0/16 is the default network configured in kube-flannel.yml; to change it, set the same network in both the kubeadm init --pod-network-cidr flag and kube-flannel.yml.

Save the kubeadm join xxx command printed at the end of the output; it is needed later when the node nodes join the cluster.

eg:

kubeadm join --token 5ce44e.47b6dc4e4b66980f 192.168.1.138:6443 --discovery-token-ca-cert-hash sha256:9d7eac82d66744405c783de5403e1f2bb7191b4c1b350d721b7b8570c62ff83a

If you forget it, the token can be listed again on the master with kubeadm token list:

kubeadm token list

Tokens expire after 24 hours by default, so machines joining the cluster later need a freshly generated token:

kubeadm token create
To obtain the sha256 hash, run the following on the master node:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
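The regenerated token and the hash combine into a complete join command. A sketch using the example values from this article's cluster as placeholders; substitute the real outputs of kubeadm token create and the openssl pipeline above:

```shell
# Placeholder values taken from this article's example cluster
TOKEN=5ce44e.47b6dc4e4b66980f
HASH=9d7eac82d66744405c783de5403e1f2bb7191b4c1b350d721b7b8570c62ff83a
MASTER=192.168.1.138:6443
# Print the join command to run on each new node
echo "kubeadm join --token ${TOKEN} ${MASTER} --discovery-token-ca-cert-hash sha256:${HASH}"
```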

As the init output indicates, at this point even root cannot use kubectl to control the cluster; the kubeconfig environment needs to be configured.

For a non-root user:

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

For the root user:

the export can go straight into ~/.bash_profile:

echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
Then source the environment file:

source ~/.bash_profile

Test with kubectl version:

kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T20:55:30Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

Install the pod network. flannel, calico, weave, or macvlan can be used; flannel is used here, taken directly from the offline package.

To change the network segment, the kubeadm init --pod-network-cidr value and this file must be kept in sync:

vi kube-flannel.yml

Modify the Network entry:

"Network": "10.244.0.0/16",

Then apply it:

kubectl create -f kube-flannel.yml
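If the segment does need changing, the edit to kube-flannel.yml can likewise be scripted. A sketch against sample file content (the real file carries the same "Network" key inside net-conf.json; the 10.245.0.0/16 value is purely illustrative):

```shell
# Sample of the net-conf.json line inside kube-flannel.yml
echo '      "Network": "10.244.0.0/16",' > flannel.sample
# New segment -- must match the --pod-network-cidr passed to kubeadm init (illustrative value)
NEW_CIDR=10.245.0.0/16
sed -ri "s@(\"Network\": \")[^\"]+@\1${NEW_CIDR}@" flannel.sample
cat flannel.sample
```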

Operations on the node nodes
Run the kubeadm join command saved from the kubeadm init output:

kubeadm join --token 5ce44e.47b6dc4e4b66980f 192.168.1.138:6443 --discovery-token-ca-cert-hash sha256:9d7eac82d66744405c783de5403e1f2bb7191b4c1b350d721b7b8570c62ff83a

Confirm on the master node:

[root@tensorflow1 hadoop]# kubectl get nodes
NAME          STATUS    ROLES     AGE       VERSION
tensorflow0   Ready     <none>    1d        v1.9.0
tensorflow1   Ready     master    1d        v1.9.0
tensorflow2   Ready     <none>    1d        v1.9.0

Kubernetes creates a flannel pod and a kube-proxy pod on every node:

[root@tensorflow1 hadoop]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY     STATUS    RESTARTS   AGE
kube-system   etcd-tensorflow1                       1/1       Running   0          1d
kube-system   kube-apiserver-tensorflow1             1/1       Running   0          1d
kube-system   kube-controller-manager-tensorflow1    1/1       Running   0          1d
kube-system   kube-dns-6f4fd4bdf-59ttf               3/3       Running   0          1d
kube-system   kube-flannel-ds-fb75p                  1/1       Running   0          1d
kube-system   kube-flannel-ds-ppm2t                  1/1       Running   0          1d
kube-system   kube-flannel-ds-w54wh                  1/1       Running   0          1d
kube-system   kube-proxy-4lftj                       1/1       Running   0          1d
kube-system   kube-proxy-cj4st                       1/1       Running   0          1d
kube-system   kube-proxy-kd7vb                       1/1       Running   0          1d
kube-system   kube-scheduler-tensorflow1             1/1       Running   0          1d

At this point the basic Kubernetes cluster installation is complete.

--dashboard deployment to be documented in a follow-up--

Install the nvidia-gpu components
These components are only needed if containers are to use GPUs; otherwise this section can be skipped.

Edit /etc/docker/daemon.json:

cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

Restart docker:

systemctl restart docker

Edit the kubelet configuration file

Add the line Environment="KUBELET_EXTRA_ARGS=--feature-gates=DevicePlugins=true" to the kubelet configuration file; note that it must be placed before the ExecStart= line.

Restart kubelet:

systemctl daemon-reload && systemctl restart kubelet

Download the GPU plugin image matching your GPU model and load it:

docker load < 
# docker images
REPOSITORY                                               TAG                 IMAGE ID            CREATED             SIZE
nvidia/k8s-device-plugin                                 1.9                 3325c3b04513        2 weeks ago         63 MB

Start it from the yaml file:

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.9/nvidia-device-plugin.yml
The file content is as follows:
[root@tensorflow1 tf_gpu]# cat nvidia-device-plugin.yml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  template:
    metadata:
      # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
      # reserves resources for critical add-on pods so that they can be rescheduled after
      # a failure.  This annotation works in tandem with the toleration below.
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
      # This, along with the annotation above marks this pod as a critical add-on.
      - key: CriticalAddonsOnly
        operator: Exists
      containers:
      - image: nvidia/k8s-device-plugin:1.9
        name: nvidia-device-plugin-ctr
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
          - name: device-plugin
            mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
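With the device plugin DaemonSet running, a pod asks for GPUs through the nvidia.com/gpu resource. A minimal sketch of such a pod spec (the pod name, image, and GPU count are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                    # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: cuda-container
    image: nvidia/cuda:9.0-base     # illustrative CUDA-capable image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1           # request one GPU via the device plugin
```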




This article was reposted from CSDN: Offline installation of k8s 1.9.0
