Dynamically Expanding Cloud Disk Volumes with the Alibaba Cloud CSI Plugin

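When using cloud disk volumes, a service typically requests a disk of suitable capacity at initialization. As data grows, the data disk can no longer meet demand and must be expanded.

In a traditional expansion workflow, you manually stop the application, back up the data disk, perform the expansion, and finally restart the application.

Kubernetes is an automated scheduling and orchestration system that manages the lifecycle of data volumes. In Kubernetes 1.14, CSI volume expansion is an Alpha feature and can only be used after the corresponding feature gate is enabled.

This article describes how to dynamically expand a cloud disk in a CSI environment.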

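Usage notes:

1. Data backup:

Important: before expanding a volume, take a snapshot of the cloud disk as a backup, in case a failure during expansion damages the data.

2. Cluster dependency:

Expanding a cloud disk calls the cloud disk resize API (ResizeDisk), so the cluster must have permission to call this API. See the cluster permission documentation to add this permission; the detailed steps are given later in this article.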

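3. Volume restrictions:

Only dynamically provisioned volumes, that is, PVs configured with a storageClassName, can be expanded dynamically.

Cloud disk volumes of the InlineVolume type (not using a PV and PVC) cannot be expanded.

Basic cloud disks do not support dynamic expansion; use the manual cloud disk expansion procedure instead.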

4. StorageClass requirements:

The StorageClass used by the PVC must be of the Alibaba Cloud disk type, with provisioner diskplugin.csi.alibabacloud.com.

The StorageClass must set allowVolumeExpansion: true; in ACK clusters this is the default.
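The StorageClass requirements above can be checked before attempting an expansion. The following is a minimal sketch, assuming kubectl is already configured for the cluster; the function name sc_supports_expansion is hypothetical, and alicloud-disk-ssd is simply the StorageClass used later in this article.

```shell
# Hypothetical helper (not part of any CLI): succeeds only if the StorageClass
# uses the Alibaba Cloud disk CSI provisioner and allows volume expansion.
sc_supports_expansion() {
  sc_name="$1"
  # Prints "<provisioner> <allowVolumeExpansion>" on a single line.
  info=$(kubectl get storageclass "$sc_name" \
    -o jsonpath='{.provisioner} {.allowVolumeExpansion}') || return 1
  [ "$info" = "diskplugin.csi.alibabacloud.com true" ]
}
```

A possible call would be: sc_supports_expansion alicloud-disk-ssd && echo "expansion supported"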

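Prerequisites

Create an Alibaba Cloud Kubernetes (ACK) cluster of version 1.14 or later, and select the CSI storage plugin when creating it.

1. Configure the feature gate (Kubernetes 1.14 clusters only):

Because volume resize is still an Alpha feature in Kubernetes 1.14, the following configuration is required.

Update kube-controller-manager by adding the feature gate in: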

/etc/kubernetes/manifests/kube-controller-manager.yaml

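Update kubelet by adding the feature gate in the file below, then reload and restart it (with many nodes, this can be scripted):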

/etc/systemd/system/kubelet.service.d/10-kubeadm.conf
systemctl daemon-reload
service kubelet restart

The feature gate flag to add in both places: --feature-gates=ExpandCSIVolumes=true
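Whether the flag is already present in a configuration file can be checked with a small helper before and after editing. This is only a sketch: the function names has_expand_gate and report_expand_gate are hypothetical, and the check looks for the flag text in a file instead of querying the running process.

```shell
# Hypothetical helper: succeeds if the given file contains a --feature-gates
# flag that enables ExpandCSIVolumes (other gates may precede it in the list).
has_expand_gate() {
  grep -q -e '--feature-gates=[^[:space:]]*ExpandCSIVolumes=true' "$1" 2>/dev/null
}

# Report the state of the two files named in this section.
# Intended to be run on the node that holds the files.
report_expand_gate() {
  for f in /etc/kubernetes/manifests/kube-controller-manager.yaml \
           /etc/systemd/system/kubelet.service.d/10-kubeadm.conf; do
    if has_expand_gate "$f"; then
      echo "$f: ExpandCSIVolumes enabled"
    else
      echo "$f: ExpandCSIVolumes not set"
    fi
  done
}
```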

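2. Add the expansion permission to the cluster:

Expanding a cloud disk requires adding the ResizeDisk permission to the cluster's RAM role. Which role to edit depends on the cluster type.

Dedicated cluster:
Go to Clusters --> Manage --> Cluster Resources and click "Master RAM role"; edit the RAM policy and add ResizeDisk.

Managed cluster:
Go to Clusters --> Manage --> Cluster Resources and click "Worker RAM role"; edit the RAM policy and add ResizeDisk.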
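For orientation, the permission added in the console corresponds to a RAM policy statement along the following lines. This is a sketch, assuming the usual RAM policy JSON layout and the action name ecs:ResizeDisk; the function name is hypothetical, and the statement is meant to be appended to the role's existing statements, not to replace them.

```shell
# Hypothetical helper: prints one policy statement that allows the ECS
# ResizeDisk action. Assumption: Resource "*" is acceptable for this role;
# narrow it if your security policy requires.
print_resize_policy_statement() {
  cat <<'EOF'
{
  "Action": [
    "ecs:ResizeDisk"
  ],
  "Resource": [
    "*"
  ],
  "Effect": "Allow"
}
EOF
}
```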

3. Deploy the resizer plugin (Kubernetes 1.14 clusters only):

Use the following template:

kind: Service
apiVersion: v1
metadata:
  name: csi-resizer
  namespace: kube-system
  labels:
    app: csi-resizer
spec:
  selector:
    app: csi-resizer
  ports:
    - name: dummy
      port: 12345
---
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: csi-resizer
  namespace: kube-system
spec:
  serviceName: "csi-resizer"
  selector:
    matchLabels:
      app: csi-resizer
  template:
    metadata:
      labels:
        app: csi-resizer
    spec:
      tolerations:
      - operator: "Exists"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: node-role.kubernetes.io/master
                operator: Exists
      priorityClassName: system-node-critical
      serviceAccount: admin
      hostNetwork: true
      containers:
        - name: csi-resizer
          image: registry.cn-hangzhou.aliyuncs.com/acs/csi-resizer:v0.3.0
          args:
            - "--v=5"
            - "--csi-address=$(ADDRESS)"
            - "--leader-election"
          env:
            - name: ADDRESS
              value: /socketDir/csi.sock
          imagePullPolicy: "Always"
          volumeMounts:
            - name: socket-dir
              mountPath: /socketDir/

        - name: csi-diskplugin
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
          image: registry.cn-hangzhou.aliyuncs.com/acs/csi-plugin:v1.14.8.32-c77e277b-aliyun
          imagePullPolicy: "Always"
          args:
            - "--endpoint=$(CSI_ENDPOINT)"
            - "--v=5"
            - "--driver=diskplugin.csi.alibabacloud.com"
          env:
            - name: CSI_ENDPOINT
              value: unix://socketDir/csi.sock
          volumeMounts:
            - mountPath: /var/log/
              name: host-log
            - mountPath: /socketDir/
              name: socket-dir
            - name: etc
              mountPath: /host/etc

      volumes:
        - name: socket-dir
          emptyDir: {}
        - name: host-log
          hostPath:
            path: /var/log/
        - name: etc
          hostPath:
            path: /etc
  updateStrategy:
    type: RollingUpdate
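Applying the template and waiting for the resizer to become ready might look like the sketch below. It assumes the template above has been saved to a local file; the default file name csi-resizer.yaml and the function name deploy_resizer are hypothetical.

```shell
# Hypothetical helper: apply the resizer manifest and wait for the StatefulSet.
# $1 is the manifest path (assumed default: csi-resizer.yaml).
deploy_resizer() {
  manifest="${1:-csi-resizer.yaml}"
  kubectl apply -f "$manifest" || return 1
  # Blocks until the csi-resizer StatefulSet in kube-system has rolled out,
  # or fails after the timeout.
  kubectl -n kube-system rollout status statefulset/csi-resizer --timeout=120s
}
```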

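Expanding the cloud disk volume:

1. Create the application

Create an nginx application and mount a 20 GiB cloud disk volume into the Pod. The PVC and Deployment templates are as follows: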

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-disk
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: alicloud-disk-ssd
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dynamic-create
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        volumeMounts:
          - name: disk-pvc
            mountPath: "/data"
      volumes:
        - name: disk-pvc
          persistentVolumeClaim:
            claimName: pvc-disk

The current application state is as follows.

The cloud disk mounted in the Pod is 20 GiB:
# kubectl get pod
NAME                              READY   STATUS    RESTARTS   AGE
dynamic-create-857bd875b5-n82d4   1/1     Running   0          107s
# kubectl exec -ti dynamic-create-857bd875b5-n82d4 df | grep data
/dev/vdb        20511312   45080  20449848   1% /data


Both the PVC and the PV report a size of 20Gi:
# kubectl get pvc
NAME       STATUS   VOLUME                   CAPACITY   ACCESS MODES   STORAGECLASS        AGE
pvc-disk   Bound    d-wz9g8sl8dl1ks8hz2m82   20Gi       RWO            alicloud-disk-ssd   2m17s

# kubectl get pv
NAME                     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS        REASON   AGE
d-wz9g8sl8dl1ks8hz2m82   20Gi       RWO            Delete           Bound    default/pvc-disk   alicloud-disk-ssd            2m15s

2. Expand the cloud disk volume:

Run the following command to expand the cloud disk:

# kubectl patch pvc pvc-disk -p '{"spec":{"resources":{"requests":{"storage":"30Gi"}}}}'
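The patch command above can be wrapped so that the PVC name and target size become parameters. This is a minimal sketch; the function names pvc_size_patch and expand_pvc are hypothetical. Note that Kubernetes only accepts a new size larger than the current one.

```shell
# Build the patch body for a new requested size, for example 30Gi.
pvc_size_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$1"
}

# Hypothetical wrapper: expand_pvc <pvc-name> <new-size>
expand_pvc() {
  kubectl patch pvc "$1" -p "$(pvc_size_patch "$2")"
}
```

With these definitions, expand_pvc pvc-disk 30Gi issues the same request as the command shown above.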

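Updating the PVC size drives the resizer to call the cloud disk API and expand the disk. The console shows that the cloud disk is now 30 GiB, and the PV size has been updated to 30Gi as well: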

# kubectl get pvc
NAME       STATUS   VOLUME                   CAPACITY   ACCESS MODES   STORAGECLASS        AGE
pvc-disk   Bound    d-wz9g8sl8dl1ks8hz2m82   20Gi       RWO            alicloud-disk-ssd   13m

# kubectl get pv
NAME                     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS        REASON   AGE
d-wz9g8sl8dl1ks8hz2m82   30Gi       RWO            Delete           Bound    default/pvc-disk   alicloud-disk-ssd            13m

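At this point only the cloud disk itself has been expanded; the file system has not, so the PVC still reports 20Gi and the storage space inside the container is still 20 GiB: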

# kubectl exec -ti dynamic-create-857bd875b5-n82d4 df /data
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/vdb        20511312 45080  20449848   1% /data
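The intermediate state, where the disk has grown but the file system has not, can also be detected from the PVC itself by comparing the requested size with the capacity in its status. A minimal sketch, assuming kubectl access; the function name pvc_resize_pending is hypothetical.

```shell
# Hypothetical helper: succeeds while the requested size differs from the
# capacity reported in the PVC status (file system not yet grown).
# Returns 2 if the PVC cannot be read.
# Note: this is a plain string comparison, so equal sizes written in
# different units (30Gi vs 30720Mi) would be reported as pending.
pvc_resize_pending() {
  sizes=$(kubectl get pvc "$1" \
    -o jsonpath='{.spec.resources.requests.storage} {.status.capacity.storage}') || return 2
  requested=${sizes% *}   # text before the space
  actual=${sizes#* }      # text after the space
  [ "$requested" != "$actual" ]
}
```

For the PVC in this article, pvc_resize_pending pvc-disk would succeed at this step, since the request is 30Gi while the reported capacity is still 20Gi.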

Delete the Pod to trigger the file system expansion:

# kubectl delete pod dynamic-create-857bd875b5-n82d4
pod "dynamic-create-857bd875b5-n82d4" deleted

# kubectl get pod
NAME                              READY   STATUS    RESTARTS   AGE
dynamic-create-857bd875b5-4gng9   1/1     Running   0          38s
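Because the Pod name changes on every restart, the same step can be scripted by label instead of by name. The sketch below assumes the app=nginx label and the dynamic-create Deployment from the template in this article; the function name restart_app_pods is hypothetical.

```shell
# Hypothetical helper: delete the application's Pods by label, then wait
# until the Deployment has recreated them and they are ready.
restart_app_pods() {
  kubectl delete pod -l app=nginx || return 1
  kubectl rollout status deployment/dynamic-create --timeout=120s
}
```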

The file system has now been expanded to 30 GiB:
# kubectl exec -ti dynamic-create-857bd875b5-4gng9 df /data
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/vdb        30832548 45036  30771128   1% /data
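df reports sizes in 1K blocks, so a quick conversion helps when comparing the output with the requested size. The helper below is a small sketch (the name kib_to_gib is hypothetical); the result is slightly below the requested size because the file system reserves part of the disk for its own metadata.

```shell
# Convert a 1K-block count, as printed by df, to GiB with one decimal place.
kib_to_gib() {
  awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib / 1048576 }'
}

kib_to_gib 20511312   # before expansion: prints 19.6
kib_to_gib 30832548   # after expansion:  prints 29.4
```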

The steps above complete a cloud disk expansion in a CSI environment.
