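Background
Kubernetes supports cloud disk snapshots through the CSI external-snapshotter. Upstream supports only the most basic operations: creating and deleting snapshots.
ACK adds scheduled cloud disk snapshots through the storage-auto-snapshotter component.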
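Prerequisites
Deploy csi-snapshotter
First, deploy csi-snapshotter to support basic snapshot creation. Which step applies depends on the version of the current ACK cluster:
- ACK cluster version >= 1.18: csi-snapshotter is deployed when the cluster is created, so no additional deployment is needed.
- ACK cluster version < 1.18: deploy it by following https://developer.aliyun.com/article/757325.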
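The version rule above can be expressed as a small check. This is only a sketch: the function name is hypothetical, and it assumes the version string looks like the output of a Kubernetes version query (for example `1.18.8-aliyun.1` or `v1.16.9`), comparing just the major and minor numbers.

```python
import re


def needs_manual_csi_snapshotter(version: str) -> bool:
    """Return True when the cluster is older than 1.18 and csi-snapshotter
    therefore has to be deployed by hand (per the rule above)."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles both the major and the minor component.
    return (major, minor) < (1, 18)


if __name__ == "__main__":
    for v in ("1.16.9-aliyun.1", "v1.18.8-aliyun.1", "1.20.4"):
        print(v, "->", "manual deploy" if needs_manual_csi_snapshotter(v) else "preinstalled")
```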
Deploy the storage-auto-snapshotter plugin
- Create the Deployment from the manifest below with:
kubectl apply -f deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: storage-auto-snapshotter
  namespace: kube-system
  labels:
    app: storage-auto-snapshotter
spec:
  selector:
    matchLabels:
      app: storage-auto-snapshotter
  template:
    metadata:
      labels:
        app: storage-auto-snapshotter
    spec:
      tolerations:
        - operator: "Exists"
      priorityClassName: system-node-critical
      serviceAccount: admin
      hostNetwork: true
      hostPID: true
      containers:
        - name: storage-auto-snapshotter
          image: registry.cn-beijing.aliyuncs.com/gyq193577/csi_auto_snapshotter:v1.16.6-9268802
          imagePullPolicy: Always
          env:
            - name: SNAPSHOT_CLASS
              value: ""
          volumeMounts:
            - name: date-config
              mountPath: /etc/localtime
      volumes:
        - name: date-config
          hostPath:
            path: /etc/localtime
- Verify that the plugin has started correctly with:
kubectl get pods -n kube-system | grep storage-auto-snapshotter | grep Running
Deploy the VolumeSnapshotPolicy CRD
- Create the CRD from the manifest below with:
kubectl create -f volumesnapshotcrd.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: volumesnapshotpolicies.storage.alibabacloud.com
spec:
  group: storage.alibabacloud.com
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          description: VolumeSnapshotPolicy is the Schema for the VolumeSnapshotPolicy API
          properties:
            apiVersion:
              description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#resources'
              type: string
            kind:
              description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds'
              type: string
            metadata:
              type: object
            spec:
              description: VolumeSnapshotPolicySpec defines the desired Specification of VolumeSnapshotPolicy
              properties:
                retentionDays:
                  description: retentionDays is days to save snapshot
                  format: int64
                  type: integer
                repeatWeekdays:
                  description: RepeatWeekdays is a list of days in a week to create disk snapshot
                  type: array
                  items:
                    type: string
                timePoints:
                  description: TimePoints is a list of hours in a day to create disk snapshot
                  type: array
                  items:
                    type: string
              type: object
          type: object
      subresources:
        status: {}
  scope: Cluster
  names:
    kind: VolumeSnapshotPolicy
    plural: volumesnapshotpolicies
    shortNames:
      - vsp
- Check that the CRD was created correctly with:
kubectl get crd volumesnapshotpolicies.storage.alibabacloud.com
Add permissions
- Managed clusters (standard managed edition and ACK Pro) need no additional permissions.
- Dedicated ACK clusters need the following permissions added to the RAM worker role.
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DescribeInstances",
        "ecs:CreateAutoSnapshotPolicy",
        "ecs:DeleteAutoSnapshotPolicy",
        "ecs:DescribeSnapshots",
        "ecs:ApplyAutoSnapshotPolicy",
        "ecs:ModifyAutoSnapshotPolicy",
        "ecs:DescribeAutoSnapshotPolicyEX"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {}
    }
  ]
}
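Before relying on the worker role, it can help to confirm that its policy document really lists every action above. The following is a minimal sketch with a hypothetical helper name; it only inspects `Allow` statements, treats `*` and `ecs:*` as covering everything, and deliberately ignores `Deny` statements, `Resource` scoping and conditions.

```python
import json

# The seven ECS actions listed in the policy above.
REQUIRED_ACTIONS = {
    "ecs:DescribeInstances",
    "ecs:CreateAutoSnapshotPolicy",
    "ecs:DeleteAutoSnapshotPolicy",
    "ecs:DescribeSnapshots",
    "ecs:ApplyAutoSnapshotPolicy",
    "ecs:ModifyAutoSnapshotPolicy",
    "ecs:DescribeAutoSnapshotPolicyEX",
}


def missing_actions(policy_json: str) -> set:
    """Return the required actions that no Allow statement grants."""
    policy = json.loads(policy_json)
    allowed = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be given as a string
            actions = [actions]
        allowed.update(actions)
    if "*" in allowed or "ecs:*" in allowed:
        return set()
    return REQUIRED_ACTIONS - allowed
```

An empty result means the document grants all seven actions; anything else names what still has to be added.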
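Using scheduled snapshots
After the storage-operator Deployment starts, it checks whether any VolumeSnapshotPolicy exists in the current cluster. If one does, it then checks whether the corresponding policy already exists in ECS: an existing policy is skipped, and a missing one is created.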
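The check-then-create behaviour described above is a simple reconcile loop. The sketch below is illustrative only and is not the operator's code: it keeps both sides in memory, matches policies by name (the real component may well match by an ECS policy ID instead), and stands in for the ECS create call with a set insertion.

```python
def reconcile(cluster_policies, ecs_policies):
    """Create in ECS every policy that exists in the cluster but not in ECS.

    cluster_policies: iterable of VolumeSnapshotPolicy names found in the cluster.
    ecs_policies: mutable set of policy names already present in ECS.
    Returns the names created during this pass.
    """
    created = []
    for name in cluster_policies:
        if name in ecs_policies:
            continue  # already present in ECS: skip
        ecs_policies.add(name)  # stand-in for a real "create policy" request
        created.append(name)
    return created


if __name__ == "__main__":
    ecs = {"volumesnapshotpolicy1"}
    print(reconcile(["volumesnapshotpolicy1", "volumesnapshotpolicy2"], ecs))
    print(reconcile(["volumesnapshotpolicy1", "volumesnapshotpolicy2"], ecs))
```

Running the loop a second time creates nothing, which is the property that makes it safe to run on every start.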
Create a VolumeSnapshotPolicy object
apiVersion: v1
items:
  - apiVersion: storage.alibabacloud.com/v1alpha1
    kind: VolumeSnapshotPolicy
    metadata:
      name: volumesnapshotpolicy1
    spec:
      retentionDays: 1
      repeatWeekdays: ["1", "2"]
      timePoints: ["11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23"]
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
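This object represents an automatic snapshot policy on ECS. When a user creates the object above in Kubernetes, the system automatically creates a matching automatic snapshot policy in the user's ECS service. The spec fields have the following meanings:
Field | Meaning |
---|---|
retentionDays | Number of days an automatic snapshot is retained; -1 keeps it permanently |
repeatWeekdays | Days of the week on which snapshots are created automatically |
timePoints | Hours of the day at which snapshots are created automatically |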
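To make the three fields concrete, the sketch below validates a spec and expands it into the (weekday, hour) slots at which snapshots would be taken. The helper names are hypothetical, and the value ranges are assumptions based on the ECS automatic snapshot policy conventions (weekdays 1 to 7, hours 0 to 23, retention of -1 or a positive number of days); they are not stated in this article.

```python
def validate_policy_spec(spec: dict) -> list:
    """Return a list of human-readable problems; an empty list means the spec looks valid."""
    errors = []
    retention = spec.get("retentionDays")
    if (isinstance(retention, bool) or not isinstance(retention, int)
            or (retention != -1 and retention < 1)):
        errors.append(f"retentionDays must be -1 or a positive integer, got {retention!r}")
    # Assumed ranges: weekdays 1..7, hours 0..23; the CRD stores both as strings.
    for field, low, high in (("repeatWeekdays", 1, 7), ("timePoints", 0, 23)):
        for value in spec.get(field, []):
            if not (isinstance(value, str) and value.isdecimal() and low <= int(value) <= high):
                errors.append(f"{field} entry {value!r} is outside {low}..{high}")
    return errors


def schedule_slots(spec: dict) -> list:
    """Expand the spec into sorted (weekday, hour) pairs, one per scheduled snapshot."""
    return sorted((int(d), int(h)) for d in spec["repeatWeekdays"] for h in spec["timePoints"])


if __name__ == "__main__":
    spec = {
        "retentionDays": 1,
        "repeatWeekdays": ["1", "2"],
        "timePoints": [str(h) for h in range(11, 24)],
    }
    print(validate_policy_spec(spec))
    print(len(schedule_slots(spec)), "snapshots per week per disk")
```

For the example policy in this article, two weekdays times thirteen hours gives 26 snapshots per week for each associated disk, which is worth keeping in mind when choosing retentionDays.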
Create a PVC/PV and set an automatic snapshot policy on the PVC
- Associate the PVC with a snapshot policy by setting an annotation on it
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc-snapshot-policy
  annotations:
    policy.volumesnapshot.csi.alibabacloud.com: volumesnapshotpolicy1  # associates the PVC with the volumesnapshotpolicy created in the previous step
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 25Gi
  selector:
    matchLabels:
      alicloud-pvname: static-disk-pv-snapshot-policy
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: csi-pv-snapshot-policy
  labels:
    alicloud-pvname: static-disk-pv-snapshot-policy
spec:
  capacity:
    storage: 25Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: diskplugin.csi.alibabacloud.com
    volumeHandle: <your-disk-id>
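Note: at this point storage-auto-snapshot does not yet attach the cloud disk bound to the PVC to the automatic snapshot policy. The disk holds no data yet, so creating snapshots would only incur unnecessary cost.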
Create a pod that uses this PVC/PV
- The automatic snapshot policy takes effect only after the cloud disk has been mounted by a pod
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-policy
spec:
  selector:
    matchLabels:
      app: nginx-policy
  serviceName: "nginx"
  template:
    metadata:
      labels:
        app: nginx-policy
    spec:
      containers:
        - name: nginx-policy
          image: nginx
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: pvc-disk
              mountPath: /data
      volumes:
        - name: pvc-disk
          persistentVolumeClaim:
            claimName: csi-pvc-snapshot-policy
Once the pod has started, storage-operator automatically associates the diskId behind the PV with the VolumeSnapshotPolicy and creates snapshots according to the policy.
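The binding rule (annotated PVC, bound PV, and a pod that mounts the claim) can be sketched as a pure function over the object manifests. This illustrates the described behaviour and is not the operator's implementation: the function name is hypothetical, the set of mounted claims is passed in rather than discovered, and the disk ID `d-example` in the usage lines is a placeholder.

```python
POLICY_ANNOTATION = "policy.volumesnapshot.csi.alibabacloud.com"


def disks_to_bind(pvcs, pvs, mounted_claims):
    """Return {disk_id: policy_name} for annotated PVCs that a pod currently mounts.

    pvcs, pvs: lists of manifest dicts (as parsed from YAML or JSON).
    mounted_claims: set of PVC names referenced by running pods.
    """
    handle_by_pv = {pv["metadata"]["name"]: pv["spec"]["csi"]["volumeHandle"] for pv in pvs}
    bindings = {}
    for pvc in pvcs:
        meta = pvc["metadata"]
        policy = meta.get("annotations", {}).get(POLICY_ANNOTATION)
        pv_name = pvc.get("spec", {}).get("volumeName")  # set once the PVC is bound
        if policy is None or meta["name"] not in mounted_claims or pv_name not in handle_by_pv:
            continue  # no policy requested, not mounted yet, or not bound to a known PV
        bindings[handle_by_pv[pv_name]] = policy
    return bindings


if __name__ == "__main__":
    pvc = {
        "metadata": {"name": "csi-pvc-snapshot-policy",
                     "annotations": {POLICY_ANNOTATION: "volumesnapshotpolicy1"}},
        "spec": {"volumeName": "csi-pv-snapshot-policy"},
    }
    pv = {"metadata": {"name": "csi-pv-snapshot-policy"},
          "spec": {"csi": {"volumeHandle": "d-example"}}}
    print(disks_to_bind([pvc], [pv], set()))                        # before any pod mounts it
    print(disks_to_bind([pvc], [pv], {"csi-pvc-snapshot-policy"}))  # after the pod starts
```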
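Verify that the automatic snapshot policy has taken effect
- Log in to the ECS console home page
- Open the Storage & Snapshots page
- Open the Automatic Snapshot Policies page
- Check that the snapshot policy is associated with the intended cloud disk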
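Modify a scheduled snapshot policy
Note
- Modifying a scheduled snapshot policy affects every cloud disk associated with it, so change it with care.
- Do not modify the policy in the ECS console; make all changes through the CRD.
- Change the snapshot policy by editing the volumesnapshotpolicy object: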
```
$ kubectl edit volumesnapshotpolicy volumesnapshotpolicy1
```
```
apiVersion: v1
items:
- apiVersion: storage.alibabacloud.com/v1alpha1
  kind: VolumeSnapshotPolicy
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"storage.alibabacloud.com/v1alpha1","kind":"VolumeSnapshotPolicy","metadata":{"annotations":{},"name":"volumesnapshotpolicy1"},"spec":{"repeatWeekdays":["1","2"],"retentionDays":1,"timePoints":["11","12","13","14","15","16","17","18","19","20","21","22","23"]}}
      policyId: sp-uf6ahkkav6016ondbiyk
    creationTimestamp: "2021-01-05T11:13:52Z"
    generation: 2
    managedFields:
    - apiVersion: storage.alibabacloud.com/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:spec:
          .: {}
          f:retentionDays: {}
          f:timePoints: {}
      manager: kubectl-client-side-apply
      operation: Update
      time: "2021-01-05T11:13:52Z"
    - apiVersion: storage.alibabacloud.com/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:repeatWeekdays: {}
      manager: kubectl-edit
      operation: Update
      time: "2021-01-06T07:16:59Z"
    - apiVersion: storage.alibabacloud.com/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:policyId: {}
      manager: operator
      operation: Update
      time: "2021-01-06T07:16:59Z"
    name: volumesnapshotpolicy1
    resourceVersion: "339669"
    selfLink: /apis/storage.alibabacloud.com/v1alpha1/volumesnapshotpolicies/volumesnapshotpolicy1
    uid: 02257b4a-28e6-46f4-a767-81c0d117aba0
  spec:
    repeatWeekdays:
    - "1"
    - "2"
    - "3"
    - "4"
    retentionDays: 1
    timePoints:
    - "11"
    - "12"
    - "13"
    - "14"
    - "15"
    - "16"
    - "17"
    - "18"
    - "19"
    - "20"
    - "21"
    - "22"
    - "23"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```
- Check in the ECS console that the scheduled snapshot policy has been updated.
### Restore a disk from a snapshot created by the scheduled snapshot policy
---
- Once a scheduled snapshot policy is bound, the automatically created snapshots (volumesnapshot & volumesnapshotcontent) appear in the ACK cluster
```
$ kubectl get volumesnapshot
NAME                  READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                            VOLUMESNAPSHOTCLASS   VOLUMESNAPSHOT           AGE
s-uf6221xxxxxxxxxxx   true         41943040      Delete           diskplugin.csi.alibabacloud.com   default-snapclass     s-uf622145z6iibqtlrbwi   7m40s
s-uf65y0zxxxxxxxxx    true         41943040      Delete           diskplugin.csi.alibabacloud.com   default-snapclass     s-uf65y0zrhwsd581q60mg   7m40s
s-uf6a83xxxxxxxxxx    true         41943040      Delete           diskplugin.csi.alibabacloud.com   default-snapclass     s-uf6a83009o5s2jgcch9f   7m40s
s-uf6fmpbyrxxxxxxx    true         41943040      Delete           diskplugin.csi.alibabacloud.com   default-snapclass     s-uf6fmpbyrlm10amjicha   7m40s
```
- Any of these volumesnapshot objects can now be used to restore a cloud disk
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-restore
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      hostNetwork: true
      containers:
        - name: nginx
          image: nginx
          command: ["sh", "-c"]
          args: ["sleep 10000"]
          volumeMounts:
            - name: disk-ssd
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: disk-ssd
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: alicloud-disk-ssd
        resources:
          requests:
            storage: 20Gi
        dataSource:
          name: s-uf6221xxxxxxxxxxx
          kind: VolumeSnapshot
          apiGroup: snapshot.storage.k8s.io
- Once the pod has started, the data from the scheduled snapshot has been restored
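The part of the manifest above that performs the restore is the dataSource stanza on the claim. As a sketch, the same claim can be generated for any snapshot name; the function and the PVC name `restored-disk` are hypothetical, while the field values mirror the volumeClaimTemplates entry in this article.

```python
import json


def restore_pvc_manifest(pvc_name, snapshot_name, storage_class="alicloud-disk-ssd", size="20Gi"):
    """Build a PVC manifest whose contents are restored from a VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
            # dataSource tells the CSI driver to provision the disk from the snapshot.
            "dataSource": {
                "name": snapshot_name,
                "kind": "VolumeSnapshot",
                "apiGroup": "snapshot.storage.k8s.io",
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps(restore_pvc_manifest("restored-disk", "s-uf6221xxxxxxxxxxx"), indent=2))
```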