Autoscaling Based on GPU Metrics
In deep learning, a trained model is exposed to users as a model service through a Serving deployment. This document describes how to build a Serving service that scales elastically and automatically.
Kubernetes supports scaling containers through the HPA (Horizontal Pod Autoscaler) module, which natively supports metrics such as CPU and memory. The native HPA is based on Heapster and does not support scaling on GPU metrics, but it can be extended with additional metrics through the CustomMetrics mechanism. We can deploy a Prometheus Adapter as a CustomMetricServer: it registers Prometheus metrics with the APIServer interface so that the HPA can consume them. By configuring the HPA to use a CustomMetric as its scaling metric, we can then autoscale on GPU metrics.
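The adapter registers itself through the Kubernetes API aggregation layer under the custom.metrics.k8s.io group. Once the deployment described below is complete, one way to confirm the registration is, for example:
# kubectl get apiservice v1beta1.custom.metrics.k8s.io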
Prerequisites
You need a Container Service Kubernetes cluster and must have completed the GPU monitoring setup described in "Alibaba Cloud Container Service Kubernetes Monitoring - GPU Monitoring", i.e. Prometheus is deployed and collecting GPU usage metrics. We will use the monitoring data in Prometheus as the reference metrics for autoscaling.
Note
Once the HPA is configured to scale on a custom monitoring metric, the native Heapster-based scaling on CPU and memory can no longer be used.
Deployment
Log on to a master node and run the following script to generate the serving certificates for the Prometheus Adapter:
#!/usr/bin/env bash
set -e
set -o pipefail
set -u
b64_opts='--wrap=0'
# Requires cfssl and cfssljson, e.g.:
# go get -v -u github.com/cloudflare/cfssl/cmd/...
export PURPOSE=metrics
# Generate a self-signed CA for the adapter's serving certificate
openssl req -x509 -sha256 -new -nodes -days 365 -newkey rsa:2048 -keyout ${PURPOSE}-ca.key -out ${PURPOSE}-ca.crt -subj "/CN=ca"
echo '{"signing":{"default":{"expiry":"43800h","usages":["signing","key encipherment","'${PURPOSE}'"]}}}' > "${PURPOSE}-ca-config.json"
export SERVICE_NAME=custom-metrics-apiserver
# The SANs must match the namespace the adapter Service is deployed in (kube-system in this guide)
export ALT_NAMES='"custom-metrics-apiserver.kube-system","custom-metrics-apiserver.kube-system.svc"'
# Issue the serving certificate signed by the CA above
echo "{\"CN\":\"${SERVICE_NAME}\", \"hosts\": [${ALT_NAMES}], \"key\": {\"algo\": \"rsa\",\"size\": 2048}}" | \
cfssl gencert -ca=metrics-ca.crt -ca-key=metrics-ca.key -config=metrics-ca-config.json - | cfssljson -bare apiserver
# Store the certificate and key in a Secret consumed by the adapter Deployment
cat <<-EOF > cm-adapter-serving-certs.yaml
apiVersion: v1
kind: Secret
metadata:
  name: cm-adapter-serving-certs
data:
  serving.crt: $(base64 ${b64_opts} < apiserver.pem)
  serving.key: $(base64 ${b64_opts} < apiserver-key.pem)
EOF
kubectl -n kube-system apply -f cm-adapter-serving-certs.yaml
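As a quick sanity check, confirm that the secret was created before continuing:
# kubectl -n kube-system get secret cm-adapter-serving-certs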
Deploy the Prometheus CustomMetric Adapter. The following resources belong in the kube-system namespace (apply the manifest with kubectl -n kube-system apply -f):
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: custom-metrics-apiserver
  name: custom-metrics-apiserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-metrics-apiserver
  template:
    metadata:
      labels:
        app: custom-metrics-apiserver
      name: custom-metrics-apiserver
    spec:
      serviceAccountName: custom-metrics-apiserver
      containers:
      - name: custom-metrics-apiserver
        image: registry.cn-beijing.aliyuncs.com/test-hub/k8s-prometheus-adapter-amd64
        args:
        - --secure-port=6443
        - --tls-cert-file=/var/run/serving-cert/serving.crt
        - --tls-private-key-file=/var/run/serving-cert/serving.key
        - --logtostderr=true
        - --prometheus-url=http://prometheus-svc.kube-system.svc.cluster.local:9090/
        - --metrics-relist-interval=1m
        - --v=10
        - --config=/etc/adapter/config.yaml
        ports:
        - containerPort: 6443
        volumeMounts:
        - mountPath: /var/run/serving-cert
          name: volume-serving-cert
          readOnly: true
        - mountPath: /etc/adapter/
          name: config
          readOnly: true
        - mountPath: /tmp
          name: tmp-vol
      volumes:
      - name: volume-serving-cert
        secret:
          secretName: cm-adapter-serving-certs
      - name: config
        configMap:
          name: adapter-config
      - name: tmp-vol
        emptyDir: {}
---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: custom-metrics-apiserver
---
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-apiserver
spec:
  ports:
  - port: 443
    targetPort: 6443
  selector:
    app: custom-metrics-apiserver
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-server-resources
rules:
- apiGroups:
  - custom.metrics.k8s.io
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
data:
  config.yaml: |
    rules:
    - seriesQuery: '{uuid!=""}'
      resources:
        overrides:
          node_name: {resource: "node"}
          pod_name: {resource: "pod"}
          namespace_name: {resource: "namespace"}
      name:
        matches: ^nvidia_gpu_(.*)$
        as: "${1}_over_time"
      metricsQuery: ceil(avg_over_time(<<.Series>>{<<.LabelMatchers>>}[3m]))
    - seriesQuery: '{uuid!=""}'
      resources:
        overrides:
          node_name: {resource: "node"}
          pod_name: {resource: "pod"}
          namespace_name: {resource: "namespace"}
      name:
        matches: ^nvidia_gpu_(.*)$
        as: "${1}_current"
      metricsQuery: <<.Series>>{<<.LabelMatchers>>}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-reader
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  - services
  verbs:
  - get
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-controller-custom-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-server-resources
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system
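To make the adapter-config rules concrete: each rule renames every Prometheus series matching ^nvidia_gpu_(.*)$ and expands metricsQuery into a PromQL query at request time. A request for duty_cycle_over_time on pods in the default namespace would expand to something like the following (the pod selector here is a placeholder; the adapter substitutes the exact pod names):
ceil(avg_over_time(nvidia_gpu_duty_cycle{namespace_name="default",pod_name=~"fast-style-transfer-serving-.*"}[3m]))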
Grant the roles. If you deploy in a namespace other than kube-system, change the namespace fields in the template below:
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: kube-system # change this if you deploy in a namespace other than kube-system
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics-resource-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-resource-reader
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: kube-system # change this if you deploy in a namespace other than kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: kube-system # change this if you deploy in a namespace other than kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: custom-metrics-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: kube-system
After deployment completes, verify that the Prometheus Adapter works by calling the custom metrics API server:
# kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/temperature_celsius_current"
{"kind":"MetricValueList","apiVersion":"custom.metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/temperature_celsius_current"},"items":[]}
Modify the controller-manager configuration to use CustomMetrics as the HPA scaling source
Log on to each of the three master nodes and run the following command to update the kube-controller-manager's HPA configuration:
sed -i 's/--horizontal-pod-autoscaler-use-rest-clients=false/--horizontal-pod-autoscaler-use-rest-clients=true/g' /etc/kubernetes/manifests/kube-controller-manager.yaml
Verify the change:
# kubectl -n kube-system describe po -l component=kube-controller-manager | grep 'horizontal-pod-autoscaler-use-rest-clients'
--horizontal-pod-autoscaler-use-rest-clients=true
--horizontal-pod-autoscaler-use-rest-clients=true
--horizontal-pod-autoscaler-use-rest-clients=true
Scaling metrics
At this point we have deployed a Prometheus CustomMetric server. The adapter-config ConfigMap controls which Prometheus metrics are exposed to the APIServer, and under which names.
The following GPU metrics are supported:
Prometheus metric | Meaning | HPA metric (current value) | HPA metric (3-minute average)
---|---|---|---
nvidia_gpu_duty_cycle | GPU utilization | duty_cycle_current | duty_cycle_over_time
nvidia_gpu_memory_total_bytes | total GPU memory | memory_total_bytes_current | memory_total_bytes_over_time
nvidia_gpu_memory_used_bytes | used GPU memory | memory_used_bytes_current | memory_used_bytes_over_time
nvidia_gpu_power_usage_milliwatts | GPU power usage (milliwatts) | power_usage_milliwatts_current | power_usage_milliwatts_over_time
nvidia_gpu_temperature_celsius | GPU temperature (°C) | temperature_celsius_current | temperature_celsius_over_time
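Each of these HPA metrics can be queried through the custom metrics API in the same way as the temperature check above, for example the current GPU utilization of pods in the default namespace:
# kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/duty_cycle_current"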
Autoscale on GPU metrics
Deploy a serving Deployment and its Service:
apiVersion: v1
kind: Service
metadata:
  name: fast-style-transfer-serving
  labels:
    app: tensorflow-serving
spec:
  ports:
  - name: http-serving
    port: 5000
    targetPort: 5000
  selector:
    app: tensorflow-serving
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: fast-style-transfer-serving
  labels:
    app: tensorflow-serving
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: tensorflow-serving
    spec:
      containers:
      - name: serving
        image: "registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/fast-style-transfer-serving:la_muse"
        command: ["python", "app.py"]
        resources:
          limits:
            nvidia.com/gpu: 1
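Before creating the HPA, it is worth confirming that the serving pod is running and actually has a GPU allocated, for example:
# kubectl get po -l app=tensorflow-serving
# kubectl describe po -l app=tensorflow-serving | grep 'nvidia.com/gpu'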
Create an HPA that scales on a GPU metric:
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: gpu-hpa
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: fast-style-transfer-serving
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: duty_cycle_current # the metric is the average GPU utilization of the pods
      targetAverageValue: 40
Check the HPA's metric and its current value:
# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
gpu-hpa Deployment/fast-style-transfer-serving 0 / 40 1 10 1 37s
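kubectl describe also shows the metric source and the HPA's scaling events, which helps when the TARGETS column stays at <unknown>:
# kubectl describe hpa gpu-hpa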
Deploy a fast-style-transfer load-testing application
This application keeps sending images to the serving endpoint to simulate load:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: fast-style-transfer-press
  labels:
    app: fast-style-transfer-press
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: fast-style-transfer-press
    spec:
      containers:
      - name: serving
        image: "registry.cn-hangzhou.aliyuncs.com/xiaozhou/fast-style-transfer-press:v0"
        env:
        - name: SERVER_IP
          value: fast-style-transfer-serving
        - name: BATCH_SIZE
          value: "100"
        - name: TOTAL_SIZE
          value: "12000"
Once the load tester is running, you can watch the metrics change on the GPU Application Monitoring dashboard.
You can also see the change through the HPA:
# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
gpu-hpa   Deployment/fast-style-transfer-serving   63 / 40   1         10        1          3m
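To follow the metric continuously while the load test runs, you can watch the HPA:
# kubectl get hpa gpu-hpa -w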
After the load has run for a while, you can see the pods scale out:
NAME READY STATUS RESTARTS AGE
fast-style-transfer-press-69c48966d8-dqf5n 1/1 Running 0 4m
fast-style-transfer-serving-84587c94b7-7xp2d 1/1 Running 0 5m
fast-style-transfer-serving-84587c94b7-slbdn 1/1 Running 0 47s
The monitoring dashboard also shows the scaled-out pods and their GPU metrics:
Stop the load-testing container
Run the following command to stop the load-testing application:
kubectl scale deploy fast-style-transfer-press --replicas=0 # scale the load tester down to 0 replicas
(You can also scale the deployment from the console.)
Check on the HPA that the duty_cycle metric drops back to 0:
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
gpu-hpa Deployment/fast-style-transfer-serving 0 / 40 1 10 3 9m
After a while (the HPA applies a scale-down delay, 5 minutes by default, before removing pods), check that the deployment has scaled back in:
kubectl get po
NAME READY STATUS RESTARTS AGE
fast-style-transfer-serving-84587c94b7-7xp2d 1/1 Running 0 10m