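This guide describes how to run the Consul agent (client) in Kubernetes. The Consul servers are best run on physical machines.
For installing Consul itself, see my other post, "Consul cluster installation"; it is not covered in detail here.
This setup has been validated in production, and no problems have turned up so far.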
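Problems with running the Consul agent in Kubernetes, and how to address them
Problems
- How business services connect to the Consul agent. (Consul has a property here: a service has to be deregistered through the same client agent it was registered through.)
- When the Consul agent starts, it generates its own metadata, such as the node-id, under the data directory, based on information like the hostname and IP address. If the data directory is not persisted and the host network is not used, the hostname and IP address change whenever the Pod is updated. Consul can then end up with one IP address mapped to two hostnames, and service registration breaks.
- We have hit the second problem several times in production: a colleague changed a hostname, the same IP ended up mapped to two hostnames in the Consul cluster, and services misbehaved as a result.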
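One way to catch the second problem early is to look for addresses that show up under more than one node name. The sketch below assumes the node list has the shape returned by Consul's /v1/catalog/nodes endpoint (objects with Node and Address fields). The function name and the sample data are hypothetical, made up for illustration.

```python
from collections import defaultdict


def addresses_with_multiple_nodes(nodes):
    """Return {address: sorted node names} for addresses claimed by more than one node name.

    `nodes` is assumed to be a list of dicts with "Node" and "Address" keys,
    as in the JSON body of GET /v1/catalog/nodes.
    """
    names_by_address = defaultdict(set)
    for node in nodes:
        names_by_address[node["Address"]].add(node["Node"])
    return {
        address: sorted(names)
        for address, names in names_by_address.items()
        if len(names) > 1
    }


# Hypothetical sample: the host at 10.111.67.21 was renamed, so two names share its IP.
sample_nodes = [
    {"Node": "k8s-node-01", "Address": "10.111.67.21"},
    {"Node": "k8s-node-01-renamed", "Address": "10.111.67.21"},
    {"Node": "k8s-node-02", "Address": "10.111.67.22"},
]
duplicates = addresses_with_multiple_nodes(sample_nodes)
print(duplicates)
```

An empty result means every address is claimed by exactly one node name.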
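Solutions
- Run the Consul agent as a DaemonSet on the host network (hostNetwork), so that the hostname and IP address stay the same. Persist Consul's metadata to a directory on the host; when the Consul Pod is updated, it reads that directory again instead of regenerating the node-id and other metadata.
- Inject the Consul agent's IP address (that is, the physical machine's IP address) into the Deployment through an environment variable. When a program needs to connect to Consul, it looks up the value in its own environment and connects.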
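Because registration and deregistration have to go through the same client agent, the application can build both calls from one agent address. The following is a minimal sketch using Consul's agent HTTP API (PUT /v1/agent/service/register and PUT /v1/agent/service/deregister/<service id>). The helper names, service ID and addresses are hypothetical, and the requests are only built here, not sent.

```python
import json
import urllib.request


def build_register_request(agent_addr, service_id, name, address, port):
    """Build (not send) the PUT that registers a service with the local agent."""
    body = json.dumps(
        {"ID": service_id, "Name": name, "Address": address, "Port": port}
    ).encode("utf-8")
    return urllib.request.Request(
        f"http://{agent_addr}/v1/agent/service/register",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_deregister_request(agent_addr, service_id):
    """Build (not send) the PUT that deregisters the service from the same agent."""
    return urllib.request.Request(
        f"http://{agent_addr}/v1/agent/service/deregister/{service_id}",
        method="PUT",
    )


# Hypothetical values; in the Pod, agent_addr would come from the injected environment variable.
agent_addr = "10.111.67.21:8500"
register = build_register_request(agent_addr, "business-1", "business", "172.20.1.15", 8999)
deregister = build_deregister_request(agent_addr, "business-1")
print(register.get_method(), register.full_url)
print(deregister.get_method(), deregister.full_url)
```

Sending either request is a matter of passing it to urllib.request.urlopen. Both target the agent on the Pod's own node, which keeps registration and deregistration on the same client.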
Configuration
ConfigMap configuration
~]# cat consul-client-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: consul-client
  namespace: consul
data:
  consul.json: |
    {
      "datacenter": "dc1",
      "client_addr": "0.0.0.0",
      "bind_addr": "{{ GetInterfaceIP \"eth0\" }}",
      "data_dir": "/consul/data",
      "retry_interval": "20s",
      "retry_join": ["10.111.67.1","10.111.67.2","10.111.67.3","10.111.67.4","10.111.67.5"],
      "enable_local_script_checks": true,
      "log_file": "/var/log/",
      "log_level": "trace",
      "pid_file": "/var/run/consul.pid",
      "performance": {
        "raft_multiplier": 1
      },
      "telemetry": {
        "prometheus_retention_time": "300s",
        "disable_hostname": true
      }
    }
  create-consul-registration.sh: |
    #!/bin/sh
    # IP address taken from the inet line(s) of the ethN or emN interfaces.
    ADDR=`ip addr show|awk -F '[ /]+' '/eth[0-9]|em[0-9]/ && /inet/ {print $3}'`
    CONSUL_CONF_DIR='/consul/config'
    CONSUL_REGISTER_FILE="$CONSUL_CONF_DIR/consul-members-registration.json"
    # Write the service definition only if an address was found and the config directory exists.
    if [ -n "$ADDR" ] && [ -d "$CONSUL_CONF_DIR" ];then
      cat > "${CONSUL_REGISTER_FILE}" <<EOF
    {
      "service": {
        "id": "consul-${ADDR}",
        "name": "consul-members",
        "tags": [
          "prometheus",
          "client",
          "consul-client"
        ],
        "address": "${ADDR}",
        "port": 8500,
        "check": {
          "http": "http://127.0.0.1:8500",
          "interval": "60s"
        }
      }
    }
    EOF
    else
      echo "ip address is empty or the $CONSUL_CONF_DIR does not exist"
    fi
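- consul.json is the Consul configuration file.
- create-consul-registration.sh generates the service self-registration file, which is mainly used for monitoring Consul.
For Consul monitoring, see "Consul Prometheus monitoring".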
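The self-registration file tags each agent with prometheus, so a monitoring job can pick out those entries by tag. As a rough sketch, the function below filters the decoded body of GET /v1/agent/services (a dict keyed by service ID) for one tag. The function name, the sample response and the business-1 entry are hypothetical.

```python
def services_with_tag(agent_services, tag):
    """Return the service entries that carry `tag`, ordered by service ID.

    `agent_services` is assumed to be the decoded JSON body of
    GET /v1/agent/services: a dict keyed by service ID.
    """
    matches = []
    for service_id in sorted(agent_services):
        service = agent_services[service_id]
        # "Tags" may be missing or null for services registered without tags.
        if tag in (service.get("Tags") or []):
            matches.append(service)
    return matches


# Hypothetical response from one agent after the registration file was loaded.
sample = {
    "consul-10.111.67.21": {
        "ID": "consul-10.111.67.21",
        "Service": "consul-members",
        "Tags": ["prometheus", "client", "consul-client"],
        "Address": "10.111.67.21",
        "Port": 8500,
    },
    "business-1": {
        "ID": "business-1",
        "Service": "business",
        "Tags": None,
        "Address": "172.20.1.15",
        "Port": 8999,
    },
}
targets = [f'{s["Address"]}:{s["Port"]}' for s in services_with_tag(sample, "prometheus")]
print(targets)
```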
DaemonSet configuration
~]# cat consul-client-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: consul-client
  namespace: consul
  labels:
    app: consul
    environment: prod
    component: client
spec:
  minReadySeconds: 60
  revisionHistoryLimit: 10
  selector:
    # Must match the labels in spec.template.metadata.labels.
    matchLabels:
      app: consul
      environment: prod
      component: client
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      namespace: consul
      labels:
        app: consul
        environment: prod
        component: client
    spec:
      containers:
      - env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        name: consul-client
        image: consul:1.5.1
        imagePullPolicy: IfNotPresent
        command:
        - "consul"
        - "agent"
        - "-config-dir=/consul/config"
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                /consul/create-consul-registration.sh
                consul reload
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - consul leave
        ports:
        - name: http-api
          hostPort: 8500
          containerPort: 8500
          protocol: TCP
        - name: dns-tcp
          hostPort: 8600
          containerPort: 8600
          protocol: TCP
        - name: dns-udp
          hostPort: 8600
          containerPort: 8600
          protocol: UDP
        - name: server-rpc
          hostPort: 8300
          containerPort: 8300
          protocol: TCP
        - name: serf-lan-tcp
          hostPort: 8301
          containerPort: 8301
          protocol: TCP
        - name: serf-lan-udp
          hostPort: 8301
          containerPort: 8301
          protocol: UDP
        - name: serf-wan-tcp
          hostPort: 8302
          containerPort: 8302
          protocol: TCP
        - name: serf-wan-udp
          hostPort: 8302
          containerPort: 8302
          protocol: UDP
        volumeMounts:
        - name: consul-config
          mountPath: /consul/config/consul.json
          subPath: consul.json
        - name: consul-members
          mountPath: /consul/create-consul-registration.sh
          subPath: create-consul-registration.sh
        - name: consul-data-dir
          mountPath: /consul/data
        - name: localtime
          mountPath: /etc/localtime
        livenessProbe:
          tcpSocket:
            port: 8500
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        readinessProbe:
          httpGet:
            path: /v1/status/leader
            port: 8500
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        resources:
          requests:
            memory: "1024Mi"
            cpu: "1000m"
          limits:
            memory: "1024Mi"
            cpu: "1000m"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      hostNetwork: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: consul-config
        configMap:
          name: consul-client
          items:
          - key: consul.json
            path: consul.json
      - name: consul-members
        configMap:
          name: consul-client
          # The script must be executable for the postStart hook.
          defaultMode: 0755
          items:
          - key: create-consul-registration.sh
            path: create-consul-registration.sh
      # Host directory that keeps the node-id and other metadata across Pod updates.
      - name: consul-data-dir
        hostPath:
          path: /data/consul/data
          type: DirectoryOrCreate
      - name: localtime
        hostPath:
          path: /etc/localtime
          type: File
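- command overrides Consul's default startup arguments.
- lifecycle.postStart runs the service self-registration script after the container starts.
- lifecycle.preStop removes the agent from the Consul cluster before Consul stops.
- hostNetwork makes the Pod use the host's network namespace.
- The localtime volume makes the container use the physical machine's time zone (the default image most likely uses UTC).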
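To confirm that a rolling update really keeps the node-id, the value can be read from the persisted data directory before and after the Pod is replaced. This is a small sketch that assumes Consul keeps the ID in a file named node-id inside its data directory (/data/consul/data on the host in this setup). The helper names and the sample ID are hypothetical, and a temporary directory stands in for the real path.

```python
import tempfile
from pathlib import Path


def read_node_id(data_dir):
    """Return the node ID persisted in `data_dir`, or None if the file is missing."""
    node_id_file = Path(data_dir) / "node-id"
    if not node_id_file.is_file():
        return None
    return node_id_file.read_text().strip()


def node_id_unchanged(before, after):
    """True only when both reads found a node ID and they are identical."""
    return before is not None and before == after


# Demonstration against a temporary directory standing in for /data/consul/data.
with tempfile.TemporaryDirectory() as data_dir:
    missing = read_node_id(data_dir)
    (Path(data_dir) / "node-id").write_text("adf4238a-882b-9ddc-4a9d-5b6758e4159e\n")
    before = read_node_id(data_dir)
    after = read_node_id(data_dir)  # in practice, read again after the Pod is replaced
print(missing, node_id_unchanged(before, after))
```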
Deployment configuration
~]# cat deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: business
    environment: prod
    release: release
  name: business
  namespace: prod-platform
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: business
      environment: prod
      release: release
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: business
        environment: prod
        release: release
    spec:
      shareProcessNamespace: true
      containers:
      - env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        # HOST_IP is defined above, so $(HOST_IP) expands to the node's IP address.
        - name: CONSUL_HTTP_ADDR
          value: "$(HOST_IP):8500"
        image: registry-vpc.cn-hangzhou.aliyuncs.com/prod/prod-business:v1
        imagePullPolicy: Always
        name: usercancel
        ports:
        - containerPort: 8999
        - containerPort: 9988
        livenessProbe:
          tcpSocket:
            port: 8999
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        readinessProbe:
          httpGet:
            path: /health
            port: 8999
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1024Mi"
            cpu: "1000m"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - name: data-vol
          mountPath: /logs
          subPath: logs
        - name: data-vol
          mountPath: /coredump
          subPath: coredump
      - env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: registry-vpc.cn-hangzhou.aliyuncs.com/devops/filebeat:7.4.2-1
        imagePullPolicy: IfNotPresent
        name: filebeat
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - name: filebeat-config
          mountPath: /usr/share/filebeat/filebeat.yml
          subPath: filebeat.yml
        - name: data-vol
          mountPath: /logs
          subPath: logs
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: data-vol
        persistentVolumeClaim:
          claimName: pvc-nas-prod-platform-business
      - name: filebeat-config
        configMap:
          name: business
          items:
          - key: filebeat.yml
            path: filebeat.yml
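- The CONSUL_HTTP_ADDR entry under containers.env holds the address of the Consul agent on the Pod's node; a service only needs to read the CONSUL_HTTP_ADDR environment variable to obtain it.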
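On the application side, reading the variable could look like the sketch below. It turns the host:port value injected by the Deployment into a base URL for the HTTP API. The function name, the fallback to 127.0.0.1:8500 when the variable is unset, and the sample environment are my own assumptions, not part of the deployment above.

```python
import os


def consul_base_url(environ=None, default="127.0.0.1:8500"):
    """Turn CONSUL_HTTP_ADDR into a base URL for HTTP API calls.

    A bare host:port (the form injected by the Deployment) gets an http://
    prefix; a value that already starts with http:// or https:// is kept,
    minus any trailing slash.
    """
    environ = os.environ if environ is None else environ
    # Fall back to the assumed default when the variable is unset or empty.
    addr = environ.get("CONSUL_HTTP_ADDR", "").strip() or default
    if addr.startswith(("http://", "https://")):
        return addr.rstrip("/")
    return f"http://{addr}"


# Hypothetical environment of a business Pod scheduled on node 10.111.67.21.
pod_env = {"HOST_IP": "10.111.67.21", "CONSUL_HTTP_ADDR": "10.111.67.21:8500"}
print(consul_base_url(pod_env))
```

Calling consul_base_url() with no arguments reads the process's real environment, which is what the service would do inside the Pod.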