Quickly Create an Elastic Container Cluster (ACK + ECI)

Alibaba Cloud Container Service for Kubernetes (ACK) was among the first service platforms worldwide to pass the Kubernetes Conformance Certification. It provides high-performance management of containerized applications and supports full lifecycle management of enterprise-grade Kubernetes workloads, letting you run containerized applications in the cloud easily and efficiently. ACK comes in three forms: Dedicated Kubernetes, Managed Kubernetes, and Serverless Kubernetes. In this article, we will create a managed ACK cluster and then install Serverless Kubernetes capability into it as an add-on, producing an elastic container cluster built jointly on ACK and ECI.


Prerequisites

  • If this is your first time, you must activate Container Service ACK and authorize it to access the required cloud resources.
    1. Log on to the Container Service ACK activation page.
    2. Read and select the Container Service ACK service agreement.
    3. Click Activate Now.
    4. Log on to the Container Service console.
    5. On the page prompting Container Service to create default roles, click "Go to RAM for authorization" to open the Cloud Resource Access Authorization page, then click "Confirm Authorization Policy". After authorization completes, refresh the console to start using Container Service ACK.
  • Create a VPC in an Alibaba Cloud region in advance, select one zone, and create two vSwitches in it: one for nodes and one for pods.
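Before creating the vSwitches, it helps to confirm that the planned CIDR blocks fit together. The sketch below uses Python's standard ipaddress module to check that the node and pod vSwitch CIDRs sit inside the VPC CIDR and that no ranges conflict; the CIDR values are illustrative assumptions, not values mandated by ACK:

```python
import ipaddress

def check_cidr_plan(vpc_cidr, node_cidr, pod_cidr, service_cidr):
    """Validate a simple ACK network plan.

    Returns True when both vSwitch CIDRs are subnets of the VPC CIDR,
    the service CIDR does not overlap the VPC, and the node and pod
    vSwitch ranges are disjoint.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    node = ipaddress.ip_network(node_cidr)
    pod = ipaddress.ip_network(pod_cidr)
    svc = ipaddress.ip_network(service_cidr)

    # vSwitches must live inside the VPC
    if not (node.subnet_of(vpc) and pod.subnet_of(vpc)):
        return False
    # The service CIDR must not overlap the VPC address space
    if svc.overlaps(vpc):
        return False
    # Node and pod vSwitches must not overlap each other
    return not node.overlaps(pod)

# Example values (assumptions for illustration only)
print(check_cidr_plan("10.0.0.0/8", "10.1.0.0/16", "10.4.0.0/16", "192.168.0.0/16"))  # True
print(check_cidr_plan("10.0.0.0/8", "10.1.0.0/16", "10.1.0.0/16", "192.168.0.0/16"))  # False: ranges overlap
```

A plan that passes this check avoids the most common creation-time errors (overlapping pod/node ranges or a service CIDR colliding with the VPC).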


Create the ACK Cluster

An ACK cluster can be created through the ACK console, the OpenAPI, Terraform, and other means. Here we walk through a console-based configuration, and also provide a Terraform script so you can experience Alibaba Cloud's infrastructure-as-code capabilities.

Method 1 (create via the console UI):

  1. For the configuration items involved in cluster creation and their explanations, see: https://help.aliyun.com/document_detail/95108.html
  2. Cluster configuration
  • Cluster name: bj-workshop
  • Cluster spec: Pro
  • Region: Beijing
  • Billing method: pay-as-you-go
  • Kubernetes version: 1.20.11
  • Container runtime: docker


  • Select the VPC
  • Network plugin: Terway
  • Node vSwitch: node vswitch
  • Pod vSwitch: pod vswitch
  • Service CIDR: 192.168.0.0/16


  • Configure SNAT
  • API Server: internal + slb.s1.small
  • Security group: automatically create an enterprise security group


3. Node pool configuration

  • Instance type: ecs.g6e.xlarge
  • Quantity: 2
  • System disk: ESSD, 40 GiB
  • Data disk: ESSD, 120 GiB


Operating system: Alibaba Cloud Linux

Password: Just4Test



  4. Component configuration
  • Ingress: NGINX Ingress + internal + SLB spec slb.s1.small
  • Storage: uncheck "Create default NAS file system and CNFS container network file system dynamic storage class"


5. Confirm the configuration. If any dependency check does not pass, investigate and resolve the cause first. Then select the service agreement and start the creation.


6. Creating a managed cluster takes about 15 minutes.


Method 2 (create with Terraform):

This method creates the ACK cluster with Terraform. In this example, Terraform directly creates all required cloud resources: the VPC, the vSwitches, the ACK cluster, and so on.

  1. Install Terraform. See: https://learn.hashicorp.com/tutorials/terraform/install-cli
  2. Copy the following files into your working directory:

main.tf

# Pin the aliyun/alicloud provider to a known version
terraform {
  required_providers {
    alicloud = {
      source  = "aliyun/alicloud"
      version = "1.141.0"
    }
  }
}
# If vpc_id is not specified, launch a new VPC
resource "alicloud_vpc" "vpc" {
  count      = var.vpc_id == "" ? 1 : 0
  cidr_block = var.vpc_cidr
}

# Create node vSwitches from the given CIDR blocks
resource "alicloud_vswitch" "vswitches" {
  count      = length(var.vswitch_ids) > 0 ? 0 : length(var.vswitch_cidrs)
  vpc_id     = var.vpc_id == "" ? join("", alicloud_vpc.vpc.*.id) : var.vpc_id
  cidr_block = element(var.vswitch_cidrs, count.index)
  zone_id    = element(var.zone_id, count.index)
}


# Create pod (Terway) vSwitches from the given CIDR blocks
resource "alicloud_vswitch" "terway_vswitches" {
  count      = length(var.terway_vswitch_ids) > 0 ? 0 : length(var.terway_vswitch_cirds)
  vpc_id     = var.vpc_id == "" ? join("", alicloud_vpc.vpc.*.id) : var.vpc_id
  cidr_block = element(var.terway_vswitch_cirds, count.index)
  zone_id    = element(var.zone_id, count.index)
}

resource "alicloud_cs_managed_kubernetes" "k8s" {
  kube_config           = var.kube_config
  count                 = var.k8s_number
  # version can not be defined in variables.tf. Options: 1.18.8-aliyun.1|1.20.11-aliyun.1
  version               = "1.20.11-aliyun.1"
  # name_prefix           = "terraform_"
  name                  = "bj-workshop"
  is_enterprise_security_group = true
  cluster_spec = "ack.pro.small"
  worker_vswitch_ids    = length(var.vswitch_ids) > 0 ? split(",", join(",", var.vswitch_ids)): length(var.vswitch_cidrs) < 1 ? [] : split(",", join(",", alicloud_vswitch.vswitches.*.id))
  pod_vswitch_ids       = length(var.terway_vswitch_ids) > 0 ? split(",", join(",", var.terway_vswitch_ids)): length(var.terway_vswitch_cirds) < 1 ? [] : split(",", join(",", alicloud_vswitch.terway_vswitches.*.id))
  worker_instance_types = var.worker_instance_types
  worker_disk_category  = "cloud_essd"
  worker_disk_size      = 40
  worker_data_disks {
    category                     = "cloud_essd"
    size                         = "100" 
    encrypted                    = false
    performance_level            = "PL0" 
  }
  worker_number         = var.worker_number
  node_cidr_mask        = var.node_cidr_mask
  enable_ssh            = var.enable_ssh
  install_cloud_monitor = var.install_cloud_monitor
  cpu_policy            = var.cpu_policy
  proxy_mode            = var.proxy_mode
  password              = var.password
  service_cidr          = var.service_cidr

  dynamic "addons" {
      for_each = var.cluster_addons
      content {
        name                    = lookup(addons.value, "name", var.cluster_addons)
        config                  = lookup(addons.value, "config", var.cluster_addons)
      }
  }
#  runtime = {
#    name    = "docker"
#    version = "19.03.5"
#  }
}


variables.tf

# Configure the Alibaba Cloud Terraform provider
provider "alicloud" {
  # The region in which to create resources
  region     = "cn-beijing"
}


variable "k8s_number" {
  description = "The number of Kubernetes clusters to create."
  default     = 1
}

variable "zone_id" {
    description = "The availability zones of vswitches."
    default = ["cn-beijing-h","cn-beijing-i","cn-beijing-j"]
}

# leave it to empty would create a new one
variable "vpc_id" {
  description = "Existing vpc id used to create several vswitches and other resources."
  default     = ""
}

variable "vpc_cidr" {
  description = "The cidr block used to launch a new vpc when 'vpc_id' is not specified."
  default     = "10.0.0.0/8"
}

# leave it to empty then terraform will create several vswitches
variable "vswitch_ids" {
  description = "List of existing vswitch id."
  type        = list(string)
  default     = []
}


variable "vswitch_cidrs" {
  description = "List of cidr blocks used to create several new vswitches when 'vswitch_ids' is not specified."
  type        = list(string)
  default     = ["10.1.0.0/16","10.2.0.0/16","10.3.0.0/16"]
}

variable "new_nat_gateway" {
  description = "Whether to create a new NAT gateway. When true, this template creates a NAT gateway, an EIP, and SNAT entries."
  default     = "true"
}

# Three masters is the default, so choose three appropriate instance types in the availability zones above.
# variable "master_instance_types" {
#   description = "The ecs instance types used to launch master nodes."
#   default     = ["ecs.n4.xlarge","ecs.n4.xlarge","ecs.sn1ne.xlarge"]
# }

variable "worker_instance_types" {
  description = "The ecs instance types used to launch worker nodes."
  default     = ["ecs.g6e.xlarge"]
  #default     = ["ecs.g5ne.2xlarge","ecs.sn1ne.xlarge","ecs.n4.xlarge"]
}

# options: between 24-28
variable "node_cidr_mask" {
    description = "The node CIDR mask, which determines how many pods can run on a single node."
    default = 24
}

variable "enable_ssh" {
    description = "Enable login to the node through SSH."
    default = true
}

variable "install_cloud_monitor" {
    description = "Install cloud monitor agent on ECS."
    default = true
}

# options: none|static
variable "cpu_policy" {
    description = "The kubelet CPU policy. Default: none."
    default = "none"
}

# options: ipvs|iptables
variable "proxy_mode" {
    description = "Proxy mode is option of kube-proxy."
    default = "ipvs"
}

variable "password" {
  description = "The password of ECS instance."
  default     = "Just4Test"
}

variable "worker_number" {
  description = "The number of worker nodes in kubernetes cluster."
  default     = 2
}

variable "service_cidr" {
  description = "The Kubernetes service CIDR block. It must not equal or overlap the VPC, vSwitch, or pod CIDR blocks."
  default     = "172.21.0.0/20"
}

variable "terway_vswitch_ids" {
  description = "List of existing vswitch ids for terway."
  type        = list(string)
  default     = []
}

variable "terway_vswitch_cirds" {
  description = "List of cidr blocks used to create several new vswitches when 'terway_vswitch_ids' is not specified."
  type        = list(string)
  default     = ["10.4.0.0/16","10.5.0.0/16","10.6.0.0/16"]
}

variable "cluster_addons" {
    type = list(object({
        name      = string
        config    = string
    }))

    default = [
        {
            "name"     = "terway-eniip",  # terway default (shared ENI IP) mode
            #"name"     = "terway-eni", # terway exclusive ENI mode
            "config"   = "",
        },
        {
            "name"     = "csi-plugin",
            "config"   = "",
        },
        {
            "name"     = "csi-provisioner",
            "config"   = "",
        },
          {
            "name"     = "alicloud-disk-controller",
            "config"   = "",
        },
        {
            "name"     = "logtail-ds",
            "config"   = "{\"IngressDashboardEnabled\":\"true\"}",
        },
        {
            "name"     = "nginx-ingress-controller",
            "config"   = "{\"IngressSlbNetworkType\":\"internet\"}",
        },
          {
            "name"     = "arms-prometheus",
            "config"   = "",
        },
        {
            "name"     = "ack-node-problem-detector",
            "config"   = "",
        },
        {
            "name"     = "ack-kubernetes-cronhpa-controller", 
            "config"   = "",
        },
        {
            "name"     = "ack-node-local-dns", 
            "config"   = "",
        }
    ]
}

variable "kube_config" {
    description = "Path to write the kubeconfig file (supported since provider version 1.105.0)."
    default = "~/.kube/config-terraform"
}
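The defaults above can be overridden without editing variables.tf by adding an optional terraform.tfvars file. The sketch below shows how you might reuse an existing VPC and vSwitches instead of having Terraform create new ones; the resource IDs are placeholders you would replace with your own:

```hcl
# terraform.tfvars -- optional overrides (all IDs below are placeholders)
vpc_id             = "vpc-xxxxxxxx"         # reuse an existing VPC
vswitch_ids        = ["vsw-node-xxxxxxxx"]  # existing node vSwitch(es)
terway_vswitch_ids = ["vsw-pod-xxxxxxxx"]   # existing pod vSwitch(es)
worker_number      = 2
password           = "Just4Test"
```

When vpc_id and the vSwitch ID lists are non-empty, the count expressions in main.tf evaluate to 0, so Terraform skips creating those resources.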


  3. Configure the Alibaba Cloud provider credentials and initialize the project:
export ALICLOUD_ACCESS_KEY="xxxxxx"
export ALICLOUD_SECRET_KEY="xxxxxx"
export ALICLOUD_REGION="cn-beijing"
# init phase
terraform init
# planning phase
terraform plan
# apply phase
terraform apply


  4. Destroy the created cloud resources:
# destroy
terraform destroy


Create the ECI Virtual Node

Prerequisites:

  1. The ACK cluster has been created successfully.
  2. The Elastic Container Instance service must be activated. Log on to the Elastic Container Instance console to activate it.
  3. Confirm that the cluster's region is in the list of regions supported by ECI. Log on to the Elastic Container Instance console to view the supported regions and zones.


Installation steps:

  1. Log on to the ACK console and click the cluster name to enter the cluster.
  2. Under "Operations", open "Add-ons".


  3. Find the "ack-virtual-node" component and click "Install".


  4. After the installation succeeds, an additional virtual node named virtual-kubelet appears in the node list under "Node Management".
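Besides letting the scheduler burst to ECI (as in the application example later), a pod can also be placed on the virtual node explicitly. The sketch below assumes the commonly documented default label (type: virtual-kubelet) and taint (virtual-kubelet.io/provider=alicloud) on the virtual node; verify them with kubectl describe node on your cluster before relying on them:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: eci-pinned-nginx
spec:
  containers:
    - name: nginx
      image: nginx:latest
  # Target the virtual node explicitly (label/taint values are assumed
  # defaults of the ack-virtual-node component; confirm on your cluster)
  nodeSelector:
    type: virtual-kubelet
  tolerations:
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alicloud
      effect: NoSchedule
```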


Bind a Public EIP to the Internal SLB of the Ingress Controller

Due to the constraints of this lab, we did not let the nginx ingress controller automatically create a public SLB as its entry point earlier. So that the following experiments can proceed, go to the SLB console and manually bind an EIP to the internal SLB of the nginx ingress controller. (If your trial account has a balance of more than 100 CNY, you can directly purchase a public SLB for the experiment instead.)

  1. Disable modification protection on the SLB instance used by the ingress controller.


  2. Bind an EIP to that SLB instance in the SLB console.


Deploy a Simple Application (Optional)

In this example, we use YAML to deploy a Deployment and expose the service externally through a Service plus an Ingress. In addition, the alibabacloud.com/burst-resource annotation lets pods that cannot be scheduled within the cluster be scheduled onto the ECI virtual node.

cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        alibabacloud.com/burst-resource: eci # Elastic resource type: use elastic ECI resources when the cluster's ECS resources are insufficient.
    spec:
      containers:
        - image: 'registry-vpc.cn-beijing.aliyuncs.com/haoshuwei/nginx:latest'
          imagePullPolicy: Always
          name: nginx
          resources:
            limits:
              cpu: '4'
              memory: 8Gi
            requests:
              cpu: '4'
              memory: 8Gi
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
  namespace: default
spec:
  ports:
    - name: '80'
      port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: nginx
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/service-weight: ''
  name: nginx-ingress
  namespace: default
spec:
  rules:
    - host: alibj.workshop.com
      http:
        paths:
          - backend:
              service:
                name: nginx-svc
                port:
                  number: 80
            path: /
            pathType: ImplementationSpecific
EOF


Verification:

  1. Use kubectl to check how the pods were scheduled. Because the cluster did not have enough resources for pods of the requested size, ECI pods were created automatically, and you can see that the pods are running on the virtual node.
kubectl get po -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP             NODE                            NOMINATED NODE   READINESS GATES
nginx-7797c877f-cfxjh   1/1     Running   0          74s   10.1.126.225   virtual-kubelet-cn-hangzhou-h   <none>           <none>
nginx-7797c877f-rjk4v   1/1     Running   0          74s   10.1.126.224   virtual-kubelet-cn-hangzhou-h   <none>           <none>
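For scripted checks, the same information can be pulled from kubectl get pods -o json. The helper below is a small sketch that counts, per node, the pods whose node name starts with virtual-kubelet; it uses inline sample data as a stand-in for real kubectl output:

```python
import json
from collections import Counter

def pods_on_virtual_nodes(pod_list_json):
    """Count pods per node, keeping only virtual-kubelet nodes.

    pod_list_json: the JSON text printed by `kubectl get pods -o json`.
    """
    pods = json.loads(pod_list_json)["items"]
    counts = Counter(
        p["spec"].get("nodeName", "")
        for p in pods
        if p["spec"].get("nodeName", "").startswith("virtual-kubelet")
    )
    return dict(counts)

# Inline sample mimicking the output above (stand-in for real kubectl output)
sample = json.dumps({"items": [
    {"spec": {"nodeName": "virtual-kubelet-cn-hangzhou-h"}},
    {"spec": {"nodeName": "virtual-kubelet-cn-hangzhou-h"}},
    {"spec": {"nodeName": "cn-beijing.10.1.0.10"}},
]})
print(pods_on_virtual_nodes(sample))  # {'virtual-kubelet-cn-hangzhou-h': 2}
```

In practice you would pipe real cluster state in, e.g. `kubectl get pods -o json | python check_eci.py`.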

2. Log on to an ECS worker node and use curl to check whether the nginx service deployed above is reachable (replace ingress-controller-eip with the EIP bound earlier):

curl -H "Host: alibj.workshop.com" http://ingress-controller-eip/ 


Cleanup:

Use kubectl to delete the example above:

kubectl delete deploy nginx
kubectl delete service nginx-svc
kubectl delete ingress nginx-ingress

Summary:

At this point, we have quickly created an ACK cluster and added to it an ECI virtual node with massive elastic scaling capacity.
