Kubernetes Stability Assurance Handbook: Insight (1), Fresh Ant Financial Interview Notes

1. Architecture Relationship Graph

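Cluster architecture can usually be represented as a graph: nodes represent components, and edges represent the interactions between them. The graph structure makes the cluster architecture easy to grasp at a glance, as in the figure below: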

[Figure: cluster architecture graph]

It can be described with a data structure like the following:


```json
{
    "nodes": [
        {
            "_id": "0ce0e913f6e5516846c654dbd81db6ecab1f684e",
            "name": "kube-apiserver",
            "description": "Inside XXX VPC",
            "type": "managed component",
            "dependencies": {}
        },
        {
            "_id": "f0740d8bb67520857061a9b71d4a9e4fc50bfe3d",
            "name": "etcd",
            "description": "Inside XXX VPC",
            "type": "managed component | storage",
            "dependencies": {}
        },
        {
            "_id": "05952a825e91cb50a81cbaf23c6941d5c3bb2c89",
            "name": "eni-operator",
            "description": "Inside XXX VPC; manages ENIs",
            "type": "component",
            "dependencies": {
                "serviceaccount": "enioperator",
                "clusterrole": "enioperator",
                "clusterrolebinding": "enioperator",
                "configmaps": ["eniconfig"],
                "secrets": ["enioperator"]
            }
        },
        {
            "_id": "42699513a7561e89a5f99881d7b05653a1625c51",
            "name": "Network Service",
            "description": "Management service for cloud network resources such as VPC/VSwitch",
            "type": "cloud service"
        }
    ],
    "edges": [
        {
            "_id": "38bce9ca8a0cec6d8586d96298bd63b0523fc946",
            "source": "eni-operator", "target": "kube-apiserver",
            "description": "ENI management requests"
        },
        {
            "_id": "93f3c21247165f0be3a969fc80f72bc1a402e9f5",
            "source": "eni-operator", "target": "Network Service",
            "description": "Calls the Alibaba Cloud ECS OpenAPI to manage network resources such as VPC/VSwitch"
        }
    ]
}
```
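As a minimal sketch of how such a description could be consumed (Python; the helper names `roles`, `build_adjacency` and `callers_of` are hypothetical and do not come from the handbook), the code below loads a trimmed copy of the graph, checks that every edge references a declared node, and answers "which components call this one?". It also assumes that `|` in the `type` field separates multiple roles, as in `managed component | storage`.

```python
import json
from collections import defaultdict

# Trimmed copy of the architecture graph above (_id and description omitted).
ARCH_GRAPH = json.loads("""
{
  "nodes": [
    {"name": "kube-apiserver", "type": "managed component"},
    {"name": "etcd", "type": "managed component | storage"},
    {"name": "eni-operator", "type": "component"},
    {"name": "Network Service", "type": "cloud service"}
  ],
  "edges": [
    {"source": "eni-operator", "target": "kube-apiserver"},
    {"source": "eni-operator", "target": "Network Service"}
  ]
}
""")


def roles(node):
    """Split a type such as "managed component | storage" into roles (assumed convention)."""
    return [part.strip() for part in node["type"].split("|")]


def build_adjacency(graph):
    """Map each component to the components it calls, rejecting undeclared endpoints."""
    names = {node["name"] for node in graph["nodes"]}
    adjacency = defaultdict(list)
    for edge in graph["edges"]:
        for end in (edge["source"], edge["target"]):
            if end not in names:
                raise ValueError(f"edge references undeclared node: {end!r}")
        adjacency[edge["source"]].append(edge["target"])
    return dict(adjacency)


def callers_of(graph, target):
    """Components with an edge pointing at `target` (edges run caller -> callee)."""
    return sorted(edge["source"] for edge in graph["edges"] if edge["target"] == target)


print(build_adjacency(ARCH_GRAPH))  # {'eni-operator': ['kube-apiserver', 'Network Service']}
print(callers_of(ARCH_GRAPH, "kube-apiserver"))  # ['eni-operator']
print(roles(ARCH_GRAPH["nodes"][1]))  # ['managed component', 'storage']
```

Reversing the edges like this is a cheap first step for blast-radius questions: if kube-apiserver degrades, `callers_of` lists the components whose requests would be affected.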



2. Architecture Runtime Graph

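While the cluster is running, the internal state of components and their interactions can be inferred from external observability data such as logs, metrics and traces. Combining this data with the architecture relationship graph overlays dynamic insight on top of the static architecture and makes the health of the cluster easier to see, as in the figure below: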

[Figure: architecture runtime graph]

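The numbers in the figure represent insight data, such as "exception count" or "request traffic". Besides numbers, insight can also be conveyed visually, for example by using color to indicate health status or line thickness to indicate traffic volume.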
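The visual encodings mentioned above (color for health, line thickness for traffic) come down to two small mapping functions. The Python sketch below is illustrative only: the thresholds `WARN_EXCEPTIONS`/`CRIT_EXCEPTIONS` and the 1 to 8 pixel width range are assumed values, not figures from the handbook.

```python
from dataclasses import dataclass

# Assumed thresholds and pixel range; tune per component, they are not from the handbook.
WARN_EXCEPTIONS = 1
CRIT_EXCEPTIONS = 10
MIN_PX, MAX_PX = 1.0, 8.0


@dataclass
class NodeInsight:
    name: str
    exceptions: int  # e.g. matching error log lines in the current window


@dataclass
class EdgeInsight:
    source: str
    target: str
    qps: float  # request traffic observed on the edge


def health_color(node: NodeInsight) -> str:
    """Color encodes health: green, yellow or red by exception count."""
    if node.exceptions >= CRIT_EXCEPTIONS:
        return "red"
    if node.exceptions >= WARN_EXCEPTIONS:
        return "yellow"
    return "green"


def line_width(edge: EdgeInsight, peak_qps: float) -> float:
    """Line thickness encodes traffic: linear in qps relative to the busiest edge."""
    if peak_qps <= 0:
        return MIN_PX
    ratio = max(0.0, min(edge.qps / peak_qps, 1.0))
    return MIN_PX + ratio * (MAX_PX - MIN_PX)


edges = [EdgeInsight("eni-operator", "kube-apiserver", 50.0),
         EdgeInsight("eni-operator", "Network Service", 5.0)]
peak = max(edge.qps for edge in edges)
for edge in edges:
    print(f"{edge.source} -> {edge.target}: {line_width(edge, peak):.1f}px")
print(health_color(NodeInsight("mutatehook", exceptions=3)))  # yellow
```

Scaling widths against the busiest edge keeps the picture readable regardless of absolute traffic; a logarithmic scale may suit clusters where traffic differs by orders of magnitude.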

It can be described with a data structure like the following:


```json
{
    "nodes": [
        {
            "_id": "ea4538dc0625d06b0dc93579998e04288656050f",
            "name": "mutatehook",
            "deploy": {
                "type": "K8s:Deployment",
                "namespace": "kube-system",
                "replicas": 3
            },
            "insight": [
                {
                    "source": {
                        "vendor": "cloud:aliyun:sls",
                        "log_project": "xxx",
                        "log_store": "mutatehook",
                        "log_url": "https://sls.console.aliyun.com/lognext/project/xxx"
                    },
                    "signal": {
                        "exception": {
                            "fuzzy": "fail OR Fail OR error OR Error"
                        }
                    }
                }
            ]
        }
    ],
    "edges": [
        {
            "_id": "38bce9ca8a0cec6d8586d96298bd63b0523fc946",
            "source": "eni-operator", "target": "kube-apiserver",
            "insight": [
                {
                    "source": {
                        "vendor": "cloud:aliyun:sls",
                        "log_project": "xxx",
                        "log_store": "xxx",
                        "log_url": "https://sls.console.aliyun.com/lognext/project/xxx"
                    },
                    "signal": {
                        "exception": {
                            "unauthorized": "Unauthorized",
                            "throttling": "'Throttling' OR 'throttling'"
                        }
                    }
                }
            ]
        }
    ]
}
```
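The `signal` entries above are Alibaba Cloud SLS query strings. To check such a signal offline, for example against a saved log excerpt, one option is the rough Python approximation below. It handles only flat `a OR b OR 'c'` expressions and uses plain substring matching, which is likely looser than SLS's index-based search, where a term usually has to match a whole token (here `fail` also matches `failed`). The sample log lines are made up, and for brevity the same lines are fed to both the node and the edge signals.

```python
import re

# Signal maps copied from the runtime-graph example above.
NODE_SIGNALS = {"exception": {"fuzzy": "fail OR Fail OR error OR Error"}}
EDGE_SIGNALS = {"exception": {"unauthorized": "Unauthorized",
                              "throttling": "'Throttling' OR 'throttling'"}}


def parse_or_query(query):
    """Split a flat "a OR b OR 'c'" query into bare terms (no AND/NOT/parentheses)."""
    terms = re.split(r"\s+OR\s+", query.strip())
    return [term.strip().strip("'\"") for term in terms if term.strip()]


def count_signals(lines, signals):
    """Per signal, count log lines containing at least one query term (substring match)."""
    counts = {}
    for category, queries in signals.items():
        for name, query in queries.items():
            terms = parse_or_query(query)
            counts[f"{category}/{name}"] = sum(
                1 for line in lines if any(term in line for term in terms))
    return counts


# Made-up log lines for illustration.
sample_logs = [
    "I0601 reconcile ok",
    "E0601 failed to allocate ENI: Throttling.User",
    "E0601 request Unauthorized",
    "W0601 retry after error",
]
print(count_signals(sample_logs, NODE_SIGNALS))  # {'exception/fuzzy': 2}
print(count_signals(sample_logs, EDGE_SIGNALS))  # {'exception/unauthorized': 1, 'exception/throttling': 1}
```

Keying the counts as `category/name` keeps them aligned with the JSON structure, so each count can be attached directly to the node or edge it came from.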



3. Resource Composition Graph


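Resource management is a complex topic. By analyzing how the resources in a cluster are composed, the cluster's resource composition can likewise be represented as a graph: nodes represent resources, and edges represent ownership or binding relationships between them.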
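Because edges record ownership or binding, walking them backwards answers impact questions such as "what hangs off this VPC?". The Python sketch below assumes each edge reads "source is attached to or contained in target", which matches the direction of `vsw-xxx` -> `vpc-xxx` in the example that follows, and runs a breadth-first search over the reversed edges. The resource IDs and the `impacted_by` helper are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical ownership/binding edges: (source, target) means source is attached to target.
EDGES = [
    ("vsw-a", "vpc-1"),
    ("sg-1", "vpc-1"),
    ("i-1", "vsw-a"),
    ("i-1", "sg-1"),
    ("ngw-1", "vpc-1"),
    ("eip-1", "ngw-1"),
]


def impacted_by(resource, edges):
    """Return every resource that directly or transitively hangs off `resource`."""
    dependents = defaultdict(list)
    for source, target in edges:
        dependents[target].append(source)  # reverse the edge: target -> its dependents
    seen, queue = set(), deque([resource])
    while queue:
        current = queue.popleft()
        for child in dependents[current]:
            if child not in seen:  # shared dependents (i-1 above) are visited once
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print(impacted_by("vpc-1", EDGES))  # ['eip-1', 'i-1', 'ngw-1', 'sg-1', 'vsw-a']
print(impacted_by("vsw-a", EDGES))  # ['i-1']
```

The `seen` set matters because resources such as an ECS instance are typically bound to several parents (a vSwitch and a security group), so the graph is a DAG rather than a tree.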


It can be described with a data structure like the following:


```json
{
    "kinds": ["vpc", "vswitch", "securitygroup", "ecs", "clb", "rds", "nat", "eip"],
    "tags": {
        "cluster/product": "xxx",
        "cluster/id": "2736f42d4e882ad6825d6364545a3f1cb5136859",
        "cluster/name": "xxx",
        "cluster/env": "staging"
    },
    "nodes": [
        {
            "kind": "vpc",
            "nodes": [
                {
                    "_id": "c505f21871bac7385c1387988cf226310af0831e",
                    "id": "vpc-xxx",
                    "description": "",
                    "ipv4": "xxx",
                    "tags": {
                        "resource/creator": "product",
                        "resource/role": ""
                    },
                    "url": "https://vpc.console.aliyun.com/vpc/xxx"
                }
            ]
        },
        {
            "kind": "ecs",
            "nodes": [
                {
                    "_id": "47c4fe5cc2585a49f07798a0b8b69cda7f8d4a23",
                    "id": "xxx",
                    "az": "xxx",
                    "interfaces": {
                        "primary": {
                            "ip": "xxx",
                            "eni": "xxx",
                            "mac": "xxx"
                        }
                    },
                    "instance-type-family": "xxx",
                    "instance-type": "xxx",
                    "tags": {
                        "resource/creator": "product",
                        "resource/role": "worker",
                        "node/container-runtime": "xxx",
                        "node/user-networking": "xxx",
                        "node/system-networking": "xxx"
                    },
                    "status": "",
                    "condition": "",
                    "url": "https://ecs.console.aliyun.com/#/server/xxx"
                }
            ]
        }
    ],
    "edges": [
        {
            "_id": "a754c748b2723a25c017421dd0969d00df3c000b",
            "source": "vsw-xxx", "target": "vpc-xxx",
            "description": ""
        },
        {
            "_id": "c34b164eba2897cfb2b574a576672d8aa441d709",
            "source": "eip-xxx", "target": "ngw-xxx",
            "description": ""
        }
    ]
}
```
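With the nodes grouped by kind, a first consistency check is to flatten them into an index and look for edge endpoints that the inventory does not list. In the trimmed excerpt above that happens simply because entries were omitted; in a real inventory it may point at an untracked resource. This Python sketch also guesses a kind from the ID prefix; the `PREFIX_TO_KIND` table is an assumption based on common Alibaba Cloud ID prefixes, so verify it against your own resources.

```python
import json

# Trimmed copy of the resource composition graph above.
INVENTORY = json.loads("""
{
  "kinds": ["vpc", "vswitch", "securitygroup", "ecs", "clb", "rds", "nat", "eip"],
  "nodes": [
    {"kind": "vpc", "nodes": [{"id": "vpc-xxx"}]},
    {"kind": "ecs", "nodes": [{"id": "xxx"}]}
  ],
  "edges": [
    {"source": "vsw-xxx", "target": "vpc-xxx"},
    {"source": "eip-xxx", "target": "ngw-xxx"}
  ]
}
""")

# Assumed mapping from ID prefix to the kinds declared above.
PREFIX_TO_KIND = {"vpc-": "vpc", "vsw-": "vswitch", "sg-": "securitygroup",
                  "i-": "ecs", "ngw-": "nat", "eip-": "eip"}


def index_resources(inventory):
    """Flatten the kind-grouped node lists into {resource id: kind}."""
    index = {}
    for group in inventory["nodes"]:
        if group["kind"] not in inventory["kinds"]:
            raise ValueError(f"undeclared kind: {group['kind']!r}")
        for node in group["nodes"]:
            index[node["id"]] = group["kind"]
    return index


def guess_kind(resource_id):
    """Best-effort kind from the ID prefix; 'unknown' when no prefix matches."""
    for prefix, kind in PREFIX_TO_KIND.items():
        if resource_id.startswith(prefix):
            return kind
    return "unknown"


def dangling_endpoints(inventory):
    """Edge endpoints missing from the inventory, mapped to a guessed kind."""
    index = index_resources(inventory)
    missing = {}
    for edge in inventory["edges"]:
        for end in (edge["source"], edge["target"]):
            if end not in index:
                missing[end] = guess_kind(end)
    return missing


print(dangling_endpoints(INVENTORY))  # {'vsw-xxx': 'vswitch', 'eip-xxx': 'eip', 'ngw-xxx': 'nat'}
```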



4. Resource Runtime Graph

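While resources are in use, the internal state of resources and of the relationships between them can likewise be inferred from external observability data such as logs, metrics and events. Combining this data with the resource composition graph overlays dynamic insight on top of the static resources and gives an intuitive view of how cluster resources are being used.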


It can be described with a data structure like the following:


```json
{
    "nodes": [
        {
            "_id": "35103ac62d4ef0a314e2a5128f44c684205bea2f",
            "id": "vpc",
            "insight": [
                {
                    "source": {
                        "vendor": "cloud:aliyun:vpc",
                        "type": "OpenAPI"
                    },
                    "signal": {
                        "vpc/exist": "DescribeVpcs",
                        "vswitch/count": "DescribeVSwitches"
                    }
                },
                {
                    "source": {
                        "vendor": "cloud:aliyun:ecs",
                        "type": "OpenAPI"
                    },
                    "signal": {
                        "ecs/count": "DescribeInstances",
                        "securitygroup/count": "DescribeSecurityGroups"
                    }
                }
            ]
        },
        {
            "_id": "6450e07dc67027f76f29fbfcb841e57200855196",
            "id": "ecs",
            "insight": [
                {
                    "source": {
                        "vendor": "cloud:aliyun:ecs",
                        "type": "OpenAPI"
                    },
                    "signal": {
                        "ecs/exist": "DescribeInstances",
                        "ecs/count": "DescribeInstances",
                        "ecs/usage": "DescribeInstanceMonitorData"
                    }
                },
                {
                    "source": {
                        "vendor": "cloud:aliyun:ecs",
                        "type": "auto"
                    },
                    "signal": {
                        "ecs/state_change": ""
                    }
                }
            ]
        }
    ],
    "edges": [
        {
            "_id": "caa1e395c713f47766ca7bcfc20419c0be0f0803",
            "source": "i-xxx", "target": "sg-xxx",
            "insight": [
                {
                    "source": {
                        "vendor": "cloud:aliyun:ecs",
                        "type": "OpenAPI"
                    },
                    "signal": {
                        "exist": "DescribeInstances"
                    }
                }
            ]
        },
        {
            "_id": "537dc478d95714792b3694674d6164f72b361bb0",
            "source": "eip-xxx", "target": "ngw-xxx",
            "insight": [
                {
                    "source": {
                        "vendor": "cloud:aliyun:vpc",
                        "type": "OpenAPI"
                    },
                    "signal": {
                        "exist": "DescribeEipAddresses"
                    }
                }
            ]
        }
    ]
}
```
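The resource runtime graph pairs each signal with the OpenAPI action that produces it. Below is a minimal Python sketch of a poller that evaluates such a node. It does not call a real SDK: `fake_call` returns canned responses, the `{"TotalCount": n}` response shape is a simplifying assumption rather than the documented schema, and the convention that the last segment of a signal name (`exist`, `count`) selects how a response is reduced is hypothetical. Responses are cached per action, so `ecs/exist` and `ecs/count`, which both map to DescribeInstances, trigger a single request.

```python
# Hypothetical stand-in for a cloud SDK client: action name -> canned response.
# The {"TotalCount": n} shape is a simplifying assumption, not the documented schema.
FAKE_RESPONSES = {
    "DescribeVpcs": {"TotalCount": 1},
    "DescribeVSwitches": {"TotalCount": 2},
    "DescribeInstances": {"TotalCount": 3},
    "DescribeSecurityGroups": {"TotalCount": 1},
}


def fake_call(action):
    return FAKE_RESPONSES[action]


# Hypothetical convention: the last path segment of a signal name selects a reducer.
REDUCERS = {
    "exist": lambda resp: resp["TotalCount"] > 0,
    "count": lambda resp: resp["TotalCount"],
}


def evaluate(node, call=fake_call):
    """Poll every OpenAPI-backed signal of a resource runtime-graph node."""
    cache = {}  # one request per action, even if several signals share it
    results = {}
    for insight in node["insight"]:
        if insight["source"].get("type") != "OpenAPI":
            continue  # "auto" sources are assumed event-driven, so not polled here
        for signal, action in insight["signal"].items():
            reducer = REDUCERS.get(signal.rsplit("/", 1)[-1])
            if reducer is None:
                continue  # no reducer defined, e.g. "ecs/usage"
            if action not in cache:
                cache[action] = call(action)
            results[signal] = reducer(cache[action])
    return results


# Trimmed copy of the "ecs" node above.
ECS_NODE = {
    "id": "ecs",
    "insight": [
        {"source": {"vendor": "cloud:aliyun:ecs", "type": "OpenAPI"},
         "signal": {"ecs/exist": "DescribeInstances",
                    "ecs/count": "DescribeInstances",
                    "ecs/usage": "DescribeInstanceMonitorData"}},
        {"source": {"vendor": "cloud:aliyun:ecs", "type": "auto"},
         "signal": {"ecs/state_change": ""}},
    ],
}

print(evaluate(ECS_NODE))  # {'ecs/exist': True, 'ecs/count': 3}
```

Injecting `call` as a parameter keeps the evaluation logic testable; a real deployment would pass a function that wraps the cloud vendor's SDK and handles pagination and rate limits.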



Contingency Plans
