K8S Ingress Controller 健康检查原理剖析

健康检查配置

K8S Ingress Controller 健康检查原理剖析

我们知道K8S本身提供了LivenessReadiness机制来对Pod进行健康监控,同样我们在部署K8S Ingress Controller时也配置了LivenessProbe和ReadinessProbe对其进行健康检查,具体配置如下所示:

livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /healthz
    port: 10254
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /healthz
    port: 10254
    scheme: HTTP
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1

那么Kubelet在对Nginx Ingress Controller Pod进行定期健康检查时,就会通过HTTP协议发送GET请求,类似于如下请求:

curl -XGET http://<NGINX_INGRESS_CONTROLLER_POD_ID>:10254/healthz

健康检查成功则会返回ok,检查失败则返回失败信息。

原理剖析

那么当Kubelet发起对Ingress Controller Pod的健康检查时,Nginx Ingress Controller内部到底做了什么,以及为什么是10254端口和/healthz路径,今天我们简单剖析下K8S Ingress Controller内部的健康检查逻辑。

1、10254 和 /healthz

首先,Nginx Ingress Controller在启动时会通过goroutine启动一个HTTP Server:

// 初始化一个 HTTP Request Handler
mux := http.NewServeMux()
go registerHandlers(conf.EnableProfiling, conf.ListenPorts.Health, ngx, mux)

其中registerHandlers方法实现如下:

func registerHandlers(enableProfiling bool, port int, ic *controller.NGINXController, mux *http.ServeMux) {
    // 注册健康检查Handler
    healthz.InstallHandler(mux,
        healthz.PingHealthz,
        ic,
    )

    // 用于Prometheus抓取metrics信息
    mux.Handle("/metrics", promhttp.Handler())
    // 获取当前Ingress Controller版本信息
    mux.HandleFunc("/build", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        b, _ := json.Marshal(version.String())
        w.Write(b)
    })
    // 主动停止Ingress Controller Pod
    mux.HandleFunc("/stop", func(w http.ResponseWriter, r *http.Request) {
        err := syscall.Kill(syscall.Getpid(), syscall.SIGTERM)
        if err != nil {
            glog.Errorf("Unexpected error: %v", err)
        }
    })

    // 获取性能监控信息
    if enableProfiling {
        mux.HandleFunc("/debug/pprof/", pprof.Index)
        mux.HandleFunc("/debug/pprof/heap", pprof.Index)
        mux.HandleFunc("/debug/pprof/mutex", pprof.Index)
        mux.HandleFunc("/debug/pprof/goroutine", pprof.Index)
        mux.HandleFunc("/debug/pprof/threadcreate", pprof.Index)
        mux.HandleFunc("/debug/pprof/block", pprof.Index)
        mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
        mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
        mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
        mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
    }
    // 启动 HTTP Server
    server := &http.Server{
        Addr:              fmt.Sprintf(":%v", port), // 指定监听Port
        Handler:           mux,
        ReadTimeout:       10 * time.Second,
        ReadHeaderTimeout: 10 * time.Second,
        WriteTimeout:      300 * time.Second,
        IdleTimeout:       120 * time.Second,
    }
    glog.Fatal(server.ListenAndServe())
}

从方法实现中,我们看到启动的HTTP Server监听的是conf.ListenPorts.Health端口;而该端口值是在Nginx Ingress Controller启动时通过如下启动参数解析而来:

httpPort      = flags.Int("http-port", 80, `Port to use for servicing HTTP traffic.`)
httpsPort     = flags.Int("https-port", 443, `Port to use for servicing HTTPS traffic.`)
statusPort    = flags.Int("status-port", 18080, `Port to use for exposing NGINX status pages.`)
sslProxyPort  = flags.Int("ssl-passthrough-proxy-port", 442, `Port to use internally for SSL Passthrough.`)
defServerPort = flags.Int("default-server-port", 8181, `Port to use for exposing the default server (catch-all).`)
healthzPort   = flags.Int("healthz-port", 10254, "Port to use for the healthz endpoint.")

因此,当我们在启动Nginx Ingress Controller时没有明确指定healthz-port参数,那么它的默认值就是10254端口。

另外,从上述方法中我们看到也注册了一个健康检查的Request Handler,其通过healthz.InstallHandler方法来完成注册:

func InstallHandler(mux mux, checks ...HealthzChecker) {
    // 如果没指定任何健康检查实现,那么默认仅注册PingHealthz实现
    if len(checks) == 0 {
        glog.V(5).Info("No default health checks specified. Installing the ping handler.")
        checks = []HealthzChecker{PingHealthz}
    }

    glog.V(5).Info("Installing healthz checkers:", strings.Join(checkerNames(checks...), ", "))
    // 注册健康检查根Handler,其内部会依次调用各个具体Handler实现
    mux.Handle("/healthz", handleRootHealthz(checks...))
    for _, check := range checks {
        // 注册各个具体的健康检查Handler实现
        mux.Handle(fmt.Sprintf("/healthz/%v", check.Name()), adaptCheckToHandler(check.Check))
    }
}

从这里我们看到注册的健康检查的请求根路径就是/healthz;当然我们也可以基于HealthzChecker接口来扩展更多的健康检查实现。

2、健康检查机制

通过前面章节我们看到当Kubelet对Nginx Ingress Controller Pod进行健康检查时,其最终会触发其内部handleRootHealthz 方法的执行:

func handleRootHealthz(checks ...HealthzChecker) http.HandlerFunc {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        failed := false
        var verboseOut bytes.Buffer
        for _, check := range checks {
            if err := check.Check(r); err != nil {
                // don't include the error since this endpoint is public.  If someone wants more detail
                // they should have explicit permission to the detailed checks.
                glog.V(6).Infof("healthz check %v failed: %v", check.Name(), err)
                fmt.Fprintf(&verboseOut, "[-]%v failed: reason withheld\n", check.Name())
                failed = true
            } else {
                fmt.Fprintf(&verboseOut, "[+]%v ok\n", check.Name())
            }
        }
        // always be verbose on failure
        if failed {
            http.Error(w, fmt.Sprintf("%vhealthz check failed", verboseOut.String()), http.StatusInternalServerError)
            return
        }

        if _, found := r.URL.Query()["verbose"]; !found {
            fmt.Fprint(w, "ok")
            return
        }

        verboseOut.WriteTo(w)
        fmt.Fprint(w, "healthz check passed\n")
    })
}

该方法内部会依次调用前面我们注册的各个具体的健康检查Handler实现,全部检查成功则会返回ok,任一检查失败则返回失败信息。

另外从前面代码中我们看到Nginx Ingress Controller在启动时会注册两个健康检查Handler:

  • healthz.PingHealthz

其是HealthzChecker接口的默认实现,实现逻辑很简单:

// PingHealthz returns true automatically when checked
var PingHealthz HealthzChecker = ping{}

// ping implements the simplest possible healthz checker.
type ping struct{}

func (ping) Name() string {
    return "ping"
}

// PingHealthz is a health check that returns true.
func (ping) Check(_ *http.Request) error {
    return nil
}
  • controller.NGINXController

其是Nginx Ingress Controller的具体代码实现,但同时其又实现了HealthzChecker接口来完成其所管理的资源必要的健康检查:

const (
    ngxHealthPath = "/healthz"
    nginxPID = "/tmp/nginx.pid"
)

func (n NGINXController) Name() string {
    return "nginx-ingress-controller"
}

func (n *NGINXController) Check(_ *http.Request) error {
    // 1.对Nginx进行健康检查,具体访问URL:http://0.0.0.0:18080/healthz
    res, err := http.Get(fmt.Sprintf("http://0.0.0.0:%v%v", n.cfg.ListenPorts.Status, ngxHealthPath))
    if err != nil {
        return err
    }
    defer res.Body.Close()
    if res.StatusCode != 200 {
        return fmt.Errorf("ingress controller is not healthy")
    }

    // 2. 若开启dynamic-configuration则检查Nginx维护在内存中的后端服务信息,访问URL:http://0.0.0.0:18080/is-dynamic-lb-initialized
    if n.cfg.DynamicConfigurationEnabled {
        res, err := http.Get(fmt.Sprintf("http://0.0.0.0:%v/is-dynamic-lb-initialized", n.cfg.ListenPorts.Status))
        if err != nil {
            return err
        }
        defer res.Body.Close()
        if res.StatusCode != 200 {
            return fmt.Errorf("dynamic load balancer not started")
        }
    }

    // 3. 检查Nginx主进程是否正常运行中
    fs, err := proc.NewFS("/proc")
    if err != nil {
        return errors.Wrap(err, "unexpected error reading /proc directory")
    }
    f, err := n.fileSystem.ReadFile(nginxPID)
    if err != nil {
        return errors.Wrapf(err, "unexpected error reading %v", nginxPID)
    }
    pid, err := strconv.Atoi(strings.TrimRight(string(f), "\r\n"))
    if err != nil {
        return errors.Wrapf(err, "unexpected error reading the nginx PID from %v", nginxPID)
    }
    _, err = fs.NewProc(pid)

    return err
}

我们看到访问Nginx的端口是n.cfg.ListenPorts.Status,其值同样来自于Nginx Ingress Controller的启动参数status-port,默认值为18080

最后通过Nginx配置文件我们可以看到Nginx在启动时同时会监听18080端口,如此我们便可通过该端口对其进行健康检查:

# used for NGINX healthcheck and access to nginx stats
server {
    listen 18080 default_server  backlog=511;
    listen [::]:18080 default_server  backlog=511;
    set $proxy_upstream_name "-";

    # 访问该路径直接返回200以说明Nginx能正常接收到请求
    location /healthz {
        access_log off;
        return 200;
    }

    # 校验当前内存中是否正常有后端服务信息
    location /is-dynamic-lb-initialized {
        access_log off;
        content_by_lua_block {
            local configuration = require("configuration")
            local backend_data = configuration.get_backends_data()
            if not backend_data then
            ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
            return
            end

            ngx.say("OK")
            ngx.exit(ngx.HTTP_OK)
        }
    }

    # 获取基本的监控统计信息
    location /nginx_status {
        set $proxy_upstream_name "internal";
        access_log off;
        stub_status on;
    }

    # 默认转发到404服务
    location / {
        set $proxy_upstream_name "upstream-default-backend";
        proxy_pass          http://upstream-default-backend;
    }
}

至此,Nginx Ingress Controller健康检查机制分析完毕,总结下来其主要校验两方面:

  1. Nginx进程是否正常运行
  2. 若开启dynamic-configuration其维护在内存中的后端服务信息是否存在
上一篇:2018云栖Workshop应用发布实践手册(一)


下一篇:阿里云容器服务K8S Ingress Controller日志采集实践