kubernetes k8s nginx ingress: a strange problem behind multiple proxy layers

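The algorithm service is deployed on k8s.

Requests pass through two proxy layers, so running into problems was not really a surprise: that is a lot of proxying.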

kong -> nginx-ingress -> svc

When accessed through kong, some requests return 502:

<html><head><title>502 Bad Gateway</title></head><body><center><h1>502 Bad Gateway</h1></center><hr><center>nginx/1.19.1</center></body></html>

Checking kong's logs for these requests turned up warnings:

2021/01/25 16:46:05 [warn] 166110#0: *3522559308 a client request body is buffered to a temporary file /usr/local/kong/client_body_temp/0000015840, client: 192.168.11.111, server: kong, request: "POST /slap/algo-api/3d1503b8ee0c4b8082d13a0d8f2f3173/general_sentiment HTTP/1.1", host: "cclient.github.com"
2021/01/25 16:46:06 [warn] 166118#0: *3522560171 a client request body is buffered to a temporary file /usr/local/kong/client_body_temp/0000015841, client: 192.168.11.111, server: kong, request: "POST /slap/algo-api/3d1503b8ee0c4b8082d13a0d8f2f3173/general_sentiment HTTP/1.1", host: "cclient.github.com"
2021/01/25 16:46:06 [warn] 166117#0: *3522561394 a client request body is buffered to a temporary file /usr/local/kong/client_body_temp/0000015842, client: 192.168.11.111, server: kong, request: "POST /slap/algo-api/3d1503b8ee0c4b8082d13a0d8f2f3173/general_sentiment HTTP/1.1", host: "cclient.github.com"

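After increasing client_body_buffer_size the warning no longer appeared, but some requests still returned 502.

Testing each layer: direct access through the nodeport, no 502.

Through nginx-ingress, some requests return 502.

Through kong, some requests return 502.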
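The per-layer comparison above can be made repeatable with a small script. This is only a sketch, not the procedure the original test used: the function names `probe` and `tally_codes`, the default request count, and the placeholder request body are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the 502 rate at each proxy layer.

# Tally HTTP status codes read one per line from stdin.
# Prints one "<code> <count>" line per distinct code, sorted by code.
tally_codes() {
    awk 'NF { n[$1]++ } END { for (c in n) print c, n[c] }' | sort
}

# Send COUNT identical POSTs to URL and print only each response's status code.
# BODY defaults to a placeholder JSON array (an assumption, not real test data).
probe() {
    url=$1
    count=${2:-100}
    body=${3:-'["test sentence"]'}
    i=0
    while [ "$i" -lt "$count" ]; do
        curl -s -o /dev/null -w '%{http_code}\n' \
            -H 'Content-Type: application/json' \
            -X POST -d "$body" "$url"
        i=$((i + 1))
    done
}

# Example (not run automatically): probe the nodeport directly.
# probe "http://192.168.100.128:32359/sentiment" 200 | tally_codes
```

Running the same `probe ... | tally_codes` pipeline against the nodeport URL, the nginx-ingress URL, and the kong URL gives one status-code count per layer, which shows at which layer the 502s first appear.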

So the plan is to troubleshoot and tune one layer at a time.

The original ingress definition:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
  generation: 1
  name: statefulset-123456
  namespace: default
spec:
  rules:
  - host: cclient.github.com
    http:
      paths:
      - backend:
          serviceName: statefulset-123456
          servicePort: 80
        path: /api_v1/123456(/|$)(.*)

The service itself, accessed through the nodeport, returns these headers:

curl -D header_nodeport -H "Content-Type: application/json" -X POST -d '["我感觉就是翻新机。","苏宁的电脑就是比便宜好几百"]' "http://192.168.100.128:32359/sentiment"

HTTP/1.1 200 OK
Content-Length: 519
Content-Type: application/json
Connection: keep-alive
Keep-Alive: 5

The service accessed through nginx-ingress returns these headers:

curl -D header_nodeport -H "Content-Type: application/json" -X POST -d '["我感觉就是翻新机。","苏宁的电脑就是比便宜好几百"]' "http://matrix-paas.mlamp.cn/api_v1/123456/sentiment"

curl -D header_nodeport -H "Content-Type: application/json" -X POST -d '["我感觉就是翻新机。","苏宁的电脑就是比便宜好几百"]' "http://matrix-paas.mlamp.cn/slap/algo-api/3d1503b8ee0c4b8082d13a0d8f2f3173/general_sentiment"

HTTP/1.1 200 OK
Server: nginx/1.19.1
Date: Wed, 27 Jan 2021 10:36:37 GMT
Content-Type: application/json
Content-Length: 519
Connection: keep-alive
Vary: Accept-Encoding

The service accessed through the outer kong layer returns these headers:

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 519
Connection: keep-alive
Server: nginx/1.19.1
Date: Wed, 27 Jan 2021 10:36:01 GMT
Vary: Accept-Encoding
X-Kong-Upstream-Latency: 92
X-Kong-Proxy-Latency: 16
Via: kong/2.0.4

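The first goal was to stop nginx-ingress from returning 502. I added many timeout and buffer annotations that map to the corresponding nginx settings, but none of them had any effect and there were still plenty of 502s: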

nginx.ingress.kubernetes.io/client-body-buffer-size: 100m
nginx.ingress.kubernetes.io/proxy-body-size: 100m
nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
nginx.ingress.kubernetes.io/proxy-next-upstream-timeout: "600"
nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "8"
nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "600"

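I had previously written up another HTTP/1.0 versus HTTP/1.1 problem, so based on that experience I tried changing the HTTP version:

A strange problem with nginx together with jersey+netty - 资本主义接班人 - 博客园 (cnblogs.com)

As an experiment, I added one more annotation: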

nginx.ingress.kubernetes.io/proxy-http-version: "1.0"

The 502s stopped appearing, and the problem was solved.
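For reference, the same annotation can also be applied to the existing ingress from the command line instead of editing the manifest. The sketch below assumes `kubectl` is configured for the cluster; the wrapper function `set_proxy_http_version` and the `KUBECTL` override variable are hypothetical names introduced here, while the ingress name and namespace are taken from the definition shown earlier.

```shell
#!/bin/sh
# Hypothetical wrapper: set the upstream HTTP version annotation on an ingress.
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the command.
set_proxy_http_version() {
    name=$1
    namespace=$2
    version=${3:-1.0}
    "${KUBECTL:-kubectl}" annotate ingress "$name" -n "$namespace" --overwrite \
        "nginx.ingress.kubernetes.io/proxy-http-version=$version"
}

# Example (not run automatically):
# set_proxy_http_version statefulset-123456 default 1.0
# kubectl get ingress statefulset-123456 -n default -o yaml | grep proxy-http-version
```

Because the annotation lives on a single ingress object, the change only affects the routes defined in that ingress rather than the whole controller.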

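One doubt remains, because the backend service really does speak HTTP/1.1. It may be a bug in nginx-ingress. For now the priority is that access works, so I am not digging deeper (this is an old cluster, and nginx-ingress has long since stopped being recommended).

The outer kong layer is too tightly integrated to configure HTTP/1.0 for specific requests, and switching to 1.0 globally would affect other services, so that part is shelved for now.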
