【Background】: A script calls the Kong Admin API to set the weights of Kong targets, that is, to take nodes out of traffic and to put them back.
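As a rough illustration of what such a script might send, the sketch below builds a `POST /upstreams/{upstream}/targets` request that sets one target's weight. The Admin API address, the upstream name and the target address are placeholders, and the sketch assumes a database-backed Kong whose Admin API is writable; it is not the actual script used here.

```python
import json
import urllib.request

ADMIN_URL = "http://127.0.0.1:8001"  # assumed Admin API address


def build_target_request(upstream: str, target: str, weight: int) -> urllib.request.Request:
    """Build a POST /upstreams/{upstream}/targets request that sets a target's weight.

    A weight of 0 takes the node out of traffic; a positive weight puts it back.
    """
    if weight < 0:
        raise ValueError("weight must not be negative")
    body = json.dumps({"target": target, "weight": weight}).encode("utf-8")
    return urllib.request.Request(
        f"{ADMIN_URL}/upstreams/{upstream}/targets",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example (hypothetical names): drain one node, then restore it.
drain = build_target_request("my-upstream", "10.0.0.1:8080", 0)
restore = build_target_request("my-upstream", "10.0.0.1:8080", 100)
# urllib.request.urlopen(drain) would actually send the request.
```

Building the request separately from sending it makes it easy to log exactly what the Jenkins job submitted for each node.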
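【Problem description】: After a Jenkins build completes, two kinds of anomaly occur:
1. One Kong target is missing; for example, there should be 3 nodes, but only 2 were added.
2. A Kong target's weight is not 100; sometimes the weight is actually 1.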
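Both anomalies can be detected automatically after each build. The following is a minimal sketch that assumes the target list has already been fetched (for example from `GET /upstreams/{upstream}/targets`) and that each entry carries `target` and `weight` fields; the function name and the sample addresses are hypothetical.

```python
from typing import Dict, List


def find_target_anomalies(
    targets: List[Dict],
    expected: List[str],
    expected_weight: int = 100,
) -> List[str]:
    """Return a description of every missing target and every unexpected weight."""
    actual = {t["target"]: t["weight"] for t in targets}
    problems = []
    for name in expected:
        if name not in actual:
            problems.append(f"missing target: {name}")
        elif actual[name] != expected_weight:
            problems.append(f"unexpected weight {actual[name]} for {name}")
    return problems


# Example with made-up data reproducing both symptoms.
observed = [
    {"target": "10.0.0.1:8080", "weight": 100},
    {"target": "10.0.0.2:8080", "weight": 1},
]
wanted = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
issues = find_target_anomalies(observed, wanted)
```

Failing the Jenkins job when the returned list is non-empty would surface the problem immediately instead of leaving it to be noticed later.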
The logs contain the following errors:
127.0.0.1 - - [26/Jun/2019:04:04:24 +0000] "POST /config?check_hash=1 HTTP/1.1" 201 7242 "-" "Go-http-client/1.1"
2019/06/26 04:04:25 [error] 33#0: 68787 [lua] events.lua:254: post(): worker-events: failed posting event "targets" by "balancer"; no memory, context: ngx.timer
2019/06/26 04:04:25 [error] 33#0: 68787 [lua] handler.lua:265: failed broadcasting target create to workers: failed posting event "targets" by "balancer"; no memory, context: ngx.timer
2019/06/26 04:04:25 [error] 39#0: 68788 [lua] events.lua:254: post(): worker-events: failed posting event "targets" by "balancer"; no memory, context: ngx.timer
2019/06/26 04:04:25 [error] 39#0: 68788 [lua] handler.lua:265: failed broadcasting target create to workers: failed posting event "targets" by "balancer"; no memory, context: ngx.timer
2019/06/26 04:04:25 [error] 37#0: 68758 [lua] balancer.lua:581: on_target_event(): target create: balancer not found for nginx-ingress-controller-live.events.svc, context: ngx.timer
2019/06/26 04:04:25 [error] 33#0: 68787 [lua] balancer.lua:581: on_target_event(): target create: balancer not found for nginx-ingress-controller-live.events.svc, context: ngx.timer
【Investigation and analysis】:
Investigation showed that this is a Kong bug:
The error logs suggest that the kong_worker_events shm is full. Further looking at the shm metrics from Kong's /status endpoint gives the following information for the shm used for this library:
"allocated_slabs": "5.00 MiB",
"capacity": "5.00 MiB"
In conversation with @thibaultcha and @Tieske, it was noted that this is due to fragmentation issue inside the shm due to insertion of tables of varied sizes into the shm. The shm actually might have space for insertion but requires multiple evictions to find a single slab to allocate the table being inserted. This possibly requires using shm_set_tries in lua-resty-mlcache.
For details, see: https://www.gitmemory.com/hbagdi
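To watch for this condition, the two shm fields quoted above can be compared programmatically. The sketch below parses only the size format shown in the excerpt (for example "5.00 MiB"); how the fields are nested inside the `/status` response is not covered and would need checking against the running Kong version. The helper names are hypothetical.

```python
import re

_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def parse_size(text: str) -> float:
    """Convert a string such as '5.00 MiB' into a number of bytes."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*(B|KiB|MiB|GiB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def shm_usage_ratio(shm: dict) -> float:
    """Return allocated_slabs divided by capacity for one shared dict entry."""
    return parse_size(shm["allocated_slabs"]) / parse_size(shm["capacity"])


# The values reported in the excerpt above: the shm is completely allocated.
ratio = shm_usage_ratio({"allocated_slabs": "5.00 MiB", "capacity": "5.00 MiB"})
```

A ratio at or near 1.0 for `kong_worker_events` matches the "no memory" errors in the logs. Because of fragmentation, a lower ratio does not by itself guarantee that an insertion will succeed.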
【Solution】:
Spin up Kong (>=1.1) in dbless mode
Use /config endpoint to change Kong's configuration. Make sure that plugins, upstreams and targets are also changed in each reload.
This should result in shm getting fragmented over time and expose this problem
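The steps above could be scripted roughly as follows. This sketch only generates declarative configurations whose upstream and target counts vary from one reload to the next; plugins are omitted for brevity, and all names, addresses and the `_format_version` value are illustrative assumptions. Sending each configuration as a JSON body to `POST /config` is likewise an assumption to verify against the Kong version in use.

```python
import json
import random


def build_config(round_no: int, rng: random.Random) -> dict:
    """Build a declarative config whose upstream and target counts vary per round."""
    upstreams = []
    for u in range(rng.randint(1, 5)):
        targets = [
            {"target": f"10.0.{u}.{t + 1}:8080", "weight": 100}
            for t in range(rng.randint(1, 10))
        ]
        upstreams.append({"name": f"upstream-{round_no}-{u}", "targets": targets})
    return {"_format_version": "1.1", "upstreams": upstreams}


# Generate a few rounds of differently sized configs (seeded for repeatability).
rng = random.Random(42)
payloads = [json.dumps(build_config(i, rng)) for i in range(3)]
# Each payload would then be POSTed to the /config endpoint in turn.
```

Varying the size of each configuration is what matters here, since the quoted analysis attributes the fragmentation to tables of different sizes being inserted into the shm.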