Performance problems caused by network connections

Symptoms:


tcp        0      0 ::ffff:192.168.1.12:59103   ::ffff:192.168.1.11:3306    TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:59085   ::ffff:192.168.1.11:3306    TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:59331   ::ffff:192.168.1.11:3306    TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:46381   ::ffff:192.168.1.104:3306   TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:59034   ::ffff:192.168.1.11:3306    TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:59383   ::ffff:192.168.1.11:3306    TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:59138   ::ffff:192.168.1.11:3306    TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:59407   ::ffff:192.168.1.11:3306    TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:59288   ::ffff:192.168.1.11:3306    TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:58905   ::ffff:192.168.1.11:3306    TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:58867   ::ffff:192.168.1.11:3306    TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:58891   ::ffff:192.168.1.11:3306    TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:59334   ::ffff:192.168.1.11:3306    TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:46129   ::ffff:192.168.1.100:3306   TIME_WAIT   timewait (0.00/0/0)
tcp        0      0 ::ffff:192.168.1.12:59143   ::ffff:192.168.1.11:3306    TIME_WAIT   timewait (0.00/0/0)
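
To see where the TIME_WAIT sockets point, the listing above can be grouped by remote endpoint. The following is a minimal sketch, not part of the original troubleshooting; the function name `summarize_time_wait` is hypothetical, and it assumes netstat-style input with the column layout shown above.

```python
from collections import Counter

def summarize_time_wait(netstat_text):
    """Count TIME_WAIT sockets per remote endpoint in netstat-style output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state, ...
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "TIME_WAIT":
            continue
        counts[fields[4]] += 1
    return counts

# Three lines in the same shape as the listing above.
sample = """\
tcp 0 0 ::ffff:192.168.1.12:59103 ::ffff:192.168.1.11:3306 TIME_WAIT timewait (0.00/0/0)
tcp 0 0 ::ffff:192.168.1.12:59085 ::ffff:192.168.1.11:3306 TIME_WAIT timewait (0.00/0/0)
tcp 0 0 ::ffff:192.168.1.12:46381 ::ffff:192.168.1.104:3306 TIME_WAIT timewait (0.00/0/0)
"""

for remote, n in summarize_time_wait(sample).most_common():
    print(remote, n)
```

In the listing above every remote port is 3306 and every local port is an ephemeral one, which suggests that 192.168.1.12 is the client side of these MySQL connections.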

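Checking sysctl.conf showed that every setting was still at its default, so we tried the changes below. In hindsight, this change was a judgment made before the analysis was precise enough.

TIME_WAIT is held by the side that closes the connection first, so a large number of TIME_WAIT sockets on the server means the server is the one actively closing the TCP connections.

Handling such connections comes down to releasing the server's handles and memory. The port cannot be released, because the server has opened only one listen port.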

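# Fast recycling of TIME_WAIT sockets; known to break clients behind NAT, removed in Linux 4.12
net.ipv4.tcp_tw_recycle = 1
# Allow TIME_WAIT sockets to be reused for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
# Seconds an orphaned socket stays in FIN_WAIT_2; does not shorten TIME_WAIT
net.ipv4.tcp_fin_timeout = 3
# Idle seconds before the first keepalive probe is sent (default 7200)
net.ipv4.tcp_keepalive_time = 3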
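
Before and after editing sysctl.conf it helps to check what the running kernel actually reports. The sketch below is an illustration only: the helper names `sysctl_path` and `read_sysctl` are hypothetical, and it assumes a Linux host where sysctl keys are exposed under /proc/sys. A key that the kernel does not provide (for example `tcp_tw_recycle` on newer kernels) is reported as missing.

```python
import os

# Settings tried in this post (as strings, the way /proc/sys exposes them).
TRIED = {
    "net.ipv4.tcp_tw_recycle": "1",
    "net.ipv4.tcp_tw_reuse": "1",
    "net.ipv4.tcp_fin_timeout": "3",
    "net.ipv4.tcp_keepalive_time": "3",
}

def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return os.path.join(root, *name.split("."))

def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if the key is absent."""
    try:
        with open(sysctl_path(name, root)) as f:
            return f.read().strip()
    except OSError:
        return None

if __name__ == "__main__":
    for name, wanted in TRIED.items():
        current = read_sysctl(name)
        status = "missing" if current is None else current
        print(f"{name}: current={status} tried={wanted}")
```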

After these changes, the problem persisted.

dmesg shows the following messages:


Nov  4 11:35:48 localhost kernel: __ratelimit: 108 callbacks suppressed
Nov  4 11:35:48 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:48 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:48 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:48 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:48 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:48 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:48 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:48 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:48 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:48 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:53 localhost kernel: __ratelimit: 592 callbacks suppressed
Nov  4 11:35:53 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:53 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:57 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:57 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:57 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:57 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:57 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:57 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:57 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:57 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:58 localhost kernel: __ratelimit: 281 callbacks suppressed
Nov  4 11:35:58 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:58 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:58 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:58 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:58 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:58 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:58 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:58 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:58 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:58 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:36:14 localhost kernel: __ratelimit: 7 callbacks suppressed
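
The `__ratelimit: N callbacks suppressed` lines mean the kernel held back further messages, so the visible "table full" lines understate how many packets were dropped. As a rough sketch (the function name `summarize_drops` is hypothetical, and it assumes the syslog line format shown above), the log can be summarized like this:

```python
import re
from collections import Counter

DROP = "nf_conntrack: table full, dropping packet."
SUPPRESSED = re.compile(r"__ratelimit: (\d+) callbacks suppressed")

def summarize_drops(log_text):
    """Return (logged drops per timestamp, total suppressed messages)."""
    logged = Counter()
    suppressed = 0
    for line in log_text.splitlines():
        # Syslog prefix: month, day, time, host, "kernel:"; the time is field 2.
        fields = line.split()
        if len(fields) < 5:
            continue
        if DROP in line:
            logged[fields[2]] += 1
        else:
            m = SUPPRESSED.search(line)
            if m:
                suppressed += int(m.group(1))
    return logged, suppressed

# A few lines copied from the log above.
sample = """\
Nov  4 11:35:48 localhost kernel: __ratelimit: 108 callbacks suppressed
Nov  4 11:35:48 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:48 localhost kernel: nf_conntrack: table full, dropping packet.
Nov  4 11:35:53 localhost kernel: __ratelimit: 592 callbacks suppressed
Nov  4 11:35:53 localhost kernel: nf_conntrack: table full, dropping packet.
"""

logged, suppressed = summarize_drops(sample)
print(dict(logged), suppressed)
```

The suppressed total is only a lower bound on what was hidden from the log, and it counts all rate-limited kernel messages, which here are most likely the same "table full" warning.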

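The nf_conntrack module implements connection tracking. It registers its hooks through the netfilter framework's nf_register_hook/nf_unregister_hook functions and calls nf_conntrack_in to create the corresponding connection entry. ipv4_conntrack_in, the function that mainly handles creating connections, is attached at the NF_IP_PRE_ROUTING hook point (named NF_INET_PRE_ROUTING in newer kernels), and this is how connections get tracked.

We then turned to the nf_conntrack: table full problem:

1. Solve the problem by tuning parameters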


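net.netfilter.nf_conntrack_max
// Maximum number of connection tracking entries, i.e. how many "tasks" netfilter can handle in kernel memory at the same time.
net.netfilter.nf_conntrack_tcp_timeout_established
// How long, in seconds, a tracked TCP connection in the ESTABLISHED state is kept without traffic before its entry expires. It is not a timeout for setting up the connection.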
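
To judge how far nf_conntrack_max needs to be raised, it is useful to compare it with the number of entries currently tracked. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and it assumes the nf_conntrack module is loaded so that `nf_conntrack_count` and `nf_conntrack_max` exist under /proc/sys/net/netfilter.

```python
import os

def read_int(path):
    """Read a single integer from a /proc file."""
    with open(path) as f:
        return int(f.read().strip())

def conntrack_usage(root="/proc/sys/net/netfilter"):
    """Return (count, max, ratio) for the connection tracking table."""
    count = read_int(os.path.join(root, "nf_conntrack_count"))
    limit = read_int(os.path.join(root, "nf_conntrack_max"))
    return count, limit, count / limit

if __name__ == "__main__":
    try:
        count, limit, ratio = conntrack_usage()
        print(f"{count}/{limit} entries in use ({ratio:.0%})")
    except OSError:
        print("nf_conntrack does not appear to be loaded on this host")
```

A ratio that stays close to 1 matches the "table full" messages; lowering the established timeout lets idle entries expire sooner, while raising the maximum makes room for more of them.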

2. Solve the problem by turning off the firewall

 
