"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

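Symptoms

The host hangs for no apparent reason, and the services running on it stop responding.

The logs fill up with the message: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Log output:

# tail -f /var/log/messages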

kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

kernel: INFO: task keepalived:21553 blocked for more than 120 seconds.

kernel: INFO: task vnetd:18082 blocked for more than 120 seconds.

kernel: INFO: task zabbix_agentd:15274 blocked for more than 120 seconds.

kernel: INFO: task jbd2/dm-3-8:848 blocked for more than 120 seconds.

kernel: INFO: task pickup:21858 blocked for more than 120 seconds.

kernel: INFO: task xfsaild/dm-0:476 blocked for more than 120 seconds.

Runtime journal is using 832.0M (max allowed 794.3M, trying to leave 1.1G free of 6.9G available → current limit 832.0M)

# dmesg |grep '/proc/sys/kernel/hung_task_timeout_secs' -B 1

[51140129.902940] INFO: task systemd:1 blocked for more than 120 seconds.
[51140129.902992] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140129.903265] INFO: task xfsaild/dm-0:476 blocked for more than 120 seconds.
[51140129.903298] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140129.903636] INFO: task jbd2/dm-3-8:848 blocked for more than 120 seconds.
[51140129.903668] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140129.903796] INFO: task keepalived:21553 blocked for more than 120 seconds.
[51140129.903829] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140129.904034] INFO: task vnetd:18082 blocked for more than 120 seconds.
[51140269.655352] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140269.655521] INFO: task zabbix_agentd:15274 blocked for more than 120 seconds.
[51140269.655546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140269.655638] INFO: task zabbix_agentd:15275 blocked for more than 120 seconds.
[51140269.655661] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140269.655745] INFO: task zabbix_agentd:15276 blocked for more than 120 seconds.
[51140269.655767] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140269.655852] INFO: task kworker/4:0:29226 blocked for more than 120 seconds.
[51140269.655874] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140269.656181] INFO: task pickup:21858 blocked for more than 120 seconds.
[51140269.656204] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
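
With output this long, it can help to tally how many reports each task produced. The following is a minimal sketch and not part of the original troubleshooting session: the function name count_hung_tasks is made up for illustration, and it assumes the message format shown above.

```shell
# Hypothetical helper: count "blocked for more than N seconds" reports per
# task name. Reads kernel log text on stdin, prints "<count> <task>" lines.
count_hung_tasks() {
    # Keep only the task name; the trailing ":<pid>" is dropped. The greedy
    # match preserves colons inside names such as "kworker/4:0".
    sed -n 's/.*INFO: task \(.*\):[0-9][0-9]* blocked for more than [0-9][0-9]* seconds\..*/\1/p' |
        sort | uniq -c |
        awk '{ n = $1; sub(/^[ \t]*[0-9]+[ \t]+/, ""); print n, $0 }' |
        sort -k1,1nr -k2
}

# Usage (needs permission to read the kernel log):
#   dmesg | count_hung_tasks
```

Tasks that show up repeatedly, especially file system threads such as jbd2 or xfsaild, point toward the storage path rather than a single misbehaving application.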

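Analysis:

The hint "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" means that writing 0 disables the hung-task check, after which the messages above no longer appear. Disabling it is not recommended, because it only hides the warning and does nothing about the blocked I/O.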

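The kernel parameter is set to kernel.hung_task_timeout_secs = 120. This is the threshold of the hung-task detector: any task that stays in uninterruptible sleep (state D) for more than 120 seconds is reported. It is not itself a time limit on writing memory to disk, but the tasks reported here (jbd2, xfsaild and others) appear to be blocked waiting for disk I/O.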
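
Tasks flagged by the hung-task detector are in uninterruptible sleep, which ps shows as state D. As a rough sketch for spotting them while the problem is happening, the filter below picks D-state entries out of ps output. The name filter_uninterruptible is hypothetical, and the helper assumes command names without spaces.

```shell
# Hypothetical helper: keep only tasks in uninterruptible sleep (state D).
# Expects "state pid command" per line on stdin, prints "pid command".
# Assumes the command name contains no spaces.
filter_uninterruptible() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Usage:
#   ps -eo state=,pid=,comm= | filter_uninterruptible
```

A task that briefly appears in state D is normal; the ones that stay listed across repeated runs are the candidates for the 120-second report.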

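Taken together with the blocked I/O tasks, this suggests that writing memory out to disk caused I/O to back up until the system stopped responding. The sequence is as follows. First the vm.dirty_background_ratio threshold is reached, which triggers the flush threads to write dirty pages back asynchronously. Application processes can still write during this phase. If the applications write more than the flush threads can push out, the amount of dirty data eventually reaches the limit set by vm.dirty_ratio. At that point the kernel switches to handling dirty pages synchronously and blocks the writing processes.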
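
To watch this build-up on a live host, the Dirty and Writeback counters in /proc/meminfo are the usual place to look. The snippet below is a small sketch under that assumption; dirty_summary is an illustrative name, not an existing tool.

```shell
# Hypothetical helper: extract the dirty-page counters from meminfo-style
# text on stdin. Prints lines such as "Dirty: 148 kB".
dirty_summary() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }'
}

# Usage:
#   dirty_summary < /proc/meminfo
```

Running it repeatedly during heavy writes shows whether Dirty keeps climbing faster than Writeback can drain it.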

Root cause:

By default Linux uses up to 40% of the available memory for file system caching.
After this mark has been reached, the file system flushes all outstanding data to disk, causing all subsequent I/O to become synchronous.
For flushing this data out to disk there is a time limit of 120 seconds by default.
In the case here, the I/O subsystem is not fast enough to flush the data within 120 seconds.
This happens especially on systems with a lot of memory.
The problem is solved in later kernels.

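Two details of this description need qualifying. The 120 seconds is the value of kernel.hung_task_timeout_secs, the point at which a blocked task gets reported, rather than a deadline enforced on the flush itself. The 40% figure also varies with kernel version and distribution tuning, so check the live values of vm.dirty_ratio and vm.dirty_background_ratio on the affected host.

Because the I/O system responds slowly, more and more requests pile up until system memory is exhausted and the system stops responding.

This is a side effect of Linux's delayed-write (write-back) mechanism, and the more memory the host has, the more likely it is to occur.

Solution: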

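Tune the two parameters vm.dirty_ratio and vm.dirty_background_ratio to suit the workload.

Tuning approach:

  1. Lower the proportion of dirty data, so that a flush does not run long enough to hit the timeout
  2. Shorten the time dirty data stays in memory, so that it does not accumulate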
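
Apply temporarily:

# sysctl -w vm.dirty_ratio=40
# sysctl -w vm.dirty_background_ratio=10

Persist in the kernel parameters:

# vi /etc/sysctl.conf
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
# sysctl -p

(40 and 10 are example values. To actually reduce the amount of dirty data, choose values below the current settings reported by sysctl vm.dirty_ratio vm.dirty_background_ratio.)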
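
Before editing /etc/sysctl.conf by hand, the two values can be sanity-checked, since background write-back is meant to start before the blocking limit is reached. The helper below is only a sketch: render_dirty_sysctl is a made-up name, and it assumes both arguments are integers.

```shell
# Hypothetical helper: print sysctl.conf lines for the two ratios after a
# basic sanity check. Arguments: <dirty_ratio> <dirty_background_ratio>.
# Assumes both arguments are integers.
render_dirty_sysctl() {
    ratio=$1
    background=$2
    # Both values are percentages, and the background threshold should be
    # lower than the blocking limit.
    if [ "$background" -ge "$ratio" ] || [ "$ratio" -gt 100 ] || [ "$background" -lt 0 ]; then
        echo "invalid ratios: dirty_ratio=$ratio background=$background" >&2
        return 1
    fi
    printf 'vm.dirty_ratio = %s\nvm.dirty_background_ratio = %s\n' "$ratio" "$background"
}

# Usage: review the output, then append it to /etc/sysctl.conf as root.
#   render_dirty_sysctl 40 10
```

Printing the lines instead of writing the file keeps the sketch free of side effects; applying them still requires sysctl -p as shown above.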

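vm.dirty_background_ratio is the percentage of memory that may fill up with dirty data before background write-back kicks in. The dirty data is written to disk later by background processes such as pdflush/flush/kdmflush. For example, with 32 GB of memory and a value of 10, about 3.2 GB of dirty data can stay in memory; once it exceeds 3.2 GB, the background processes start cleaning it up. (The kernel takes the percentage of available memory, meaning free plus reclaimable pages, so the real threshold is somewhat lower than this estimate.)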
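
The arithmetic in this example is easy to reproduce for other memory sizes. The sketch below uses total memory as a rough upper bound, which is an assumption for illustration, and the function name dirty_threshold_mb is hypothetical.

```shell
# Hypothetical helper: estimate a dirty-data threshold in MB.
# Arguments: <memory in MB> <ratio in percent>. Integer arithmetic only.
dirty_threshold_mb() {
    echo $(( $1 * $2 / 100 ))
}

# Usage: 32 GB of memory with a ratio of 10.
#   dirty_threshold_mb 32768 10
```

For 32768 MB and a ratio of 10 this prints 3276, matching the roughly 3.2 GB figure in the text.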

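vm.dirty_ratio is the hard limit on dirty data: the percentage of dirty data in memory may not exceed this value. When it is exceeded, processes issuing new writes are blocked until dirty data has been written to disk. This is a major cause of I/O stalls, but it is also the safeguard that keeps memory from holding an excessive amount of dirty data.

vm.dirty_expire_centisecs specifies how long dirty data may live in memory. Here its value is 30 seconds (3000 centiseconds). When pdflush/flush/kdmflush wakes up, it checks for data older than this limit and writes it to disk asynchronously. After all, data that sits in memory too long is at risk of being lost.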

vm.dirty_writeback_centisecs specifies how often the pdflush/flush/kdmflush processes wake up.
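
Both of these parameters are expressed in hundredths of a second, which is easy to misread. As a small illustration, the hypothetical helper below converts a centisecond value to whole seconds.

```shell
# Hypothetical helper: convert centiseconds to whole seconds.
centisecs_to_secs() {
    echo $(( $1 / 100 ))
}

# Usage: read the live values and convert them.
#   centisecs_to_secs "$(sysctl -n vm.dirty_expire_centisecs)"
#   centisecs_to_secs "$(sysctl -n vm.dirty_writeback_centisecs)"
```

A value of 3000 therefore corresponds to the 30 seconds mentioned for vm.dirty_expire_centisecs.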

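Dirty data

Dirty data: Linux implements a primary disk cache in the kernel, the page cache. Because of the page cache, write operations are actually deferred. When data in the page cache is newer than the copy on backing storage, that data is called dirty data.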

References:

https://blog.csdn.net/weixin_43279032/article/details/87718804

http://ilinuxkernel.com/?p=1578

For the page cache, dirty data and other I/O terminology, see: https://blog.51cto.com/qixue/1906775

 
