一次内核 crash 的排查记录
使用的发行版本是 CentOS,内核版本是 3.10.0
,在正常运行的情况下内核发生了崩溃,还好有 vmcore 生成。
准备排查环境
- crash
- 内核调试信息rpm,下载的两个 rpm 版本必须和内核版本一致
- kernel-debuginfo-common-x86_64-3.10.0-327.el7.x86_64.rpm
- kernel-debuginfo-3.10.0-327.el7.x86_64.rpm
包从这个地址中获取的,速度尚可 https://mirrors.ocf.berkeley.edu/centos-debuginfo/7/x86_64/
排查
准备好生成的 vmcore
- 进入 crash
[zzz@localhost kernel-debug]# crash ../vmcore /usr/lib/debug/lib/modules/3.10.0-327.el7.x86_64/vmlinux
- 可以看到直接原因是访问了空指针 (0000000000000008)
KERNEL: /usr/lib/debug/lib/modules/3.10.0-327.el7.x86_64/vmlinux DUMPFILE: ../vmcore [PARTIAL DUMP] CPUS: 8 DATE: Wed Apr 29 19:40:42 2020 UPTIME: 335 days, 01:46:01 LOAD AVERAGE: 23.98, 26.19, 15.75 TASKS: 688 NODENAME: localhost.localdomain RELEASE: 3.10.0-327.el7.x86_64 VERSION: #1 SMP Thu Nov 19 22:10:57 UTC 2015 MACHINE: x86_64 (3408 Mhz) MEMORY: 15.6 GB PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008" PID: 76 COMMAND: "kswapd0" TASK: ffff88044beba280 [THREAD_INFO: ffff88044bef0000] CPU: 2 STATE: TASK_RUNNING (PANIC)
- 观察堆栈,看代码层面大概是哪里产生的问题
异常发生在crash> bt PID: 76 TASK: ffff88044beba280 CPU: 2 COMMAND: "kswapd0" #0 [ffff88044bef3610] machine_kexec at ffffffff81051beb #1 [ffff88044bef3670] crash_kexec at ffffffff810f2542 #2 [ffff88044bef3740] oops_end at ffffffff8163e1a8 #3 [ffff88044bef3768] no_context at ffffffff8162e2b8 #4 [ffff88044bef37b8] __bad_area_nosemaphore at ffffffff8162e34e #5 [ffff88044bef3800] bad_area_nosemaphore at ffffffff8162e4b8 #6 [ffff88044bef3810] __do_page_fault at ffffffff81640fce #7 [ffff88044bef3868] do_page_fault at ffffffff81641113 #8 [ffff88044bef3890] page_fault at ffffffff8163d408 [exception RIP: down_read_trylock+9] RIP: ffffffff810aa989 RSP: ffff88044bef3940 RFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff8801b4f9ff80 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000008 RBP: ffff88044bef3940 R8: 0000000000000000 R9: 0000000000017bc0 R10: ffff880465fd8000 R11: 0000000000000000 R12: ffff8801b4f9ff81 R13: ffffea00047dfbc0 R14: 0000000000000008 R15: 0000000000000001 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff88044bef3948] page_lock_anon_vma_read at ffffffff811a2e65 #10 [ffff88044bef3978] page_referenced at ffffffff811a30e7 #11 [ffff88044bef39f0] shrink_page_list at ffffffff8117d264 #12 [ffff88044bef3b28] shrink_inactive_list at ffffffff8117df3a #13 [ffff88044bef3bf0] shrink_lruvec at ffffffff8117ea05 #14 [ffff88044bef3cf0] shrink_zone at ffffffff8117ee66 #15 [ffff88044bef3d48] balance_pgdat at ffffffff8118010c #16 [ffff88044bef3e20] kswapd at ffffffff811803d3 #17 [ffff88044bef3ec8] kthread at ffffffff810a5aef #18 [ffff88044bef3f50] ret_from_fork at ffffffff81645858
down_read_trylock
函数内,后面发生了page fault
,先反汇编看一下 RIP 内地址(ffffffff810aa989)的内容:
打开这个内核版本的crash> dis -l ffffffff810aa989 /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/arch/x86/include/asm/rwsem.h: 83 0xffffffff810aa989 <down_read_trylock+9>: mov (%rdi),%rax
arch/x86/include/asm/rwsem.h
,可以使用网址https://elixir.bootlin.com/linux/v3.10/source/arch/x86/include/asm/rwsem.h
打开,代码如下
指针/* * trylock for reading -- returns 1 if successful, 0 if contention */ static inline int __down_read_trylock(struct rw_semaphore *sem) { long result, tmp; asm volatile("# beginning __down_read_trylock\n\t" " mov %0,%1\n\t" "1:\n\t" " mov %1,%2\n\t" " add %3,%2\n\t" " jle 2f\n\t" LOCK_PREFIX " cmpxchg %2,%0\n\t" " jnz 1b\n\t" "2:\n\t" "# ending __down_read_trylock\n\t" : "+m" (sem->count), "=&a" (result), "=&r" (tmp) : "i" (RWSEM_ACTIVE_READ_BIAS) : "memory", "cc"); return result >= 0 ? 1 : 0; }
sem
也就是寄存器RAX
的值为0000000000000008
,地址解引用失败,访问空指针,引发异常。观察堆栈调用情况,都是内存管理相关操作,其中还有一个 kswapd 调用,到此,推测是内存爆了导致的 - 查看当时内存使用情况,观察内存使用到了 98%,而交换空间也被大量使用,符合上面函数调用的推导
crash> kmem -i PAGES TOTAL PERCENTAGE TOTAL MEM 3962164 15.1 GB ---- FREE 46698 182.4 MB 1% of TOTAL MEM USED 3915466 14.9 GB 98% of TOTAL MEM SHARED 224712 877.8 MB 5% of TOTAL MEM BUFFERS 0 0 0% of TOTAL MEM CACHED 555017 2.1 GB 14% of TOTAL MEM SLAB 136079 531.6 MB 3% of TOTAL MEM TOTAL HUGE 0 0 ---- HUGE FREE 0 0 0% of TOTAL HUGE TOTAL SWAP 4194303 16 GB ---- SWAP USED 3042976 11.6 GB 72% of TOTAL SWAP SWAP FREE 1151327 4.4 GB 27% of TOTAL SWAP COMMIT LIMIT 6175385 23.6 GB ---- COMMITTED 9409769 35.9 GB 152% of TOTAL LIMIT
- 查看当时进程使用情况(节选),几乎全部是页面交换进程在运行
crash> ps PID PPID CPU TASK ST %MEM VSZ RSS COMM > 0 0 0 ffffffff81951440 RU 0.0 0 0 [swapper/0] > 0 0 1 ffff88044f942e00 RU 0.0 0 0 [swapper/1] 0 0 2 ffff88044f943980 RU 0.0 0 0 [swapper/2] > 0 0 3 ffff88044f944500 RU 0.0 0 0 [swapper/3] > 0 0 4 ffff88044f945080 RU 0.0 0 0 [swapper/4] 0 0 5 ffff88044f945c00 RU 0.0 0 0 [swapper/5] > 0 0 6 ffff88044f946780 RU 0.0 0 0 [swapper/6] > 0 0 7 ffff88044f947300 RU 0.0 0 0 [swapper/7] 1 0 7 ffff88044f848000 IN 0.0 189172 3080 systemd 2 0 0 ffff88044f848b80 IN 0.0 0 0 [kthreadd] 3 2 0 ffff88044f849700 IN 0.0 0 0 [ksoftirqd/0] 5 2 0 ffff88044f84ae00 IN 0.0 0 0 [kworker/0:0H] 7 2 0 ffff88044f84c500 IN 0.0 0 0 [migration/0] 8 2 7 ffff88044f84d080 IN 0.0 0 0 [rcu_bh]
到此,问题可以锁定为是在大量内存页交换导致的问题,但是具体的代码逻辑并不能确定是为何处。而当时crash的时间也是在急需内存的操作下发生了,需要避免的地方也只有关闭一些服务释放出内存,在该操作完成后再启动服务。