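While deploying a service last night, I happened to notice in the logs that supervise was restarting the service roughly half an hour after each start, and no core file was produced. Watching the restarted service, I saw its memory usage shoot up and keep growing, so I suspected that memory had been exhausted and the process had been killed.
The suspicion turned out to be correct. How do you verify it? dmesg settles it in a matter of minutes: the kernel logs every OOM-killer invocation there.
The details are as follows: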
[ 0.000000] Out of memory: Kill process 8668 (dsnav) score 947 or sacrifice child
[ 0.000000] Killed process 8668, UID 501, (dsnav) total-vm:127974752kB, anon-rss:124608960kB, file-rss:8kB
[ 0.000000] argus-agent invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[ 0.000000] argus-agent cpuset=/ mems_allowed=0
[ 0.000000] Pid: 6385, comm: argus-agent Tainted: G --------------- H 2.6.32_431-3 #2
[ 0.000000] Call Trace:
[ 0.000000] [<ffffffff810c8bc1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
[ 0.000000] [<ffffffff8111a210>] ? dump_header+0x90/0x1b0
[ 0.000000] [<ffffffff8111a652>] ? oom_kill_process+0x82/0x2a0
[ 0.000000] [<ffffffff8111aaa0>] ? select_bad_process.clone.1+0xe0/0x120
[ 0.000000] [<ffffffff8111ac76>] ? out_of_memory+0xe6/0x210
[ 0.000000] [<ffffffff81126c01>] ? __alloc_pages_nodemask+0x8e1/0x900
[ 0.000000] [<ffffffff81119082>] ? filemap_fault+0x1b2/0x520
[ 0.000000] [<ffffffff81140364>] ? __do_fault+0x54/0x530
[ 0.000000] [<ffffffff81140937>] ? handle_pte_fault+0xf7/0xa40
[ 0.000000] [<ffffffff8150e5f0>] ? thread_return+0x4e/0x77e
[ 0.000000] [<ffffffff81099342>] ? enqueue_hrtimer+0x82/0xd0
[ 0.000000] [<ffffffff81099701>] ? lock_hrtimer_base+0x31/0x60
[ 0.000000] [<ffffffff8109a27f>] ? hrtimer_try_to_cancel+0x3f/0xd0
[ 0.000000] [<ffffffff81510dd6>] ? rwsem_down_read_failed+0x26/0x30
[ 0.000000] [<ffffffff811414aa>] ? handle_mm_fault+0x22a/0x300
[ 0.000000] [<ffffffff810466f8>] ? __do_page_fault+0x138/0x480
[ 0.000000] [<ffffffff811bd906>] ? ep_poll+0x306/0x330
[ 0.000000] [<ffffffff810603a0>] ? default_wake_function+0x0/0x20
[ 0.000000] [<ffffffff8151410e>] ? do_page_fault+0x3e/0xa0
[ 0.000000] [<ffffffff815114d5>] ? page_fault+0x25/0x30
[ 0.000000] Mem-Info:
[ 0.000000] DMA per-cpu:
[ 0.000000] CPU 0: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 1: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 2: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 3: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 4: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 5: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 6: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 7: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 8: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 9: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 10: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 11: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 12: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 13: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 14: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 15: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 16: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 17: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 18: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 19: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 20: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 21: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 22: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 23: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 24: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 25: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 26: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 27: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 28: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 29: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 30: hi: 0, btch: 1 usd: 0
[ 0.000000] CPU 31: hi: 0, btch: 1 usd: 0
[ 0.000000] DMA32 per-cpu:
[ 0.000000] CPU 0: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 1: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 2: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 3: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 4: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 5: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 6: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 7: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 8: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 9: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 10: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 11: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 12: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 13: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 14: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 15: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 16: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 17: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 18: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 19: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 20: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 21: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 22: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 23: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 24: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 25: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 26: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 27: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 28: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 29: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 30: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 31: hi: 186, btch: 31 usd: 0
[ 0.000000] Normal per-cpu:
[ 0.000000] CPU 0: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 1: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 2: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 3: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 4: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 5: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 6: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 7: hi: 186, btch: 31 usd: 16
[ 0.000000] CPU 8: hi: 186, btch: 31 usd: 1
[ 0.000000] CPU 9: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 10: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 11: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 12: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 13: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 14: hi: 186, btch: 31 usd: 20
[ 0.000000] CPU 15: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 16: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 17: hi: 186, btch: 31 usd: 16
[ 0.000000] CPU 18: hi: 186, btch: 31 usd: 1
[ 0.000000] CPU 19: hi: 186, btch: 31 usd: 1
[ 0.000000] CPU 20: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 21: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 22: hi: 186, btch: 31 usd: 15
[ 0.000000] CPU 23: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 24: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 25: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 26: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 27: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 28: hi: 186, btch: 31 usd: 1
[ 0.000000] CPU 29: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 30: hi: 186, btch: 31 usd: 0
[ 0.000000] CPU 31: hi: 186, btch: 31 usd: 0
[ 0.000000] active_anon:32558911 inactive_anon:44 isolated_anon:0
[ 0.000000] active_file:168 inactive_file:0 isolated_file:0
[ 0.000000] unevictable:0 dirty:37 writeback:0 unstable:0
[ 0.000000] free:131744 slab_reclaimable:7708 slab_unreclaimable:17716
[ 0.000000] mapped:249 shmem:48 pagetables:67098 bounce:0
[ 0.000000] DMA free:15888kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15260kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 0.000000] lowmem_reserve[]: 0 1856 129116 129116
[ 0.000000] DMA32 free:444884kB min:968kB low:1208kB high:1452kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1900568kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 0.000000] lowmem_reserve[]: 0 0 127260 127260
[ 0.000000] Normal free:66204kB min:66604kB low:83252kB high:99904kB active_anon:130235644kB inactive_anon:176kB active_file:672kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:130314240kB mlocked:0kB dirty:148kB writeback:0kB mapped:996kB shmem:192kB slab_reclaimable:30832kB slab_unreclaimable:70864kB kernel_stack:10960kB pagetables:268392kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1681 all_unreclaimable? yes
[ 0.000000] lowmem_reserve[]: 0 0 0 0
[ 0.000000] DMA: 0*4kB 2*8kB 0*16kB 2*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15888kB
[ 0.000000] DMA32: 7*4kB 7*8kB 6*16kB 7*32kB 5*64kB 4*128kB 5*256kB 10*512kB 9*1024kB 5*2048kB 102*4096kB = 444884kB
[ 0.000000] Normal: 2444*4kB 832*8kB 612*16kB 367*32kB 175*64kB 117*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 67984kB
[ 0.000000] 0 total pagecache pages
[ 0.000000] 0 pages in swap cache
[ 0.000000] Swap cache stats: add 0, delete 0, find 0/0
[ 0.000000] Free swap = 0kB
[ 0.000000] Total swap = 0kB
[ 0.000000] 33554416 pages RAM
[ 0.000000] 594768 pages reserved
[ 0.000000] 2419 pages shared
[ 0.000000] 32818620 pages non-shared
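
To read the numbers in this log more easily, the kill record can be pulled out and converted to GiB. The following is only a minimal sketch in Python, not part of the original troubleshooting: the function names `parse_oom_kills` and `pages_to_gib` are hypothetical, the regular expression assumes the "Killed process" line format printed by the 2.6.32 kernel shown above (other kernel versions print it differently), and the 4 kB page size is inferred from the log itself (active_anon:32558911 pages matches active_anon:130235644kB in the Normal zone).

```python
import re

# Matches the "Killed process ..." line in the format shown in the log above.
# Assumption: this format is specific to this kernel version (2.6.32).
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+), UID (?P<uid>\d+), \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)

KIB_PER_GIB = 1024 * 1024
PAGE_SIZE_KIB = 4  # assumption inferred from the log, not queried from the system


def parse_oom_kills(dmesg_text):
    """Return one dict per process that the OOM killer reports as killed."""
    kills = []
    for match in KILLED_RE.finditer(dmesg_text):
        kills.append({
            "pid": int(match.group("pid")),
            "uid": int(match.group("uid")),
            "comm": match.group("comm"),
            "total_vm_gib": int(match.group("total_vm")) / KIB_PER_GIB,
            "anon_rss_gib": int(match.group("anon_rss")) / KIB_PER_GIB,
        })
    return kills


def pages_to_gib(pages):
    """Convert a page count (e.g. the 'pages RAM' line) to GiB."""
    return pages * PAGE_SIZE_KIB / KIB_PER_GIB


# Two lines copied from the log above, used as sample input.
SAMPLE = """\
[ 0.000000] Out of memory: Kill process 8668 (dsnav) score 947 or sacrifice child
[ 0.000000] Killed process 8668, UID 501, (dsnav) total-vm:127974752kB, anon-rss:124608960kB, file-rss:8kB
"""

for kill in parse_oom_kills(SAMPLE):
    print("pid=%d comm=%s anon-rss=%.2f GiB" % (
        kill["pid"], kill["comm"], kill["anon_rss_gib"]))

# "33554416 pages RAM" from the log, converted to GiB.
print("total RAM = %.2f GiB" % pages_to_gib(33554416))
```

Read this way, the log says that dsnav held roughly 119 GiB of anonymous memory on a machine with about 128 GiB of RAM and no swap (Total swap = 0kB), which fits the suspicion that the process was killed for exhausting memory. On a live machine the same function could be fed the text printed by `dmesg` instead of the sample string.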