Troubleshooting an OOM-Killed Process: A Walkthrough

I. Background

After receiving an alert for the application service, I logged in to the server to investigate and found that the process was gone.


II. Problem Analysis

1. Possible reasons the process was killed:

(1) The machine rebooted

uptime showed the machine had not rebooted.

(2) The program hit a bug and exited on its own

Nothing unusual turned up in the program's error log.

(3) Something else killed it

Since the program is fairly memory-hungry, I suspected it had been OOM-killed by the system. Checking the messages log confirmed that an OOM had indeed occurred:

Jul 27 13:29:54 kernel: Out of memory: Kill process 17982 (java) score 77 or sacrifice child
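
For reference, the check itself is just a grep against the system log (the exact log path varies by distro); dmesg shows the same records:

grep -i 'out of memory' /var/log/messages
dmesg -T | grep -i 'out of memory'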


2. Analyzing the detailed OOM output for the specific cause

[511250.458988] mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[511250.458993] mysqld cpuset=/ mems_allowed=0
[511250.458996] CPU: 7 PID: 30063 Comm: mysqld Not tainted 3.10.0-514.21.2.el7.x86_64 #1
[511250.458997] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[511250.458999]  ffff88056236bec0 0000000040f4df68 ffff88044b76b910 ffffffff81687073
[511250.459002]  ffff88044b76b9a0 ffffffff8168201e ffffffff810eb0dc ffff88081ae80c20
[511250.459004]  ffff88081ae80c38 ffff88044b76b9f8 ffff88056236bec0 0000000000000000
[511250.459007] Call Trace:
[511250.459015]  [<ffffffff81687073>] dump_stack+0x19/0x1b
[511250.459020]  [<ffffffff8168201e>] dump_header+0x8e/0x225
[511250.459026]  [<ffffffff810eb0dc>] ? ktime_get_ts64+0x4c/0xf0
[511250.459033]  [<ffffffff81184cfe>] oom_kill_process+0x24e/0x3c0
[511250.459035]  [<ffffffff8118479d>] ? oom_unkillable_task+0xcd/0x120
[511250.459038]  [<ffffffff81184846>] ? find_lock_task_mm+0x56/0xc0
[511250.459042]  [<ffffffff81093c0e>] ? has_capability_noaudit+0x1e/0x30
[511250.459045]  [<ffffffff81185536>] out_of_memory+0x4b6/0x4f0
[511250.459047]  [<ffffffff81682b27>] __alloc_pages_slowpath+0x5d7/0x725
[511250.459051]  [<ffffffff8118b645>] __alloc_pages_nodemask+0x405/0x420
[511250.459055]  [<ffffffff811cf94a>] alloc_pages_current+0xaa/0x170
[511250.459058]  [<ffffffff81180bd7>] __page_cache_alloc+0x97/0xb0
[511250.459060]  [<ffffffff81183750>] filemap_fault+0x170/0x410
[511250.459078]  [<ffffffffa01b5016>] ext4_filemap_fault+0x36/0x50 [ext4]
[511250.459082]  [<ffffffff811ac84c>] __do_fault+0x4c/0xc0
[511250.459084]  [<ffffffff811acce3>] do_read_fault.isra.42+0x43/0x130
[511250.459087]  [<ffffffff811b1471>] handle_mm_fault+0x6b1/0x1040
[511250.459091]  [<ffffffff810f55c0>] ? futex_wake+0x80/0x160
[511250.459096]  [<ffffffff81692c04>] __do_page_fault+0x154/0x450
[511250.459098]  [<ffffffff81692fe6>] trace_do_page_fault+0x56/0x150
[511250.459101]  [<ffffffff8169268b>] do_async_page_fault+0x1b/0xd0
[511250.459103]  [<ffffffff8168f178>] async_page_fault+0x28/0x30
[511250.459104] Mem-Info:
[511250.459109] active_anon:7922627 inactive_anon:1653 isolated_anon:0
 active_file:1675 inactive_file:2820 isolated_file:0
 unevictable:0 dirty:11 writeback:2 unstable:0
 slab_reclaimable:61817 slab_unreclaimable:25990
 mapped:3607 shmem:4602 pagetables:42625 bounce:0
 free:50021 free_pcp:149 free_cma:0
[511250.459112] Node 0 DMA free:15892kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[511250.459117] lowmem_reserve[]: 0 2814 31994 31994
[511250.459120] Node 0 DMA32 free:119704kB min:5940kB low:7424kB high:8908kB active_anon:2678512kB inactive_anon:276kB active_file:124kB inactive_file:132kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2883436kB mlocked:0kB dirty:0kB writeback:0kB mapped:1100kB shmem:1632kB slab_reclaimable:48796kB slab_unreclaimable:9340kB kernel_stack:5248kB pagetables:11424kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:32902 all_unreclaimable? yes
[511250.459124] lowmem_reserve[]: 0 0 29180 29180
[511250.459127] Node 0 Normal free:63896kB min:61608kB low:77008kB high:92412kB active_anon:29011996kB inactive_anon:6336kB active_file:6576kB inactive_file:11148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29881068kB mlocked:0kB dirty:44kB writeback:8kB mapped:13328kB shmem:16776kB slab_reclaimable:198472kB slab_unreclaimable:94604kB kernel_stack:53472kB pagetables:159076kB unstable:0kB bounce:0kB free_pcp:656kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:924 all_unreclaimable? no
[511250.459131] lowmem_reserve[]: 0 0 0 0
[511250.459134] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
[511250.459144] Node 0 DMA32: 9372*4kB (UEM) 2427*8kB (UEM) 1179*16kB (UEM) 369*32kB (UEM) 104*64kB (EM) 31*128kB (EM) 14*256kB (UEM) 9*512kB (UEM) 7*1024kB (UEM) 3*2048kB (M) 0*4096kB = 119704kB
[511250.459154] Node 0 Normal: 1540*4kB (UE) 6148*8kB (UE) 503*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 63392kB
[511250.459162] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[511250.459163] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[511250.459164] 9275 total pagecache pages
[511250.459166] 0 pages in swap cache
[511250.459167] Swap cache stats: add 0, delete 0, find 0/0
[511250.459168] Free swap  = 0kB
[511250.459168] Total swap = 0kB
[511250.459169] 8388478 pages RAM
[511250.459170] 0 pages HighMem/MovableOnly
[511250.459171] 193375 pages reserved
[511250.459172] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[511250.459178] [  444]     0   444    30482      118      63        0             0 systemd-journal
[511250.459180] [  476]     0   476    14365      114      28        0         -1000 auditd
[511250.459182] [  508]     0   508     5315       75      14        0             0 irqbalance
[511250.459184] [  509]   998   509   132421     1908      50        0             0 polkitd
[511250.459186] [  510]     0   510     6686      196      17        0             0 systemd-logind
[511250.459188] [  514]    81   514     6672      148      16        0          -900 dbus-daemon
[511250.459189] [  592]     0   592     6972       52      18        0             0 atd
[511250.459191] [  595]     0   595    31969      188      17        0             0 crond
[511250.459193] [  607]     0   607    28020       44      11        0             0 agetty
[511250.459195] [ 1036]     0  1036   138798     3179      89        0             0 tuned
[511250.459197] [ 1037]     0  1037   174118      357     182        0             0 rsyslogd
[511250.459198] [ 1089]    38  1089     7865      174      19        0             0 ntpd
[511250.459200] [ 4714]     0  4714    26866      243      54        0         -1000 sshd
[511250.459202] [ 6624]     0  6624      920      100       4        0             0 aliyun-service
[511250.459204] [19284]     0 19284     8386      171      21        0             0 AliYunDunUpdate
[511250.459206] [19335]     0 19335    34887     1367      64        0             0 AliYunDun
[511250.459208] [21657]    26 21657    59097     1539      52        0         -1000 postgres
[511250.459210] [21658]    26 21658    48503      264      43        0             0 postgres
[511250.459212] [21660]    26 21660    59124     2338      52        0             0 postgres
[511250.459213] [21661]    26 21661    59097      332      48        0             0 postgres
[511250.459215] [21662]    26 21662    59097      537      47        0             0 postgres
[511250.459217] [21663]    26 21663    59328      513      50        0             0 postgres
[511250.459218] [21664]    26 21664    49067      317      44        0             0 postgres
[511250.459220] [ 7276]     0  7276    32471      164      16        0             0 screen
[511250.459222] [ 7277]     0  7277    29357      123      13        0             0 bash
[511250.459223] [ 7388]     0  7388     4303     1880      12        0             0 sagent
[511250.459225] [ 7747]     0  7747    32504      200      16        0             0 screen
[511250.459226] [ 7748]     0  7748    29357      122      14        0             0 bash
[511250.459228] [ 7781]     0  7781     8051     4108      20        0             0 tagent
[511250.459230] [ 9897]     0  9897  3062553   270245     774        0             0 java
[511250.459231] [ 9937]    26  9937    59406      657      53        0             0 postgres
[511250.459233] [ 9940]    26  9940    60212     2570      57        0             0 postgres
[511250.459235] [ 9997]    26  9997    60098     2346      56        0             0 postgres
[511250.459236] [10076]    26 10076    59574      964      54        0             0 postgres
[511250.459238] [10077]    26 10077    59618     1006      54        0             0 postgres
[511250.459239] [10078]    26 10078    59617     1005      54        0             0 postgres
[511250.459241] [11611]     0 11611    60826     4190      73        0             0 python
[511250.459243] [11619]     0 11619   348938     6222     118        0             0 python
[511250.459245] [12396]    26 12396    60086     2078      56        0             0 postgres
[511250.459246] [12499]  1001 12499  1448783    99046     328        0             0 java
[511250.459248] [12600]  1003 12600  2226317   312995     847        0             0 java
[511250.459249] [29241]     0 29241    78180     1320     101        0             0 php-fpm
[511250.459251] [29242]  1004 29242   135239     2687     108        0             0 php-fpm
[511250.459253] [29243]  1004 29243   134924     2408     108        0             0 php-fpm
[511250.459255] [29244]  1004 29244   135371     2707     108        0             0 php-fpm
[511250.459256] [29245]  1004 29245   143755    11294     125        0             0 php-fpm
[511250.459258] [29246]  1004 29246   135367     2706     108        0             0 php-fpm
[511250.459260] [29826]    27 29826    28792       86      13        0             0 mysqld_safe
[511250.459261] [30051]    27 30051   322930    39761     133        0             0 mysqld
[511250.459263] [30234]     0 30234    11365      125      22        0         -1000 systemd-udevd
[511250.459264] [11182]     0 11182    82780     5702     114        0             0 salt-minion
[511250.459266] [11193]     0 11193   171406     8289     144        0             0 salt-minion
[511250.459268] [11195]     0 11195   101432     5712     110        0             0 salt-minion
[511250.459269] [29678]  1004 29678   140301     7833     118        0             0 php-fpm
[511250.459271] [29998]  1004 29998   134983     2404     108        0             0 php-fpm
[511250.459273] [11833]     0 11833    69721     2098      58        0             0 python2.7
[511250.459275] [32113]    26 32113    60131     2012      56        0             0 postgres
[511250.459276] [ 1017]  1004  1017   135410     2748     108        0             0 php-fpm
[511250.459278] [11915]  1004 11915   144263    11778     126        0             0 php-fpm
[511250.459280] [ 5999]     0  5999     8115     3139      20        0             0 tagent
[511250.459281] [21572]  1004 21572   134919     2379     108        0             0 php-fpm
[511250.459283] [21752]  1004 21752   143751    11276     125        0             0 php-fpm
[511250.459285] [ 2977]  1004  2977   134920     2406     107        0             0 php-fpm
[511250.459286] [ 9217]     0  9217   330989   183882     550        0             0 python2.7
[511250.459288] [ 2008]  1004  2008   135816     3328     109        0             0 php-fpm
[511250.459290] [25089]  1000 25089  2800777   187701     710        0             0 java
[511250.459291] [25405]  1000 25405  1335611   105668     366        0             0 java
[511250.459293] [26033]  1000 26033  1680746    96082     367        0             0 java
[511250.459295] [26112]  1000 26112  1148121    61227     230        0             0 java
[511250.459296] [14446]     0 14446    31082      540      56        0             0 nginx
[511250.459298] [14447]  1004 14447    31278      739      58        0             0 nginx
[511250.459299] [14448]  1004 14448    31278      725      58        0             0 nginx
[511250.459301] [14449]  1004 14449    31278      714      58        0             0 nginx
[511250.459303] [14450]  1004 14450    31278      715      58        0             0 nginx
[511250.459304] [14451]  1004 14451    31245      705      58        0             0 nginx
[511250.459306] [14452]  1004 14452    31245      696      58        0             0 nginx
[511250.459307] [14453]  1004 14453    31278      712      58        0             0 nginx
[511250.459309] [14454]  1004 14454    31245      728      58        0             0 nginx
[511250.459310] [14455]  1004 14455    31278      730      58        0             0 nginx
[511250.459312] [14456]  1004 14456    31278      718      58        0             0 nginx
[511250.459314] [14457]  1004 14457    31245      707      58        0             0 nginx
[511250.459315] [14458]  1004 14458    31278      722      58        0             0 nginx
[511250.459317] [14459]  1004 14459    31278      717      58        0             0 nginx
[511250.459318] [14460]  1004 14460    31245      688      58        0             0 nginx
[511250.459320] [14462]  1004 14462    31278      712      58        0             0 nginx
[511250.459321] [14463]  1004 14463    31278      736      58        0             0 nginx
[511250.459323] [14571]     0 14571  3222105   119555     906        0             0 python
[511250.459325] [13969]     0 13969   134928     8719     143        0             0 salt-master
[511250.459326] [13982]     0 13982    78554     5647     100        0             0 salt-master
[511250.459328] [13985]     0 13985   116150     8034     134        0             0 salt-master
[511250.459330] [13989]     0 13989   151040    38826     238        0             0 salt-master
[511250.459331] [13990]     0 13990   103527    12904     148        0             0 salt-master
[511250.459333] [14067]     0 14067   280592     9651     151        0             0 salt-master
[511250.459334] [14072]     0 14072   135099     9889     141        0             0 salt-master
[511250.459336] [14220]     0 14220   134928     8828     135        0             0 salt-master
[511250.459338] [14221]     0 14221  1941362     9675     332        0             0 salt-master
[511250.459339] [14228]     0 14228   175360     9657     148        0             0 salt-master
[511250.459341] [14268]     0 14268   175362     9655     148        0             0 salt-master
[511250.459343] [14314]     0 14314   175361     9662     148        0             0 salt-master
[511250.459344] [14327]     0 14327   175363     9663     148        0             0 salt-master
[511250.459346] [14329]     0 14329   175363     9666     148        0             0 salt-master
[511250.459347] [14330]     0 14330   175364     9666     148        0             0 salt-master
[511250.459349] [14331]     0 14331   175365     9666     148        0             0 salt-master
[511250.459350] [14334]     0 14334   175366     9670     148        0             0 salt-master
[511250.459352] [14338]     0 14338   175366     9669     148        0             0 salt-master
[511250.459354] [14340]     0 14340   175366     9674     148        0             0 salt-master
[511250.459355] [14345]     0 14345   175367     9679     148        0             0 salt-master
[511250.459357] [14349]     0 14349   175367     9675     148        0             0 salt-master
[511250.459358] [14350]     0 14350   175367     9671     148        0             0 salt-master
[511250.459360] [14354]     0 14354   175368     9672     148        0             0 salt-master
[511250.459362] [14357]     0 14357   175369     9678     148        0             0 salt-master
[511250.459363] [14358]     0 14358   175369     9673     148        0             0 salt-master
[511250.459365] [14362]     0 14362   175369     9677     148        0             0 salt-master
[511250.459366] [14364]     0 14364   175370     9680     148        0             0 salt-master
[511250.459368] [14365]     0 14365   175371     9681     148        0             0 salt-master
[511250.459369] [14368]     0 14368   175371     9676     148        0             0 salt-master
[511250.459371] [14370]     0 14370   175371     9674     148        0             0 salt-master
[511250.459372] [14372]     0 14372   175372     9682     148        0             0 salt-master
[511250.459374] [14376]     0 14376   175373     9682     148        0             0 salt-master
[511250.459375] [14377]     0 14377   175374     9676     148        0             0 salt-master
[511250.459377] [14378]     0 14378   175374     9689     148        0             0 salt-master
[511250.459379] [14380]     0 14380   175650     9716     149        0             0 salt-master
[511250.459381] [14384]     0 14384   175375     9690     148        0             0 salt-master
[511250.459382] [14385]     0 14385   175375     9685     148        0             0 salt-master
[511250.459384] [14401]     0 14401   175376     9687     148        0             0 salt-master
[511250.459385] [14404]     0 14404   175377     9685     148        0             0 salt-master
[511250.459387] [14413]     0 14413   175377     9685     148        0             0 salt-master
[511250.459388] [14420]     0 14420   175377     9687     148        0             0 salt-master
[511250.459390] [14421]     0 14421   175378     9686     148        0             0 salt-master
[511250.459392] [14424]     0 14424   175380     9693     148        0             0 salt-master
[511250.459393] [14428]     0 14428   175380     9689     148        0             0 salt-master
[511250.459395] [14435]     0 14435   175382     9698     148        0             0 salt-master
[511250.459396] [14437]     0 14437   175382     9694     148        0             0 salt-master
[511250.459398] [14439]     0 14439   175383     9692     148        0             0 salt-master
[511250.459399] [14442]     0 14442   175384     9694     148        0             0 salt-master
[511250.459401] [14445]     0 14445   175385     9692     148        0             0 salt-master
[511250.459403] [14465]     0 14465   175385     9695     148        0             0 salt-master
[511250.459404] [14473]     0 14473   175385     9695     148        0             0 salt-master
[511250.459406] [14486]     0 14486   175386     9697     148        0             0 salt-master
[511250.459407] [14489]     0 14489   175386     9699     148        0             0 salt-master
[511250.459409] [14503]     0 14503   175386     9699     148        0             0 salt-master
[511250.459410] [14513]     0 14513   175387     9700     148        0             0 salt-master
[511250.459412] [14520]     0 14520   175388     9704     148        0             0 salt-master
[511250.459414] [14523]     0 14523   175389     9700     148        0             0 salt-master
[511250.459415] [14525]     0 14525   175389     9703     148        0             0 salt-master
[511250.459417] [14527]     0 14527   175390     9710     148        0             0 salt-master
[511250.459419] [14533]     0 14533   175390     9705     148        0             0 salt-master
[511250.459420] [14539]     0 14539   175390     9709     148        0             0 salt-master
[511250.459422] [14590]     0 14590   175391     9713     148        0             0 salt-master
[511250.459423] [14598]     0 14598   175390     9705     148        0             0 salt-master
[511250.459425] [14613]     0 14613   175391     9705     148        0             0 salt-master
[511250.459426] [14624]     0 14624   175392     9713     148        0             0 salt-master
[511250.459428] [14630]     0 14630   175392     9707     148        0             0 salt-master
[511250.459429] [14634]     0 14634   175393     9707     148        0             0 salt-master
[511250.459431] [14652]     0 14652   175393     9709     148        0             0 salt-master
[511250.459433] [14677]     0 14677   175394     9708     148        0             0 salt-master
[511250.459434] [14679]     0 14679   175394     9711     148        0             0 salt-master
[511250.459436] [14709]     0 14709   175395     9713     148        0             0 salt-master
[511250.459438] [14718]     0 14718   175396     9710     148        0             0 salt-master
[511250.459439] [14723]     0 14723   175396     9710     148        0             0 salt-master
[511250.459441] [14746]     0 14746   175396     9716     148        0             0 salt-master
[511250.459443] [14752]     0 14752   175461     9717     148        0             0 salt-master
[511250.459444] [14791]     0 14791   175398     9715     148        0             0 salt-master
[511250.459446] [14799]     0 14799   175397     9720     148        0             0 salt-master
[511250.459447] [14804]     0 14804   175472     9721     148        0             0 salt-master
[511250.459449] [14835]     0 14835   175462     9729     148        0             0 salt-master
[511250.459450] [14840]     0 14840   175463     9735     148        0             0 salt-master
[511250.459452] [14864]     0 14864   175463     9727     148        0             0 salt-master
[511250.459453] [14882]     0 14882   175464     9731     148        0             0 salt-master
[511250.459455] [14893]     0 14893   175465     9731     148        0             0 salt-master
[511250.459456] [14899]     0 14899   175465     9720     148        0             0 salt-master
[511250.459458] [14906]     0 14906   175466     9721     148        0             0 salt-master
[511250.459460] [14910]     0 14910   175402     9723     148        0             0 salt-master
[511250.459461] [14984]     0 14984   175466     9725     148        0             0 salt-master
[511250.459463] [14988]     0 14988   175467     9735     148        0             0 salt-master
[511250.459464] [14992]     0 14992   175468     9734     148        0             0 salt-master
[511250.459466] [15072]     0 15072   175468     9735     148        0             0 salt-master
[511250.459467] [15101]     0 15101   175468     9731     148        0             0 salt-master
[511250.459469] [15129]     0 15129   175469     9733     148        0             0 salt-master
[511250.459470] [15143]     0 15143   175469     9737     148        0             0 salt-master
[511250.459472] [15168]     0 15168   175470     9740     148        0             0 salt-master
[511250.459474] [15181]     0 15181   175474     9744     148        0             0 salt-master
[511250.459475] [15219]     0 15219   175474     9734     148        0             0 salt-master
[511250.459477] [15223]     0 15223   175477     9753     148        0             0 salt-master
[511250.459479] [15259]     0 15259   175475     9734     148        0             0 salt-master
[511250.459481] [15266]     0 15266   175476     9735     148        0             0 salt-master
[511250.459482] [15322]     0 15322   175476     9736     148        0             0 salt-master
[511250.459493] [15350]     0 15350   175476     9745     148        0             0 salt-master
[511250.459495] [15366]     0 15366   175477     9743     148        0             0 salt-master
[511250.459497] [15380]     0 15380   175506     9745     148        0             0 salt-master
[511250.459498] [15399]     0 15399   175754     9769     149        0             0 salt-master
[511250.459500] [15407]     0 15407   175479     9747     148        0             0 salt-master
[511250.459501] [15447]     0 15447   175479     9742     148        0             0 salt-master
[511250.459503] [15450]     0 15450   175479     9751     148        0             0 salt-master
[511250.459504] [15454]     0 15454   175481     9747     148        0             0 salt-master
[511250.459506] [15462]     0 15462   175480     9748     148        0             0 salt-master
[511250.459508] [23316]  1000 23316  3085650    27853     144        0             0 java
[511250.459509] [23319]  1000 23319  3085650    27289     144        0             0 java
[511250.459511] [23348]  1000 23348  3085650    27778     142        0             0 java
[511250.459512] [23351]  1000 23351  3085650    26840     141        0             0 java
[511250.459514] [23373]  1000 23373  3085650    27380     143        0             0 java
[511250.459515] [23406]  1000 23406  3085650    26933     143        0             0 java
[511250.459517] [23425]  1000 23425  3085650    27371     142        0             0 java
[511250.459518] [23445]  1000 23445  3085650    27861     141        0             0 java
[511250.459520] [23476]  1000 23476  3085650    27716     143        0             0 java
[511250.459522] [23497]  1000 23497  3085650    27902     144        0             0 java
[511250.459523] [23690]  1000 23690  2049475   328916     865        0             0 java
[511250.459525] [23691]  1000 23691  2082756   356868     894        0             0 java
[511250.459527] [23693]  1000 23693  2027460   612751    1357        0             0 java
[511250.459528] [23712]  1000 23712  2027460   610571    1348        0             0 java
[511250.459529] [23754]  1000 23754  2049474   337457     886        0             0 java
[511250.459531] [23785]  1000 23785  2049474   330831     864        0             0 java
[511250.459533] [23805]  1000 23805  2027460   615907    1366        0             0 java
[511250.459534] [23828]  1000 23828  2027460   610191    1346        0             0 java
[511250.459536] [23855]  1000 23855  2629446   589971    1351        0             0 java
[511250.459537] [23860]  1000 23860  2328022   144465     519        0             0 java
[511250.459539] [13536]  1004 13536   134981     2523     108        0             0 php-fpm
[511250.459540] [ 1813]     0  1813  1481817    46140     246        0             0 java
[511250.459542] [ 3187]     0  3187  1481817    53461     253        0             0 java
[511250.459544] [ 2993]    26  2993    59779     1712      55        0             0 postgres
[511250.459546] [ 3059]  1000  3059  3085528    16411     141        0             0 java
[511250.459547] [ 3146]  1000  3146  2027460   211779     628        0             0 java
[511250.459549] [17982]   996 17982  4950828   635077    1629        0             0 java
[511250.459551] [16433]     0 16433    37607      360      74        0             0 sshd
[511250.459553] [16436]     0 16436    29390      141      13        0             0 bash
[511250.459554] [16466]     0 16466    29390      136      14        0             0 bash
[511250.459556] [22511]     0 22511    36968      433      72        0             0 sshd
[511250.459558] [22515]     0 22515    19016      257      40        0             0 ssh
[511250.459560] [22519]     0 22519    19107      350      39        0             0 ssh
[511250.459562] [22522]     0 22522    19016      259      38        0             0 ssh
[511250.459563] [24770]     0 24770    38342      657      30        0             0 vim
[511250.459565] [24781]     0 24781    45009      303      41        0             0 crond
[511250.459566] [24784]     0 24784    91360     8641     134        0             0 python
[511250.459568] [24932]     0 24932    28791       45      13        0             0 sh
[511250.459570] [24933]     0 24933    93538     7284     104        0             0 ansible-playboo
[511250.459571] [24942]     0 24942    94424     7584     103        0             0 ansible-playboo
[511250.459573] [24943]     0 24943    96455     9707     107        0             0 ansible-playboo
[511250.459574] [24944]     0 24944    94436     7599     103        0             0 ansible-playboo
[511250.459576] [24945]     0 24945    16336       70      33        0             0 ssh
[511250.459578] [24946]     0 24946    16336       71      33        0             0 ssh
[511250.459579] [24947]     0 24947    16336       69      30        0             0 ssh
[511250.459581] Out of memory: Kill process 17982 (java) score 77 or sacrifice child
[511250.459642] Killed process 17982 (java) total-vm:19803312kB, anon-rss:2540308kB, file-rss:0kB, shmem-rss:0kB

(1) mysqld invoked the oom-killer, i.e. mysqld asked for more memory than the system could provide. The /proc/sys/vm/min_free_kbytes parameter controls the reclaim threshold: when available memory (excluding buffers and cache) drops below it, the kernel thread kswapd starts reclaiming memory. The fact that the oom-killer still fired means memory was genuinely exhausted, or the oom-killer was triggered before or during reclaim.
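
For reference, the threshold and the memory actually still available (excluding buffers/cache) can be checked like this:

cat /proc/sys/vm/min_free_kbytes
free -m        # the "available" (or "-/+ buffers/cache") figure is what can still be handed out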

(2) The output below shows the per-zone state at the moment the allocation failed; none of the node's three zones (DMA, DMA32, Normal) could satisfy the request:

[511250.459112] Node 0 DMA free:15892kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[511250.459117] lowmem_reserve[]: 0 2814 31994 31994
[511250.459120] Node 0 DMA32 free:119704kB min:5940kB low:7424kB high:8908kB active_anon:2678512kB inactive_anon:276kB active_file:124kB inactive_file:132kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2883436kB mlocked:0kB dirty:0kB writeback:0kB mapped:1100kB shmem:1632kB slab_reclaimable:48796kB slab_unreclaimable:9340kB kernel_stack:5248kB pagetables:11424kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:32902 all_unreclaimable? yes
[511250.459124] lowmem_reserve[]: 0 0 29180 29180
[511250.459127] Node 0 Normal free:63896kB min:61608kB low:77008kB high:92412kB active_anon:29011996kB inactive_anon:6336kB active_file:6576kB inactive_file:11148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29881068kB mlocked:0kB dirty:44kB writeback:8kB mapped:13328kB shmem:16776kB slab_reclaimable:198472kB slab_unreclaimable:94604kB kernel_stack:53472kB pagetables:159076kB unstable:0kB bounce:0kB free_pcp:656kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:924 all_unreclaimable? no
[511250.459131] lowmem_reserve[]: 0 0 0 0
[511250.459134] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
[511250.459144] Node 0 DMA32: 9372*4kB (UEM) 2427*8kB (UEM) 1179*16kB (UEM) 369*32kB (UEM) 104*64kB (EM) 31*128kB (EM) 14*256kB (UEM) 9*512kB (UEM) 7*1024kB (UEM) 3*2048kB (M) 0*4096kB = 119704kB
[511250.459154] Node 0 Normal: 1540*4kB (UE) 6148*8kB (UE) 503*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 63392kB
[511250.459162] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[511250.459163] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[511250.459164] 9275 total pagecache pages

(3) Information about the killed process

The following output confirms that the process killed was PID 17982:

[511250.459581] Out of memory: Kill process 17982 (java) score 77 or sacrifice child
[511250.459642] Killed process 17982 (java) total-vm:19803312kB, anon-rss:2540308kB, file-rss:0kB, shmem-rss:0kB

Below is the entry for process 17982: its rss is 635077 pages, which works out to 635077 * 4096 bytes, about 2.4 GB of physical memory (consistent with the anon-rss of 2540308 kB above):

[511250.459172] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[511250.459549] [17982]   996 17982  4950828   635077    1629        0
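
A quick sanity check of that conversion (rss is counted in 4 kB pages):

awk 'BEGIN { printf "%.2f GB\n", 635077 * 4096 / 1024 / 1024 / 1024 }'    # ~2.42 GB, matching anon-rss:2540308kB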

The columns mean:

  • pid: process ID.

  • uid: user ID.

  • tgid: thread group ID.

  • total_vm: virtual memory used (in 4 kB pages).

  • rss: resident memory used (in 4 kB pages).

  • nr_ptes: page table entries.

  • swapents: swap entries.

  • oom_score_adj: usually 0; a lower value means the process is less likely to be killed when the OOM killer runs.

(4) Summing the rss of every process on the system (rss is the physical memory a program actually uses, in 4 kB pages)

Adding up the rss values of all processes in the OOM dump gives roughly 32 GB, so the system really had exhausted its memory when the oom-killer fired. Breaking it down, the Java processes accounted for about 26 GB in total, by far the largest share.
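
A rough way to reproduce that sum from the saved log, assuming the dump was copied into a file named oom.log (hypothetical name; the field positions match this particular log format). Stripping the brackets first makes rss consistently the 6th field:

sed 's/[][]/ /g' oom.log \
  | awk 'NF == 10 && $7 ~ /^[0-9]+$/ { sum += $6 } END { printf "total rss ~ %.1f GB\n", sum * 4 / 1024 / 1024 }'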


III. Solution

What we did:

1. Cap the Java processes' max heap and reduce the number of Java workers, lowering overall memory usage.

2. The system had no swap enabled, so we added 8 GB of swap space (see the sketch below).
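
A sketch of both steps; the heap size, swap file path, and jar name are illustrative only:

# 1) cap the JVM heap (size it to the workload)
java -Xms2g -Xmx2g -jar app.jar

# 2) add and enable an 8 GB swap file
dd if=/dev/zero of=/swapfile bs=1M count=8192
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab    # keep it across reboots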

Another option (not recommended here) is to disallow memory overcommit altogether:

# echo "2" > /proc/sys/vm/overcommit_memory

# echo "80" > /proc/sys/vm/overcommit_ratio
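
Note that writes to /proc do not survive a reboot; the persistent equivalent, assuming a sysctl.conf-based setup, is:

cat >> /etc/sysctl.conf <<'EOF'
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
EOF
sysctl -p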


IV. Further Notes

1. overcommit_memory (/proc/sys/vm/overcommit_memory)

Linux allows memory overcommit: whenever a process asks for memory, the kernel grants it, betting that the process will not actually use everything it asked for. But what if it does? Then you get something like a bank run: there is not enough cash (memory) to go around. Linux handles this crisis with the OOM killer (OOM = out of memory): pick a process, kill it to free some memory, and if that is not enough, keep killing. Alternatively, the kernel parameter vm.panic_on_oom can be set so that the system reboots itself on OOM. Both mechanisms are risky: a reboot may interrupt the service, and so may killing a process. That is why, since Linux 2.6, memory overcommit can be disabled through the kernel parameter vm.overcommit_memory.


(1) The kernel parameter vm.overcommit_memory accepts three values:

0 – Heuristic overcommit handling. This is the default: overcommit is allowed, but blatantly unreasonable requests are rejected, e.g. a single malloc larger than the system's total memory. "Heuristic" means the kernel uses an algorithm to guess whether a request is reasonable and refuses the overcommit if it decides it is not.

1 – Always overcommit. Every memory request is granted and no overcommit handling is performed. This increases the risk of memory overload, but can also improve performance for memory-hungry workloads.

2 – Don't overcommit. Requests are rejected once the committed memory would equal or exceed total swap plus the fraction of physical RAM given by overcommit_ratio. This is the best setting if you want to minimize the risk of overcommitting memory.


(2) The heuristic overcommit algorithm (implemented in the kernel's __vm_enough_memory()) can basically be understood as:

a single request may not exceed [free memory + free swap + page cache + the reclaimable part of SLAB]; otherwise the request fails.
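
A rough estimate of that ceiling can be read off /proc/meminfo (a sketch only; the kernel's exact accounting differs slightly):

awk '/^(MemFree|SwapFree|Cached|SReclaimable):/ { sum += $2 }
     END { printf "heuristic single-allocation ceiling ~ %.1f GB\n", sum / 1024 / 1024 }' /proc/meminfo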


(3) With overcommit disabled (vm.overcommit_memory=2), what exactly counts as overcommit? The kernel keeps a threshold; once the total committed memory exceeds it, the system is considered overcommitted. The threshold can be seen in /proc/meminfo:

# grep -i commit /proc/meminfo
CommitLimit:     5967744 kB
Committed_AS:    5363236 kB


CommitLimit is that threshold: if the total committed memory exceeds CommitLimit, the system is overcommitted.

How is this threshold computed? It is neither the physical memory size nor the free memory size; it is set indirectly through the kernel parameters vm.overcommit_ratio or vm.overcommit_kbytes:
CommitLimit = (Physical RAM * vm.overcommit_ratio / 100) + Swap

Notes:
vm.overcommit_ratio is a kernel parameter with a default of 50, i.e. 50% of physical RAM. If you prefer an absolute size instead of a ratio, set the other kernel parameter, vm.overcommit_kbytes, instead.
If huge pages are in use they are subtracted from physical memory, and the formula becomes:
CommitLimit = ([total RAM] – [total huge TLB RAM]) * vm.overcommit_ratio / 100 + swap
See https://access.redhat.com/solutions/665023
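
The formula can be checked directly against /proc/meminfo (a sketch ignoring huge pages):

ratio=$(cat /proc/sys/vm/overcommit_ratio)
awk -v r="$ratio" '/^MemTotal:/ {ram=$2} /^SwapTotal:/ {swap=$2}
     END { printf "expected CommitLimit = %d kB\n", ram * r / 100 + swap }' /proc/meminfo
grep -i commitlimit /proc/meminfo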

Committed_AS in /proc/meminfo is the total memory committed by all processes (committed, not necessarily already allocated). If Committed_AS exceeds CommitLimit, overcommit has occurred, and the larger the gap, the more severe it is. Another way to read Committed_AS: it is how much physical memory you would need to absolutely guarantee that OOM (out of memory) never happens.

(4) "sar -r" is a common tool for looking at memory usage; two of its columns relate to overcommit, kbcommit and %commit:
kbcommit corresponds to Committed_AS in /proc/meminfo;
%commit does not use CommitLimit as its denominator; it is Committed_AS / (MemTotal + SwapTotal), i.e. committed memory as a percentage of physical memory plus swap.

# sar -r 
05:00:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
05:10:01 PM    160576   3648460     95.78         0   1846212   4939368     62.74   1390292   1854880         4


2. panic_on_oom (/proc/sys/vm/panic_on_oom)

Controls what the system does when an OOM occurs. It accepts three values (a quick example of checking and setting it follows the list):

0 - default; on OOM, run the OOM killer.

1 - for an OOM that occurs under the constraints of a cpuset, memory policy, or memcg, the kernel may skip the panic and just run the OOM killer; in all other cases it triggers a kernel panic, i.e. the system reboots.

2 - on any OOM, trigger a kernel panic immediately, i.e. the system reboots.
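
Checking or changing it at runtime:

cat /proc/sys/vm/panic_on_oom
echo 0 > /proc/sys/vm/panic_on_oom    # or: sysctl -w vm.panic_on_oom=0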


3. oom_adj, oom_score_adj and oom_score

Strictly speaking, these parameters are per-process, so they live under /proc/xxx/ (where xxx is the process ID). Given that we choose to kill a process when OOM strikes, the obvious question is: which one? The kernel's algorithm is simple: give every process a score (oom_score; note this file is read-only) and pick the one with the highest score. How is the score computed? See the kernel's oom_badness function:

unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg, 
              const nodemask_t *nodemask, unsigned long totalpages) 
{
    /* ... */
    adj = (long)p->signal->oom_score_adj; 
    if (adj == OOM_SCORE_ADJ_MIN) {----------------------(1) 
        task_unlock(p); 
        return 0;---------------------------------(2) 
    }
    points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) + 
        atomic_long_read(&p->mm->nr_ptes) + mm_nr_pmds(p->mm);---------(3) 
    task_unlock(p);
 
    if (has_capability_noaudit(p, CAP_SYS_ADMIN))-----------------(4) 
        points -= (points * 3) / 100;
    adj *= totalpages / 1000;----------------------------(5) 
    points += adj;  
 
    return points > 0 ? points : 1; 
}

(1) A task's score (oom_score) has two components: a system part, based on the task's memory usage, and a user part, oom_score_adj; the task's final score combines both. If the user sets the task's oom_score_adj to OOM_SCORE_ADJ_MIN (-1000), the OOM killer is effectively forbidden from killing that process.

(2) Returning 0 here tells the OOM killer that this is a "good process" that should not be killed. As we will see, when scores are actually computed the minimum is 1.

(3) As mentioned above, the system score simply looks at physical memory consumption, made up of three parts: RSS, memory occupied on the swap file or swap device, and memory used by page tables.

(4) Root-owned processes get a 3% memory-usage allowance, which is subtracted here.

(5) Users can tune the score through oom_score_adj, whose range is -1000 to 1000. 0 means no adjustment; a negative value applies a discount to the computed score; a positive value penalizes the task, i.e. raises its oom_score. The adjustment is scaled by the amount of memory that can be allocated in this context (with no allocation constraints, that is all usable memory in the system; with cpusets, it is the cpuset's actual quota). The totalpages argument passed into oom_badness is exactly that upper bound. The final score (points) is then adjusted by oom_score_adj; for example, oom_score_adj = -500 means a discount of half of totalpages, i.e. for scoring purposes the task's memory usage is reduced by half of the allocatable upper bound.


With oom_score_adj and oom_score understood, the dust has pretty much settled. oom_adj is an older interface with a similar purpose, kept for compatibility; when you write to it, the kernel converts the value into oom_score_adj. Interested readers can look into the details themselves; I will not go further here.
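
To make this concrete, here is how these knobs are inspected and adjusted for a process (replace <pid> with a real process ID):

cat /proc/<pid>/oom_score        # the score the OOM killer would use right now
cat /proc/<pid>/oom_score_adj    # user adjustment, -1000 .. 1000
echo -1000 > /proc/<pid>/oom_score_adj    # exempt the process from the OOM killer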

One more note:

Any process spawned by an adjusted process inherits that process's oom_score. For example, if the sshd process is exempted from the oom_killer, all processes spawned by SSH sessions are exempted as well. This can impair the oom_killer's ability to rescue the system when an OOM occurs.


4. min_free_kbytes (/proc/sys/vm/min_free_kbytes)

First, the official documentation:

This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a watermark[WMARK_MIN] value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages based proportionally on its size.

Some minimal amount of memory is needed to satisfy PF_MEMALLOC allocations; if you set this to lower than 1024KB, your system will become subtly broken, and prone to deadlock under high loads.

Setting this too high will OOM your machine instantly.

The documentation is clear enough; the key points are:

(1) It is the floor on the amount of free memory the system keeps in reserve.

A default is computed from the memory size at system initialization:

min_free_kbytes = sqrt(lowmem_kbytes * 16) = 4 * sqrt(lowmem_kbytes)    (lowmem_kbytes can be taken as the system memory size)

The computed value is also clamped to a minimum of 128 KB and a maximum of 64 MB.

So min_free_kbytes does not grow linearly with memory size; the kernel comments give the reason: "because network bandwidth does not increase linearly with machine size". As memory grows there is no need to reserve proportionally more; enough to cover emergencies is sufficient.
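
For instance, on a machine with about 32 GB of lowmem the default would come out around 23 MB (a back-of-the-envelope sketch; the kernel rounds and clamps the result):

awk 'BEGIN { lowmem_kb = 32 * 1024 * 1024; printf "min_free_kbytes ~ %d kB\n", 4 * sqrt(lowmem_kb) }'
# prints roughly 23170 kB, i.e. about 23 MB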


(2) min_free_kbytes is mainly used to compute the three watermarks that drive memory reclaim, watermark[min/low/high].

1) watermark[high] > watermark[low] > watermark[min], and each zone has its own set.

2) When a zone's free memory drops below watermark[low], the kernel thread kswapd (one per NUMA node) starts reclaiming memory and keeps going until the zone's free memory reaches watermark[high]. If allocations arrive faster than reclaim and free memory falls to watermark[min], the kernel performs direct reclaim, i.e. it reclaims pages in the allocating application's own process context and then uses the reclaimed pages to satisfy the request; this blocks the application, adds some response latency, and may still end in a system OOM. The memory below watermark[min] is the system's own reserve for special uses and is not handed out to ordinary user-space requests.

3) How the three watermarks are computed:

watermark[min]  = min_free_kbytes converted into pages, call it min_free_pages. (Because every zone has its own watermarks, the value is in effect split across zones in proportion to each zone's share of total memory, giving a per-zone min_free_pages.)
watermark[low]  = watermark[min] * 5 / 4
watermark[high] = watermark[min] * 3 / 2

So the buffer between levels is high - low = low - min = per_zone_min_free_pages * 1/4. And since min_free_kbytes = 4 * sqrt(lowmem_kbytes), the buffer also only grows with the square root of memory size.
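
As a sketch, plugging in the Normal zone's min of 61608 kB from the OOM dump above reproduces its low/high figures (the kernel works in pages and truncates, hence the tiny difference on low):

awk 'BEGIN { min_kb = 61608; min = min_kb / 4; printf "min=%d low=%d high=%d pages\n", min, min * 5 / 4, min * 3 / 2 }'
# -> min=15402 low=19252 high=23103 pages, i.e. 61608 / 77008 / 92412 kB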

4) The per-zone watermarks can be inspected in /proc/zoneinfo.

For example:

Node 0, zone      DMA
pages free     3960
       min      65
       low      81
       high     97


(3) Effects of the min_free_kbytes value

The larger min_free_kbytes, the higher the watermark lines and the bigger the buffers between them. kswapd then starts reclaiming earlier and reclaims more memory each time (it does not stop until watermark[high]), so the system keeps more memory in reserve, which to some extent reduces the memory available to applications. In the extreme, setting min_free_kbytes close to the memory size leaves applications so little memory that OOMs may occur frequently.

If min_free_kbytes is set too small, the system's reserve is too small. kswapd itself does a small amount of allocation while reclaiming (it runs with the PF_MEMALLOC flag set, which allows it to use the reserved memory); likewise, a process chosen by the OOM killer may need to allocate memory while exiting and may also use the reserve. Letting these two cases use the reserved memory is what keeps the system out of deadlock.


5. lowmem and highmem

I will not go into the definitions of lowmem and highmem in detail here; recommended reading:

http://ilinuxkernel.com/?p=1013


The article explains it well; here are just the conclusions:

(1) The highmem concept is only needed when physical memory is larger than the kernel's address-space range.

On 32-bit x86, Linux by default splits the 4 GB process virtual address space 3:1: the 0-3 GB user space is mapped through page tables, and the 3-4 GB kernel space is linearly mapped at the top of the process address space. In other words, an x86 machine with more than about 1 GB of physical memory needs highmem.

(2) The kernel cannot directly access physical memory above 1 GB (that memory cannot be permanently mapped into the kernel's address space). When it needs to, it uses temporary mappings to bring the high physical memory into an address range the kernel can reach.

(3) Once lowmem is full, OOM can still occur even if plenty of physical memory remains. On Linux 2.6, the OOM killer then kills a process chosen by score (not expanding on OOM again here).

(4) On x86_64 the kernel's usable address space is far larger than physical memory, so the highmem issue above does not arise; system memory can be treated as all lowmem.


6. lowmem_reserve_ratio (/proc/sys/vm/lowmem_reserve_ratio)

Official documentation:

For some specialised workloads on highmem machines it is dangerous for the kernel to allow process memory to be allocated from the "lowmem" zone. This is because that memory could then be pinned via the mlock() system call, or by unavailability of swapspace.

And on large highmem machines this lack of reclaimable lowmem memory can be fatal.

So the Linux page allocator has a mechanism which prevents allocations which _could_ use highmem from using too much lowmem. This means that a certain amount of lowmem is defended from the possibility of being captured into pinned user memory.

The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is in defending these lower zones.

If you have a machine which uses highmem or ISA DMA and your applications are using mlock(), or if you are running with no swap then you probably should change the lowmem_reserve_ratio setting.

(1) Purpose

Besides min_free_kbytes, which reserves some memory within every zone, lowmem_reserve_ratio sets up defensive reserves between zones, mainly to stop the higher zones from overusing the lower zones' memory when the higher zones run out.

For example, a typical single-node machine today has three zones: DMA, DMA32 and NORMAL. DMA and DMA32 are the low zones and are small; on a 96 GB machine the two together hold only about 1 GB. NORMAL is the relatively high zone (there is usually no HIGHMEM zone nowadays) and holds the bulk of the memory (>90 GB). Low memory has special uses, e.g. some DMA transfers can only use DMA-zone memory, so allocations should use high memory whenever possible instead of low memory, and the scarce low zones must be protected from being grabbed when the high zone runs short.


(2) How it is computed

# cat /proc/sys/vm/lowmem_reserve_ratio

256     256     32


From these ratios the kernel computes a per-zone array of reserved pages, the protection array, which can be seen in /proc/zoneinfo:

Node 0, zone      DMA
 pages free     1355
       min      3
       low      3
       high     4
       :
       :
   numa_other   0
       protection: (0, 2004, 2004, 2004)
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 pagesets
   cpu: 0 pcp: 0
       :

During an allocation, these reserved-page values are added to the watermark to decide whether the request can be satisfied now, or free memory is too low and reclaim must be started.

For example, if a request that could be served from the normal zone (index = 2) tries to allocate from the DMA zone and the check in use is watermark[low], the kernel sees page_free = 1355 while watermark + protection[2] = 3 + 2004 = 2007 > page_free, so free memory is deemed too low and the allocation is refused. If the request is for the DMA zone itself, protection[0] = 0 is used instead and the request is satisfied.

protection[j] of zone[i] is computed as follows:

(i < j):
 zone[i]->protection[j]
 = (total sums of present_pages from zone[i+1] to zone[j] on the node)
   / lowmem_reserve_ratio[i];
(i = j):
  (should not be protected. = 0;
(i > j):
  (not necessary, but looks 0)

The default lowmem_reserve_ratio[i] values are:

   256 (if zone[i] means DMA or DMA32 zone)

   32  (others).

As the formula shows, the reserved amount is the reciprocal of the ratio: 256 means 1/256, i.e. about 0.39% of the higher zones' size. To reserve more pages, set a smaller value; the minimum is 1 (1/1 -> 100%).
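
As a cross-check against the OOM dump above: the DMA32 zone prints "lowmem_reserve[]: 0 0 29180 29180", the Normal zone above it has managed:29881068kB, and the DMA32 ratio is 256. A quick sketch (note that this kernel evidently divides the higher zone's managed pages, not present pages, by the ratio):

awk 'BEGIN { normal_managed_pages = 29881068 / 4; printf "protection ~ %d pages\n", int(normal_managed_pages / 256) }'
# -> 29180 pages, matching the lowmem_reserve value printed for DMA32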


(3) A worked example together with min_free_kbytes (the watermarks)

Below is a log printed by a production server (96 GB) when a memory allocation failed:

[38905.295014] java: page allocation failure. order:1, mode:0x20, zone 2
[38905.295020] Pid: 25174, comm: java Not tainted 2.6.32-220.23.1.tb750.el5.x86_64 #1
...
[38905.295348] active_anon:5730961 inactive_anon:216708 isolated_anon:0
[38905.295349]  active_file:2251981 inactive_file:15562505 isolated_file:0
[38905.295350]  unevictable:1256 dirty:790255 writeback:0 unstable:0
[38905.295351]  free:113095 slab_reclaimable:577285 slab_unreclaimable:31941
[38905.295352]  mapped:7816 shmem:4 pagetables:13911 bounce:0
[38905.295355] Node 0 DMA free:15796kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB  isolated(anon):0kB isolated(file):0kB present:15332kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[38905.295365] lowmem_reserve[]: 0 1951 96891 96891
[38905.295369] Node 0 DMA32 free:380032kB min:800kB low:1000kB high:1200kB active_anon:46056kB inactive_anon:10876kB active_file:15968kB inactive_file:129772kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1998016kB mlocked:0kB dirty:20416kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:11716kB slab_unreclaimable:160kB kernel_stack:176kB pagetables:112kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:576 all_unreclaimable? no
[38905.295379] lowmem_reserve[]: 0 0 94940 94940
[38905.295383] Node 0 Normal free:56552kB min:39032kB low:48788kB high:58548kB active_anon:22877788kB inactive_anon:855956kB active_file:8991956kB inactive_file:62120248kB unevictable:5024kB isolated(anon):0kB isolated(file):0kB present:97218560kB mlocked:5024kB dirty:3140604kB writeback:0kB mapped:31264kB shmem:16kB slab_reclaimable:2297424kB slab_unreclaimable:127604kB kernel_stack:12528kB pagetables:55532kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[38905.295393] lowmem_reserve[]: 0 0 0 0
[38905.295396] Node 0 DMA: 1*4kB 2*8kB 0*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15796kB
[38905.295405] Node 0 DMA32: 130*4kB 65*8kB 75*16kB 72*32kB 95*64kB 22*128kB 10*256kB 7*512kB 4*1024kB 2*2048kB 86*4096kB = 380032kB
[38905.295414] Node 0 Normal: 12544*4kB 68*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 54816kB
[38905.295423] 17816926 total pagecache pages

1) The first line, "order:1, mode:0x20", shows this is a GFP_ATOMIC allocation with order = 1, i.e. 2 contiguous pages.

2) First allocation attempt

In __alloc_pages_nodemask(), the first attempt calls get_page_from_freelist() with the flags ALLOC_WMARK_LOW|ALLOC_CPUSET; it runs the zone_watermark_ok() check on every zone using the watermark[low] threshold passed in.

zone_watermark_ok() takes z->lowmem_reserve[] into account, so a normal-zone request does not fall through to the lower zones. For DMA32, for instance:

free pages = 380032KB = 95008 pages < low(1000KB = 250 pages) +  lowmem_reserve[normal](94940) = 95190

so DMA32 is judged not OK either, and likewise DMA's memory cannot be used.

For the normal zone's own memory, free pages = 56552 kB = 14138 pages, and lowmem_reserve is 0 so it does not matter; but the check also has to account for the order-1 request: after subtracting the 12544 order-0 pages only 14138 - 12544 = 1594 remain, which is less than low / 2 = (48788 kB = 12197 pages) / 2 = 6098 pages.

So the first attempt fails and the allocator enters __alloc_pages_slowpath() for a more aggressive attempt.

3) Second allocation attempt

__alloc_pages_slowpath() first calls gfp_to_alloc_flags() to set more forceful flags; for the original GFP_ATOMIC this yields ALLOC_WMARK_MIN | ALLOC_HARDER | ALLOC_HIGH. Note that ALLOC_NO_WATERMARKS is not set. That flag skips the zone watermark checks altogether; it is the highest-priority kind of request and may use all of the reserved memory, but it requires (!in_interrupt() && ((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))), i.e. not in interrupt context and the caller is either doing reclaim (e.g. kswapd) or exiting.

With the new flags it re-enters get_page_from_freelist() for the second attempt. Despite ALLOC_HARDER and ALLOC_HIGH, unfortunately all three zones still fail the zone_watermark_ok check. For DMA32, for example:

free pages = 380032KB = 95008 pages

Because ALLOC_HIGH is set, watermark[min] is halved: min = min/2 = 800 kB / 2 = 400 kB = 100 pages.

Because ALLOC_HARDER is also set, another quarter is cut off: min = 3 * min / 4 = 100 pages * 3 / 4 = 75 pages.

Even so, min (75 pages) + lowmem_reserve[normal] (94940) = 95015 is still larger than the free pages, so the allocation is still refused; DMA fails the same way, and in the normal zone there are too few contiguous 8 kB blocks among the free pages to satisfy the request.

After this second failure, since ALLOC_NO_WATERMARKS is not set the allocator does not enter __alloc_pages_high_priority for a highest-priority attempt, and because a GFP_ATOMIC allocation cannot block for reclaim or fall into OOM handling, the allocation simply ends in failure.


In this situation you can raise min_free_kbytes moderately so that kswapd starts reclaiming earlier and the system always keeps more free memory on hand; optionally, you can also reduce the lowmem reserve (by raising the lowmem_reserve_ratio values, since the reserve is the reciprocal of the ratio) so that when memory runs short (mainly in the normal zone) it can borrow DMA32/DMA memory in an emergency. Do not shrink the reserve too far, though.
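
A sketch of such a tuning; the values are purely illustrative, should be sized for the actual machine, and ought to be persisted via sysctl.conf as well:

echo 131072 > /proc/sys/vm/min_free_kbytes              # larger reserve: kswapd starts reclaiming earlier
echo "256 512 32" > /proc/sys/vm/lowmem_reserve_ratio   # larger DMA32 ratio: smaller protection, so the Normal
                                                        # zone can borrow DMA32 pages more readily when tight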










(This article is reposted from leejia1989's 51CTO blog; original: http://blog.51cto.com/leejia/1952482. Please contact the original author for reprint permission.)