A .NET Managed Memory Leak Analysis


We recently helped analyze a memory leak in a .NET process, and we'd like to share the investigation.

Symptom: the customer's server-side .NET process showed minute-scale CPU spikes. Usage would climb to nearly 100% and then drop back.

[Figure 1]

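Analysis: the support engineer captured a process dump with procdump.exe, configured to trigger when the process's CPU usage stayed above 80% for 1 second (-c 80 sets the CPU threshold, -s 1 the number of consecutive seconds, -ma requests a full memory dump, 10672 is the target PID and f:\aliyun the output folder).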

procdump.exe -ma -s 1 -c 80 10672 f:\aliyun
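The trigger condition can be modelled roughly as follows. This is a simplified sketch under our own assumptions (one CPU sample per second, a strict "greater than" comparison), not ProcDump's actual implementation; `first_trigger` is a hypothetical helper name.

```python
def first_trigger(samples, threshold=80.0, consecutive=1):
    """Return the index of the sample at which a dump would be taken,
    or None if the condition is never met.

    samples: per-second CPU percentages for the process (assumed input).
    threshold: plays the role of procdump's -c value.
    consecutive: plays the role of procdump's -s value.
    """
    run = 0  # how many consecutive samples have been above the threshold
    for i, cpu in enumerate(samples):
        if cpu > threshold:  # assumption: strictly greater than
            run += 1
            if run >= consecutive:
                return i
        else:
            run = 0  # a single dip below the threshold resets the streak
    return None


samples = [35, 42, 88, 60, 91, 93]
print(first_trigger(samples, threshold=80, consecutive=1))  # 2
print(first_trigger(samples, threshold=80, consecutive=2))  # 5
```

With -s 1, as in this case, a single sample above 80% is enough to fire; a larger -s value would filter out brief spikes.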


Loading Dump File [E:\temp\201127\GameServer.exe_201119_171245.dmp\GameServer.exe_201119_171245.dmp]

User Mini Dump File with Full Memory: Only application data is available

 

Comment: '

*** e:\soft\procdump\procdump.exe -ma -s 1 -c 80 10672 f:\aliyun

*** Process exceeded 80% CPU (system scale) for 1 second. Value: 88%. Hottest Thread: 4196 (0x1064).'

 

************* Path validation summary **************

Response Time (ms) Location

Deferred srv*F:\symbols*https://msdl.microsoft.com/download/symbols

Symbol search path is: srv*F:\symbols*https://msdl.microsoft.com/download/symbols

Executable search path is:

Windows 8.1 Version 9600 MP (32 procs) Free x64

Product: Server, suite: TerminalServer DataCenter SingleUserTS

6.3.9600.18217 (winblue_ltsb.160124-0053)

Machine Name:

Debug session time: Thu Nov 19 17:12:45.000 2020 (UTC + 8:00)

System Uptime: 38 days 3:36:48.460

Process Uptime: 0 days 0:33:22.000

 

When the dump was captured, the sampled system CPU load was as high as 91%.

0:059> .loadby sos clr

0:059> !threadpool

CPU utilization: 91%

Worker Thread: Total: 57 Running: 3 Idle: 49 MaxLimit: 32767 MinLimit: 32

Work Request in Queue: 0

--------------------------------------

Number of Timers: 1

--------------------------------------

Completion Port Thread:Total: 89 Free: 88 MaxFree: 64 CurrentLimit: 89 MaxLimit: 1000 MinLimit: 65

 

At the instant the dump was captured, only a handful of threads were actually using CPU:

1)    Thread 37

0:059> ~37s

mscorlib_ni!System.IO.FileStream.WriteFileNative(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Int32, System.Threading.NativeOverlapped*, Int32 ByRef)$##600184B+0x86:

00007ffb`c5923d76 48894de0 mov qword ptr [rbp-20h],rcx ss:000000e2`16b0ee20=000000e3cde2fc10

0:037> kL

# Child-SP RetAddr Call Site

00 000000e2`16b0edf0 00007ffb`c5923cc2 mscorlib_ni!System.IO.FileStream.WriteFileNative(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Int32, System.Threading.NativeOverlapped*, Int32 ByRef)$##600184B+0x86

01 000000e2`16b0ee50 00007ffb`c5923aa7 mscorlib_ni!System.IO.FileStream.WriteCore(Byte[], Int32, Int32)$##600183D+0x62

02 000000e2`16b0eec0 00007ffb`c5923a34 mscorlib_ni!System.IO.FileStream.FlushInternalBuffer()$##600182F+0x57

03 000000e2`16b0ef00 00007ffb`c58d4f4c mscorlib_ni!System.IO.FileStream.Flush(Boolean)$##600182E+0x24

04 000000e2`16b0ef40 00007ffb`69ccac7b mscorlib_ni!System.IO.StreamWriter.Flush(Boolean, Boolean)$##60019BE+0x8c

05 000000e2`16b0efa0 00007ffb`69ccaaae 0x00007ffb`69ccac7b

06 000000e2`16b0efe0 00007ffb`69ca103e 0x00007ffb`69ccaaae

07 000000e2`16b0f030 00007ffb`c58eca72 0x00007ffb`69ca103e

08 000000e2`16b0f070 00007ffb`c58ec904 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003A95+0x162

09 000000e2`16b0f140 00007ffb`c58ec8c2 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003A94+0x14

0a 000000e2`16b0f170 00007ffb`c5926472 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)$##6003A93+0x52

0b 000000e2`16b0f1c0 00007ffb`c8fd6793 mscorlib_ni!System.Threading.ThreadHelper.ThreadStart()$##6003B8E+0x52

0c 000000e2`16b0f200 00007ffb`c8fd6665 clr!CallDescrWorkerInternal+0x83

0d 000000e2`16b0f240 00007ffb`c8fd736d clr!CallDescrWorkerWithHandler+0x4e

0e 000000e2`16b0f280 00007ffb`c90bbf59 clr!MethodDescCallSite::CallTargetWorker+0xf8

0f 000000e2`16b0f380 00007ffb`c8fd7ce5 clr!ThreadNative::KickOffThread_Worker+0x109

10 000000e2`16b0f5e0 00007ffb`c8fd7c60 clr!Frame::Push+0x59

11 000000e2`16b0f620 00007ffb`c8fd7b9e clr!FillInRegTypeMap+0x198

12 000000e2`16b0f720 00007ffb`c8fd7d1f clr!FillInRegTypeMap+0xc1

13 000000e2`16b0f7b0 00007ffb`c90bbe3b clr!FillInRegTypeMap+0x47

14 000000e2`16b0f810 00007ffb`c919159f clr!ThreadNative::KickOffThread+0xdb

15 000000e2`16b0f8e0 00007ffb`d90d13d2 clr!Thread::intermediateThreadProc+0x86

16 000000e2`16b0fa20 00007ffb`d92254f4 kernel32!BaseThreadInitThunk+0x22

17 000000e2`16b0fa50 00000000`00000000 ntdll!RtlUserThreadStart+0x34

0:037> ub rip

mscorlib_ni!System.IO.FileStream.WriteFileNative(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Int32, System.Threading.NativeOverlapped*, Int32 ByRef)$##600184B+0x6c:

00007ffb`c5923d5c c9 leave

00007ffb`c5923d5d 4903d1 add rdx,r9

00007ffb`c5923d60 4533c9 xor r9d,r9d

00007ffb`c5923d63 4c894c2420 mov qword ptr [rsp+20h],r9

00007ffb`c5923d68 4c8d4de8 lea r9,[rbp-18h]

00007ffb`c5923d6c 448bc0 mov r8d,eax

00007ffb`c5923d6f e8e40bf2ff call mscorlib_ni!System.Runtime.Remoting.Activation.ActivationServices.GetActivator()$##6005B45 (mscorlib_ni+0x434958) (00007ffb`c5844958)

00007ffb`c5923d74 33c9 xor ecx,ecx

 

2)    Thread 39

0:042> ~39s

mscorlib_ni!System.Text.UTF8Encoding.GetBytes(Char*, Int32, Byte*, Int32, System.Text.EncoderNLS)$##600675A+0x1df:

00007ffb`c5963cef f7c280ff80ff test edx,0FF80FF80h

0:039> kL

# Child-SP RetAddr Call Site

00 000000e2`16d0ee40 00007ffb`c58d50be mscorlib_ni!System.Text.UTF8Encoding.GetBytes(Char*, Int32, Byte*, Int32, System.Text.EncoderNLS)$##600675A+0x1df

01 000000e2`16d0eed0 00007ffb`c58d4f17 mscorlib_ni!System.Text.EncoderNLS.GetBytes(Char[], Int32, Int32, Byte[], Int32, Boolean)$##6006608+0x11e

02 000000e2`16d0ef60 00007ffb`69ccac7b mscorlib_ni!System.IO.StreamWriter.Flush(Boolean, Boolean)$##60019BE+0x57

03 000000e2`16d0efc0 00007ffb`69ccaaae 0x00007ffb`69ccac7b

04 000000e2`16d0f000 00007ffb`69ca103e 0x00007ffb`69ccaaae

05 000000e2`16d0f050 00007ffb`c58eca72 0x00007ffb`69ca103e

06 000000e2`16d0f090 00007ffb`c58ec904 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003A95+0x162

07 000000e2`16d0f160 00007ffb`c58ec8c2 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003A94+0x14

08 000000e2`16d0f190 00007ffb`c5926472 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)$##6003A93+0x52

09 000000e2`16d0f1e0 00007ffb`c8fd6793 mscorlib_ni!System.Threading.ThreadHelper.ThreadStart()$##6003B8E+0x52

0a 000000e2`16d0f220 00007ffb`c8fd6665 clr!CallDescrWorkerInternal+0x83

0b 000000e2`16d0f260 00007ffb`c8fd736d clr!CallDescrWorkerWithHandler+0x4e

0c 000000e2`16d0f2a0 00007ffb`c90bbf59 clr!MethodDescCallSite::CallTargetWorker+0xf8

0d 000000e2`16d0f3a0 00007ffb`c8fd7ce5 clr!ThreadNative::KickOffThread_Worker+0x109

0e 000000e2`16d0f600 00007ffb`c8fd7c60 clr!Frame::Push+0x59

0f 000000e2`16d0f640 00007ffb`c8fd7b9e clr!FillInRegTypeMap+0x198

10 000000e2`16d0f740 00007ffb`c8fd7d1f clr!FillInRegTypeMap+0xc1

11 000000e2`16d0f7d0 00007ffb`c90bbe3b clr!FillInRegTypeMap+0x47

12 000000e2`16d0f830 00007ffb`c919159f clr!ThreadNative::KickOffThread+0xdb

13 000000e2`16d0f900 00007ffb`d90d13d2 clr!Thread::intermediateThreadProc+0x86

14 000000e2`16d0fb40 00007ffb`d92254f4 kernel32!BaseThreadInitThunk+0x22

15 000000e2`16d0fb70 00000000`00000000 ntdll!RtlUserThreadStart+0x34

0:039> ub rip

mscorlib_ni!System.Text.UTF8Encoding.GetBytes(Char*, Int32, Byte*, Int32, System.Text.EncoderNLS)$##600675A+0x1c0:

00007ffb`c5963cd0 4889442438 mov qword ptr [rsp+38h],rax

00007ffb`c5963cd5 e979050000 jmp mscorlib_ni!System.Text.UTF8Encoding.GetBytes(Char*, Int32, Byte*, Int32, System.Text.EncoderNLS)$##600675A+0x743 (00007ffb`c5964253)

00007ffb`c5963cda 488b4c2440 mov rcx,qword ptr [rsp+40h]

00007ffb`c5963cdf 448b19 mov r11d,dword ptr [rcx]

00007ffb`c5963ce2 488b4c2440 mov rcx,qword ptr [rsp+40h]

00007ffb`c5963ce7 8b4904 mov ecx,dword ptr [rcx+4]

00007ffb`c5963cea 418bd3 mov edx,r11d

00007ffb`c5963ced 0bd1 or edx,ecx

 

3)    Thread 52

0:039> ~52s

clr!SVR::gc_heap::background_mark_simple1+0x48:

00007ffb`c9162f38 488bd7 mov rdx,rdi

0:052> kL

# Child-SP RetAddr Call Site

00 000000e2`17e0f140 00007ffb`c91631ae clr!SVR::gc_heap::background_mark_simple1+0x48

01 000000e2`17e0f1b0 00007ffb`c9163f14 clr!SVR::gc_heap::background_mark_simple+0x91

02 000000e2`17e0f1e0 00007ffb`c91628b4 clr!SVR::gc_heap::background_drain_mark_list+0x50

03 000000e2`17e0f210 00007ffb`c934f660 clr!SVR::gc_heap::background_mark_phase+0x3bf

04 000000e2`17e0f2a0 00007ffb`c9162244 clr! ?? ::FNODOBFM::`string'+0x8082a

05 000000e2`17e0f2f0 00007ffb`c919159f clr!SVR::gc_heap::bgc_thread_function+0x132

06 000000e2`17e0f340 00007ffb`d90d13d2 clr!Thread::intermediateThreadProc+0x86

07 000000e2`17e0fb80 00007ffb`d92254f4 kernel32!BaseThreadInitThunk+0x22

08 000000e2`17e0fbb0 00000000`00000000 ntdll!RtlUserThreadStart+0x34

0:052> ub rip

clr!SVR::gc_heap::background_mark_simple1+0x26:

00007ffb`c9162f16 488bfa mov rdi,rdx

00007ffb`c9162f19 488bd9 mov rbx,rcx

00007ffb`c9162f1c 4c8989f01e0000 mov qword ptr [rcx+1EF0h],r9

00007ffb`c9162f23 4d8d04c1 lea r8,[r9+rax*8]

00007ffb`c9162f27 4c89442478 mov qword ptr [rsp+78h],r8

00007ffb`c9162f2c 4533db xor r11d,r11d

00007ffb`c9162f2f 4885ff test rdi,rdi

00007ffb`c9162f32 0f84e1010000 je clr!SVR::gc_heap::background_mark_simple1+0x901 (00007ffb`c9163119)

 

4)    Thread 113

0:052> ~113s

MSVCR120_CLR0400!memset+0x23:

00007ffb`c8f0f8b3 f3aa rep stos byte ptr [rdi]

0:113> kL

# Child-SP RetAddr Call Site

00 000000e2`1c7fd580 000000ec`7cb76810 MSVCR120_CLR0400!memset+0x23

01 000000e2`1c7fd588 00007ffb`c918f750 0x000000ec`7cb76810

02 000000e2`1c7fd590 00007ffb`c918f3d2 clr!SVR::gc_heap::adjust_limit_clr+0xe0

03 000000e2`1c7fd5e0 00007ffb`c914625f clr!SVR::gc_heap::allocate_small+0x3ae

04 000000e2`1c7fd6a0 00007ffb`c58e0e5c clr!JIT_New+0x61f

*** WARNING: Unable to verify checksum for System.Core.ni.dll

*** ERROR: Module load completed but symbols could not be loaded for System.Core.ni.dll

05 000000e2`1c7fdae0 00007ffb`c36fb3ae mscorlib_ni!System.Collections.Generic.List`1[System.__Canon].System.Collections.Generic.IEnumerable.GetEnumerator()$##60039A3+0x4c

06 000000e2`1c7fdb40 00007ffb`6ad21c2d System_Core_ni+0x2db3ae

07 000000e2`1c7fdbb0 00007ffb`6ad21544 0x00007ffb`6ad21c2d

08 000000e2`1c7fdc00 00007ffb`6ad0a240 0x00007ffb`6ad21544

09 000000e2`1c7fdc60 00007ffb`6ad09cfb 0x00007ffb`6ad0a240

0a 000000e2`1c7fdcc0 00007ffb`6ad0920c 0x00007ffb`6ad09cfb

0b 000000e2`1c7fdd00 00007ffb`6ad07790 0x00007ffb`6ad0920c

0c 000000e2`1c7fdd40 00007ffb`6ace5a30 0x00007ffb`6ad07790

0d 000000e2`1c7fdda0 00007ffb`6ace38e8 0x00007ffb`6ace5a30

0e 000000e2`1c7fde90 00007ffb`6ace1a7b 0x00007ffb`6ace38e8

0f 000000e2`1c7fdf80 00007ffb`6ace1407 0x00007ffb`6ace1a7b

10 000000e2`1c7fe080 00007ffb`6a80981e 0x00007ffb`6ace1407

11 000000e2`1c7fe0b0 00007ffb`6a8081db 0x00007ffb`6a80981e

12 000000e2`1c7fe110 00007ffb`c58eca72 0x00007ffb`6a8081db

...

20 000000e2`1c7fe710 00007ffb`c8fd6665 clr!CallDescrWorkerInternal+0x83

21 000000e2`1c7fe750 00007ffb`c8fd736d clr!CallDescrWorkerWithHandler+0x4e

22 000000e2`1c7fe790 00007ffb`c8fdaf69 clr!MethodDescCallSite::CallTargetWorker+0xf8

23 000000e2`1c7fe890 00007ffb`c8fd7ce5 clr!QueueUserWorkItemManagedCallback+0x2a

24 000000e2`1c7fe980 00007ffb`c8fd7c60 clr!Frame::Push+0x59

25 000000e2`1c7fe9c0 00007ffb`c8fd7b9e clr!FillInRegTypeMap+0x198

26 000000e2`1c7feac0 00007ffb`c8fd7d1f clr!FillInRegTypeMap+0xc1

27 000000e2`1c7feb50 00007ffb`c8fdaa70 clr!FillInRegTypeMap+0x47

28 000000e2`1c7febb0 00007ffb`c8fd82b8 clr!ManagedPerAppDomainTPCount::DispatchWorkItem+0xa0

29 000000e2`1c7fed30 00007ffb`c8fd8195 clr!ThreadpoolMgr::ExecuteWorkRequest+0x64

2a 000000e2`1c7fed60 00007ffb`c919159f clr!ThreadpoolMgr::WorkerThreadStart+0xf5

2b 000000e2`1c7fee00 00007ffb`d90d13d2 clr!Thread::intermediateThreadProc+0x86

2c 000000e2`1c7ffbc0 00007ffb`d92254f4 kernel32!BaseThreadInitThunk+0x22

2d 000000e2`1c7ffbf0 00000000`00000000 ntdll!RtlUserThreadStart+0x34

0:113> ub rip

MSVCR120_CLR0400!memset+0x6:

00007ffb`c8f0f896 4983f810 cmp r8,10h

00007ffb`c8f0f89a 0f825c010000 jb MSVCR120_CLR0400!memset+0x16c (00007ffb`c8f0f9fc)

00007ffb`c8f0f8a0 0fba25b08e0a0001 bt dword ptr [MSVCR120_CLR0400!_favor (00007ffb`c8fb8758)],1

00007ffb`c8f0f8a8 730e jae MSVCR120_CLR0400!memset+0x28 (00007ffb`c8f0f8b8)

00007ffb`c8f0f8aa 57 push rdi

00007ffb`c8f0f8ab 488bf9 mov rdi,rcx

00007ffb`c8f0f8ae 8bc2 mov eax,edx

00007ffb`c8f0f8b0 498bc8 mov rcx,r8

 

However, this machine is a 32-core server:

0:113> !cpuid

CP F/M/S Manufacturer MHz

0 6,5,7 2500

1 6,5,7 2500

2 6,5,7 2500

3 6,5,7 2500

4 6,5,7 2500

5 6,5,7 2500

6 6,5,7 2500

7 6,5,7 2500

8 6,5,7 2500

9 6,5,7 2500

10 6,5,7 2500

11 6,5,7 2500

12 6,5,7 2500

13 6,5,7 2500

14 6,5,7 2500

15 6,5,7 2500

16 6,5,7 2500

17 6,5,7 2500

18 6,5,7 2500

19 6,5,7 2500

20 6,5,7 2500

21 6,5,7 2500

22 6,5,7 2500

23 6,5,7 2500

24 6,5,7 2500

25 6,5,7 2500

26 6,5,7 2500

27 6,5,7 2500

28 6,5,7 2500

29 6,5,7 2500

30 6,5,7 2500

31 6,5,7 2500

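So the few threads above would not be enough to drive this 32-core server's CPU that high. At the moment the dump was written, the process itself was not actually using much CPU, so we cannot find the cause of the high CPU directly by analyzing thread activity in this dump.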
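A quick upper bound makes the point. Assuming each busy thread can keep at most one logical processor fully busy, the four threads inspected above (37, 39, 52 and 113) could account for at most this share of the machine's total CPU:

```python
LOGICAL_CPUS = 32                 # "32 procs" in the dump header, confirmed by !cpuid
busy_threads = [37, 39, 52, 113]  # debugger thread numbers inspected above


def max_cpu_share(busy, cpus):
    """Upper bound on total CPU %, assuming each thread saturates at most one CPU."""
    return min(busy, cpus) / cpus * 100


print(f"{max_cpu_share(len(busy_threads), LOGICAL_CPUS):.1f}%")  # 12.5%
```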

We also noticed that the dump itself is very large, about 20 GB, and that the vast majority of that memory is .NET managed memory.

0:113> !address -summary

 

Mapping file section regions...


Mapping module regions...

Mapping PEB regions...

Mapping TEB and stack regions...

Mapping heap regions...

Mapping page heap regions...

Mapping other regions...

Mapping stack trace database regions...

Mapping activation context regions...

 

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal

Free 166 7ff5`c19eb000 ( 127.960 TB) 99.97%

<unknown> 999 a`1af16000 ( 40.421 GB) 98.65% 0.03%

Stack 862 0`10240000 ( 258.250 MB) 0.62% 0.00%

Image 712 0`0a0b6000 ( 160.711 MB) 0.38% 0.00%

Heap 61 0`08fef000 ( 143.934 MB) 0.34% 0.00%

TEB 284 0`00238000 ( 2.219 MB) 0.01% 0.00%

Other 9 0`001d1000 ( 1.816 MB) 0.00% 0.00%

PEB 1 0`00001000 ( 4.000 kB) 0.00% 0.00%

 

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal

MEM_PRIVATE 1845 a`31a9f000 ( 40.776 GB) 99.52% 0.03%

MEM_IMAGE 1048 0`0ab32000 ( 171.195 MB) 0.41% 0.00%

MEM_MAPPED 35 0`02034000 ( 32.203 MB) 0.08% 0.00%

 

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal

MEM_FREE 166 7ff5`c19eb000 ( 127.960 TB) 99.97%

MEM_RESERVE 655 5`2804b000 ( 20.625 GB) 50.34% 0.02%

MEM_COMMIT 2273 5`165ba000 ( 20.349 GB) 49.66% 0.02%

 

--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal

PAGE_READWRITE 1252 5`0a7de000 ( 20.164 GB) 49.21% 0.02%

PAGE_EXECUTE_READ 74 0`073c5000 ( 115.770 MB) 0.28% 0.00%

PAGE_READONLY 360 0`02d12000 ( 45.070 MB) 0.11% 0.00%

PAGE_WRITECOPY 175 0`0102e000 ( 16.180 MB) 0.04% 0.00%

PAGE_EXECUTE_READWRITE 79 0`005d8000 ( 5.844 MB) 0.01% 0.00%

PAGE_READWRITE|PAGE_GUARD 284 0`00582000 ( 5.508 MB) 0.01% 0.00%

PAGE_EXECUTE_WRITECOPY 36 0`0016f000 ( 1.434 MB) 0.00% 0.00%

PAGE_NOACCESS 11 0`0000b000 ( 44.000 kB) 0.00% 0.00%

PAGE_EXECUTE 2 0`00003000 ( 12.000 kB) 0.00% 0.00%
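The `!address -summary` sizes are hex byte counts with a backtick between the high and low 32 bits, and the "GB" figures are binary gigabytes (2^30 bytes). A small sketch of the conversion (the helper name is our own); it also shows that the roughly 20 GB dump lines up with the committed (MEM_COMMIT) memory, which is roughly what a full-memory dump captures:

```python
def windbg_hex_to_gib(value):
    """Convert a WinDbg 64-bit hex size such as 5`165ba000 to GiB (2**30 bytes)."""
    return int(value.replace("`", ""), 16) / 2**30


committed = windbg_hex_to_gib("5`165ba000")  # MEM_COMMIT row
reserved = windbg_hex_to_gib("5`2804b000")   # MEM_RESERVE row
print(f"committed {committed:.3f} GB, reserved {reserved:.3f} GB")
# committed 20.349 GB, reserved 20.625 GB
```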

It is therefore worth looking at how these .NET managed objects behave in memory.

We found that some types in the customer's own namespace (ShowHand, anonymized) have reached millions of instances:

0:113> !dumpheap -stat

...

00007ffb6a1fab68 1062320 95485536 ShowHand.ConfigData.CommonPropertyInfo[]

00007ffb6ac2c4c0 1310734 104858720 ShowHand.ProjectU.Common.LBStaticActorProcessingAchievement

00007ffb6ac20db0 1633682 109318576 ShowHand.ProjectU.Common.IPropertiesProvider[]

00007ffb6ac0ad88 253100 114645888 ShowHand.ProjectU.Common.ICommonActorComp[]

00007ffbc5b16948 1809532 116227168 System.String

00007ffb6a5a84a8 1076959 120619408 ShowHand.ProjectU.Common.BattleGrid

00007ffb6b04bf40 2549394 122370912 behaviac.Action+ActionTask

00007ffb6a1f8a50 5651702 135640848 ShowHand.ConfigData.CommonPropertyInfo

00007ffb6b04ae10 1986359 143017848 behaviac.Selector+SelectorTask

00007ffb6abe65f8 31814 151783472 ShowHand.ProjectU.Common.IGameEventPipeListener[]

00007ffb6b04bdc8 3176057 152450736 behaviac.Assignment+AssignmentTask

00007ffb6ac03478 14993 155447424 System.Collections.Generic.Dictionary`2+Entry[[System.Int32, mscorlib],[ShowHand.ProjectU.Common.WayPointInfo, CommonDefine]][]

00007ffb6b0ba198 3976124 159044960 ShowHand.ProjectU.Common.BattleGrid+GridLink

00007ffb6afe0ef8 228402 164720576 ShowHand.ProjectU.Common.BattleGrid[]

00007ffb6afe0698 344120 185049680 ShowHand.ProjectU.Common.BattleGridInfo4Select[]

00007ffb6abda5f0 4078096 195748608 ShowHand.ProjectU.Common.WayPointInfo

00007ffb6b04bac0 4306198 206697504 behaviac.Condition+ConditionTask

00007ffb6b04bc88 2827987 226238960 behaviac.ReferencedBehavior+ReferencedBehaviorTask

00007ffb6aba7b88 5975977 239039080 ShowHand.ProjectU.Common.ProcessingMissionInfo

00007ffb6abebab0 6016203 240648120 ShowHand.ProjectU.Common.GameEventIdDefine[]

00007ffb6abe98e0 6050943 242037720 System.Collections.Generic.List`1[[ShowHand.ProjectU.Common.GameEventIdDefine, CommonDefine]]

00007ffb6afe0c78 515019 275971336 ShowHand.ProjectU.Common.BattleGridInfo4Attack[]

...
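The `!dumpheap -stat` columns are MethodTable, instance count, total size in bytes and type name. When the list is long, it can help to post-process a saved copy of the output. The sketch below uses a hypothetical helper and four rows copied from the listing above. It ranks types by instance count and derives the per-instance size. For example, 239,039,080 / 5,975,977 = 40 bytes per ProcessingMissionInfo, which matches the 40-byte size that `!dumpheap -mt` reports for that type.

```python
# Four rows copied from the !dumpheap -stat output above.
STAT_OUTPUT = """\
00007ffbc5b16948 1809532 116227168 System.String
00007ffb6abda5f0 4078096 195748608 ShowHand.ProjectU.Common.WayPointInfo
00007ffb6b04bac0 4306198 206697504 behaviac.Condition+ConditionTask
00007ffb6aba7b88 5975977 239039080 ShowHand.ProjectU.Common.ProcessingMissionInfo
"""


def parse_dumpheap_stat(text):
    """Return (type name, count, total bytes) tuples; skips non-data lines like '...'."""
    rows = []
    for line in text.splitlines():
        parts = line.split(maxsplit=3)  # MT, Count, TotalSize, Class Name
        if len(parts) != 4 or not parts[1].isdigit() or not parts[2].isdigit():
            continue
        _mt, count, total, name = parts
        rows.append((name, int(count), int(total)))
    return rows


rows = parse_dumpheap_stat(STAT_OUTPUT)
for name, count, total in sorted(rows, key=lambda r: r[1], reverse=True):
    # Average bytes per instance: exact for fixed-size types, approximate for strings
    print(f"{count:>10,} x {total // count:>3} B  {name}")
```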

 

Whether there is a managed memory leak here deserves a closer look.

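Looking at the 32 GC heaps (the SVR:: frames above show the process uses server GC, which by default creates one heap per logical processor), we can see they are clearly unhealthy: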

0:113> !eeheap -gc

Number of GC Heaps: 32

------------------------------

Heap 0 (000000e275cd3290)

generation 0 starts at 0x000000e29cfa59a8

generation 1 starts at 0x000000e29b152070

generation 2 starts at 0x000000e277b31000

ephemeral segment allocation context: none

segment begin allocated size

000000e277b30000 000000e277b31000 000000e29d2959c0 0x257649c0(628509120)

Large object heap starts at 0x000000ea77b31000

segment begin allocated size

000000ea77b30000 000000ea77b31000 000000ea7811ccc8 0x5ebcc8(6208712)

Heap Size: Size: 0x25d50688 (634717832) bytes.

------------------------------

Heap 1 (000000e275cd6620)

generation 0 starts at 0x000000e2d363e938

generation 1 starts at 0x000000e2d182a800

generation 2 starts at 0x000000e2b7b31000

ephemeral segment allocation context: none

segment begin allocated size

000000e2b7b30000 000000e2b7b31000 000000e2d5478678 0x1d947678(496268920)

Large object heap starts at 0x000000ea87b31000

segment begin allocated size

000000ea87b30000 000000ea87b31000 000000ea87fa0e68 0x46fe68(4652648)

Heap Size: Size: 0x1ddb74e0 (500921568) bytes.

------------------------------

Heap 2 (000000e275cda830)

generation 0 starts at 0x000000e31f5516c8

generation 1 starts at 0x000000e31d26b3b8

generation 2 starts at 0x000000e2f7b31000

ephemeral segment allocation context: none

segment begin allocated size

000000e2f7b30000 000000e2f7b31000 000000e321b9c7b0 0x2a06b7b0(705083312)

Large object heap starts at 0x000000ea97b31000

segment begin allocated size

000000ea97b30000 000000ea97b31000 000000ea98130d60 0x5ffd60(6290784)

Heap Size: Size: 0x2a66b510 (711374096) bytes.

------------------------------

Heap 3 (000000e275cdf480)

generation 0 starts at 0x000000e35263b398

generation 1 starts at 0x000000e350719af8

generation 2 starts at 0x000000e337b31000

ephemeral segment allocation context: none

segment begin allocated size

000000e337b30000 000000e337b31000 000000e3529b73b0 0x1ae863b0(451437488)

Large object heap starts at 0x000000eaa7b31000

segment begin allocated size

000000eaa7b30000 000000eaa7b31000 000000eaa7f90b30 0x45fb30(4586288)

Heap Size: Size: 0x1b2e5ee0 (456023776) bytes.

------------------------------

 

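The data for the remaining 28 heaps is omitted. Taking GC heap 0 as an example, its gen2 has already reached about 593 MB (593,629,296 bytes), the distance between the gen2 and gen1 start addresses: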

0:059> ? 0x000000e29b152070-0x000000e277b31000

Evaluate expression: 593629296 = 00000000`23621070

 

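Meanwhile, gen0 and gen1 are only a few MB and a few tens of MB respectively. A distribution like this, with tiny gen0 and gen1 and a huge gen2, is very abnormal. It suggests there may be managed objects that the GC cannot collect.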
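When a heap has a single small-object segment, as here, each generation's size is the distance between consecutive start addresses, with gen0 ending at the segment's allocated pointer. The sketch below uses heap 0's values from `!eeheap -gc`. The three sizes add up exactly to the segment size 0x257649c0 (628,509,120 bytes):

```python
# Heap 0 addresses copied from the !eeheap -gc output above.
gen2_start = 0x000000E277B31000
gen1_start = 0x000000E29B152070
gen0_start = 0x000000E29CFA59A8
allocated = 0x000000E29D2959C0  # end of allocated data in the (only) SOH segment

# Generations are laid out contiguously in the segment: gen2, then gen1, then gen0.
sizes = {
    "gen2": gen1_start - gen2_start,
    "gen1": gen0_start - gen1_start,
    "gen0": allocated - gen0_start,
}
for gen, size in sizes.items():
    print(f"{gen}: {size:>11,} bytes ({size / 1e6:6.1f} MB)")

assert sum(sizes.values()) == 0x257649C0  # segment size reported by !eeheap
```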

Next we looked at the objects in heap 0's gen2. Within this single heap, several types in the customer's namespace exceed 200,000 instances:

0:059> !dumpheap -stat 0x000000e277b31000 0x000000e29b152070

00007ffb6a5a84a8 35146 3936352 ShowHand.ProjectU.Common.BattleGrid

00007ffb6b04ae10 62375 4491000 behaviac.Selector+SelectorTask

00007ffb6a1f8a50 190065 4561560 ShowHand.ConfigData.CommonPropertyInfo

00007ffb6b04bdc8 99780 4789440 behaviac.Assignment+AssignmentTask

00007ffb6b0ba198 129947 5197880 ShowHand.ProjectU.Common.BattleGrid+GridLink

00007ffb6ac03478 508 5266944 System.Collections.Generic.Dictionary`2+Entry[[System.Int32, mscorlib],[ShowHand.ProjectU.Common.WayPointInfo, CommonDefine]][]

00007ffb6afe0698 10768 5802880 ShowHand.ProjectU.Common.BattleGridInfo4Select[]

00007ffb6b04bac0 133954 6429792 behaviac.Condition+ConditionTask

00007ffb6abda5f0 138160 6631680 ShowHand.ProjectU.Common.WayPointInfo

00007ffb6b04bc88 87804 7024320 behaviac.ReferencedBehavior+ReferencedBehaviorTask

00007ffb6aba7b88 204523 8180920 ShowHand.ProjectU.Common.ProcessingMissionInfo

00007ffb6abebab0 205528 8221120 ShowHand.ProjectU.Common.GameEventIdDefine[]

00007ffb6abe98e0 207062 8282480 System.Collections.Generic.List`1[[ShowHand.ProjectU.Common.GameEventIdDefine, CommonDefine]]

00007ffb6afe0c78 16152 8657472 ShowHand.ProjectU.Common.BattleGridInfo4Attack[]

00007ffb6b04af38 222880 8915200 System.Collections.Generic.List`1[[behaviac.BehaviorTask, BehaviacRuntime]]

00007ffb6b0ea0c0 5248 9385248 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[FixMath.NET.Fix64, Fix64]][]

00007ffbc5aea7f0 121480 9718400 System.Collections.Generic.Dictionary`2[[System.Int32, mscorlib],[System.Int32, mscorlib]]

00007ffb6b04b730 147779 10640088 behaviac.Sequence+SequenceTask

 

Taking ShowHand.ProjectU.Common.ProcessingMissionInfo as an example, we randomly picked some instances of this class and examined what roots them:

0:059> !dumpheap -mt 00007ffb6aba7b88 0x000000e277b31000 0x000000e29b152070

...

000000e27c723178 00007ffb6aba7b88 40

000000e27c723348 00007ffb6aba7b88 40

000000e27c723500 00007ffb6aba7b88 40

000000e27c7236b8 00007ffb6aba7b88 40

000000e27c723888 00007ffb6aba7b88 40

000000e27c723a40 00007ffb6aba7b88 40

000000e27c723bf8 00007ffb6aba7b88 40

000000e27c723dc8 00007ffb6aba7b88 40

000000e27c723f80 00007ffb6aba7b88 40

000000e27c724138 00007ffb6aba7b88 40

000000e27c724308 00007ffb6aba7b88 40

000000e27c7244c0 00007ffb6aba7b88 40

000000e27c724678 00007ffb6aba7b88 40

000000e27c724848 00007ffb6aba7b88 40

000000e27c724a00 00007ffb6aba7b88 40

000000e27c724bb8 00007ffb6aba7b88 40

000000e27c724d88 00007ffb6aba7b88 40

000000e27c724f40 00007ffb6aba7b88 40

000000e27c7250f8 00007ffb6aba7b88 40

000000e27c7252c8 00007ffb6aba7b88 40

000000e27c725480 00007ffb6aba7b88 40

000000e27c725638 00007ffb6aba7b88 40

000000e27c725808 00007ffb6aba7b88 40

000000e27c7259c0 00007ffb6aba7b88 40

000000e27c725b78 00007ffb6aba7b88 40

000000e27c725d48 00007ffb6aba7b88 40

000000e27c725f00 00007ffb6aba7b88 40

000000e27c7260b8 00007ffb6aba7b88 40

000000e27c726288 00007ffb6aba7b88 40

000000e27c726440 00007ffb6aba7b88 40

000000e27c7265f8 00007ffb6aba7b88 40

000000e27c7267c8 00007ffb6aba7b88 40

...

 

We picked two of these objects at random, 000000e27c723348 and 000000e27c723f80, and checked their reference chains:

0:059> !gcroot 000000e27c723348

Thread 2f78:

000000e216b0efa0 00007ffb69ccac7b log4net.Appender.FileAppender.Append(log4net.Core.LoggingEvent)

rbp+10: 000000e216b0efe0

-> 000000e377b3f738 log4net.Appender.AsyncRollingFileAppender

-> 000000e377b3f850 System.Collections.Concurrent.ConcurrentQueue`1[[log4net.Core.LoggingEvent, log4net]]

-> 000000e623ca3f08 System.Collections.Concurrent.ConcurrentQueue`1+Segment[[log4net.Core.LoggingEvent, log4net]]

-> 000000e623ca3f48 log4net.Core.LoggingEvent[]

-> 000000e9555e8218 log4net.Core.LoggingEvent

-> 000000e377b3b310 log4net.Repository.Hierarchy.Hierarchy

-> 000000e377b527a0 log4net.Repository.LoggerRepositoryShutdownEventHandler

-> 000000e377b526c8 log4net.Core.WrapperMap

-> 000000e377b526f0 System.Collections.Hashtable

-> 000000e377b52740 System.Collections.Hashtable+bucket[]

-> 000000e377b527e0 System.Collections.Hashtable

-> 000000e377b52830 System.Collections.Hashtable+bucket[]

-> 000000e377b522e8 log4net.Repository.Hierarchy.DefaultLoggerFactory+LoggerImpl

-> 000000e377b3e720 log4net.Repository.Hierarchy.RootLogger

-> 000000e377b46980 log4net.Util.AppenderAttachedImpl

-> 000000e377b469a0 log4net.Appender.AppenderCollection

-> 000000e377b51130 log4net.Appender.IAppender[]

-> 000000e377b469c0 log4net.Appender.AsyncRollingFileAppender

-> 000000e377b46b70 System.Threading.Thread

-> 000000e6b7b33cb8 System.Runtime.Remoting.Contexts.Context

-> 000000e277b31560 System.AppDomain

-> 000000e6b7b67160 System.UnhandledExceptionEventHandler

-> 000000e277b32a60 ShowHand.ProjectU.GameServer.GameServer

-> 000000e377b53750 ShowHand.ServerBase.PlayerContextManager

-> 000000e377b53a48 System.Collections.Concurrent.ConcurrentDictionary`2[[System.UInt64, mscorlib],[ShowHand.ServerBase.IManagedContext, ServerBase]]

-> 000000e9c04a04d0 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[System.UInt64, mscorlib],[ShowHand.ServerBase.IManagedContext, ServerBase]]

-> 000000ec47cb0218 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.UInt64, mscorlib],[ShowHand.ServerBase.IManagedContext, ServerBase]][]

-> 000000e9c048e440 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.UInt64, mscorlib],[ShowHand.ServerBase.IManagedContext, ServerBase]]

-> 000000e707424da8 ShowHand.ProjectU.GameServer.GameServerPlayerContext

-> 000000e707424cf0 ShowHand.NetSharp.Client

-> 000000e707425610 System.Threading.SemaphoreSlim

-> 000000e707425668 System.Threading.SemaphoreSlim+TaskNode

-> 000000e685d6a2e0 System.Collections.Generic.List`1[[System.Object, mscorlib]]

-> 000000e6c2a468d0 System.Object[]

-> 000000e2d362b088 System.Threading.Tasks.TaskFactory+CompleteOnInvokePromise

-> 000000e707425840 System.Action

-> 000000e707425820 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner

-> 000000e7074258d0 ShowHand.NetSharp.Client+d__21

-> 000000e707425880 System.Threading.Tasks.Task`1[[System.Threading.Tasks.VoidTaskResult, mscorlib]]

-> 000000e707425980 System.Action

-> 000000e707425960 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner

-> 000000e7074259c0 ShowHand.NetSharp.Endpoint+d__8

-> 000000e2b7b495f8 ShowHand.NetSharp.Endpoint

-> 000000e2b7b49630 System.Collections.Concurrent.ConcurrentDictionary`2[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]]

-> 000000e5bccc1f00 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]]

-> 000000eb47bd1150 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]][]

-> 000000e5bccbbee8 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]]

-> 000000e43c0e1d48 ShowHand.NetSharp.Client

-> 000000e43c0e2cb8 ShowHand.ProjectU.GameServer.GameServerPlayerContext

-> 000000e43c0e2e50 System.Collections.Generic.List`1[[ShowHand.ProjectU.Common.Components.Server.IServerDataSection, ProjectU.LogicComp]]

-> 000000e27c6b4f48 ShowHand.ProjectU.Common.Components.Server.IServerDataSection[]

-> 000000e27c6a12f8 ShowHand.ProjectU.Common.Components.Server.MissionDataSection

-> 000000e27c6a1330 System.Collections.Generic.List`1[[ShowHand.ProjectU.Common.ProcessingMissionInfo, CommonDefine]]

-> 000000e27c71d400 ShowHand.ProjectU.Common.ProcessingMissionInfo[]

-> 000000e27c723348 ShowHand.ProjectU.Common.ProcessingMissionInfo

 

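We can see that 000000e27c723348 is indirectly referenced by a ShowHand.NetSharp.Client. 000000e27c723f80, and every other ShowHand.ProjectU.Common.ProcessingMissionInfo in gen2, is likewise indirectly referenced by a ShowHand.NetSharp.Client. All of these chains pass through the same ShowHand.NetSharp.Endpoint object, 000000e2b7b495f8. We then analyzed this object: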
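Conceptually, `!gcroot` answers a reachability question: is there a path of references from some GC root (a stack slot, a static field, a handle, and so on) to the target object? The sketch below models that with a breadth-first search over a small hypothetical object graph shaped like the tail of the chain above. For brevity it treats the Endpoint as the root. It illustrates the idea only and is not how SOS implements the command.

```python
from collections import deque


def find_root_path(graph, roots, target):
    """Breadth-first search from the roots to target.

    Returns the list of nodes on the path, or None if target is unreachable
    (an unreachable object is garbage and could be collected).
    """
    parent = {r: None for r in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parent links back to the root
                path.append(node)
                node = parent[node]
            return path[::-1]
        for child in graph.get(node, ()):
            if child not in parent:  # visit each object once
                parent[child] = node
                queue.append(child)
    return None


# Hypothetical, heavily simplified object graph modelled on the chain above
graph = {
    "Endpoint": ["m_clientActiveness"],
    "m_clientActiveness": ["Client#1", "Client#2"],
    "Client#1": ["GameServerPlayerContext#1"],
    "GameServerPlayerContext#1": ["MissionDataSection#1"],
    "MissionDataSection#1": ["ProcessingMissionInfo#1"],
    "Client#2": [],
}
print(" -> ".join(find_root_path(graph, ["Endpoint"], "ProcessingMissionInfo#1")))
```

As long as the Client stays in the Endpoint's dictionary, every object below it has a path to a root and survives collection.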

0:059> !do 000000e2b7b495f8

Name: ShowHand.NetSharp.Endpoint

MethodTable: 00007ffb69f641d0

EEClass: 00007ffb69f53fd8

Size: 56(0x38) bytes

File: e:\ServerRelease\Server\GameServer\NetSharp.dll

Fields:

MT Field Offset Type VT Attr Value Name

00007ffbc5b19288 400002d 28 System.Int32 1 instance 65536 k__BackingField

00007ffb698a8d80 400002e 8 ...pointEventHandler 0 instance 000000e277b32a60 m_endPointEventHandler

00007ffbc4bf2088 400002f 10 ...ckets.TcpListener 0 instance 000000e2b7b4ad60 m_listener

00007ffb69f64078 4000030 2c System.Int32 1 instance 0 m_state

00007ffbc5b27b38 4000031 18 ...eading.Tasks.Task 0 instance 000000e2b7b4af70 m_mainTask

00007ffb69f64888 4000032 20 ...olean, mscorlib]] 0 instance 000000e2b7b49630 m_clientActiveness

 

0:059> !do 000000e2b7b49630

Name: System.Collections.Concurrent.ConcurrentDictionary`2[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]]

MethodTable: 00007ffb69f64888

EEClass: 00007ffb69e90240

Size: 64(0x40) bytes

File: C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll

Fields:

MT Field Offset Type VT Attr Value Name

00007ffb69d33f48 4001830 8 ...olean, mscorlib]] 0 instance 000000e5bccc1f00 m_tables

00007ffbc5b34300 4001831 10 ...Canon, mscorlib]] 0 instance 0000000000000000 m_comparer

00007ffbc5b21f28 4001832 30 System.Boolean 1 instance 1 m_growLockArray

00007ffbc5b19288 4001833 20 System.Int32 1 instance 0 m_keyRehashCount

00007ffbc5b19288 4001834 24 System.Int32 1 instance 32 m_budget

00007ffbc65fc070 4001835 18 ...ean, mscorlib]][] 0 instance 0000000000000000 m_serializationArray

00007ffbc5b19288 4001836 28 System.Int32 1 instance 0 m_serializationConcurrencyLevel

00007ffbc5b19288 4001837 2c System.Int32 1 instance 0 m_serializationCapacity

00007ffbc5b21f28 400183b 10 System.Boolean 1 static

 

0:059> !do 000000e5bccc1f00

Name: System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]]

MethodTable: 00007ffb69f65790

EEClass: 00007ffb69e90968

Size: 48(0x30) bytes

File: C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll

Fields:

MT Field Offset Type VT Attr Value Name

0000000000000000 400341d 8 SZARRAY 0 instance 000000eb47bd1150 m_buckets

00007ffbc5b16fc0 400341e 10 System.Object[] 0 instance 000000e737ef1220 m_locks

00007ffbc5b19220 400341f 18 System.Int32[] 0 instance 000000e5bcc76a00 m_countPerLock

00007ffbc5b34300 4003420 20 ...Canon, mscorlib]] 0 instance 000000e2b7b41d18 m_comparer

 

0:059> !do 000000eb47bd1150

Name: System.Collections.Concurrent.ConcurrentDictionary`2+Node[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]][]

MethodTable: 00007ffb69f65610

EEClass: 00007ffbc54daa00

Size: 266240(0x41000) bytes

Array: Rank 1, Number of elements 33277, Type CLASS (Print Array)

Fields:

None

 

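So a single ShowHand.NetSharp.Endpoint object references tens of thousands of ShowHand.NetSharp.Client objects through its m_clientActiveness dictionary (whose bucket array alone has 33,277 slots). Those clients in turn indirectly reference tens of millions of the customer's own objects. Because these objects remain referenced, they survive every GC and are eventually promoted to gen2.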
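The `Size` reported for the bucket array is consistent with a plain array of references on x64. Assuming a 24-byte array header (object header, MethodTable pointer and padded length field) plus 8 bytes per element, 33,277 slots give exactly 266,240 (0x41000) bytes. Note that this is the bucket count, which only indirectly indicates how many clients are tracked. Our understanding is that ConcurrentDictionary computes its Count by summing m_countPerLock, so that array would give the exact entry count.

```python
POINTER_SIZE = 8   # bytes per object reference on x64
ARRAY_HEADER = 24  # assumed x64 layout: object header + MethodTable pointer + padded length


def ref_array_size(length):
    """Expected Size, as printed by !do, for a reference-type array of the given length."""
    return ARRAY_HEADER + length * POINTER_SIZE


size = ref_array_size(33277)
print(size, hex(size))  # 266240 0x41000
```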

 

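This is unhealthy behavior: a gen2 collection is far more expensive than a gen0 or gen1 collection. To stay rigorous, we cannot treat the above analysis as conclusive evidence that it caused the CPU spikes described at the beginning. It is a real possibility, though, because large-scale gen2 GCs can drive CPU usage up, and thread 52 in this very dump was in the middle of a background gen2 mark (clr!SVR::gc_heap::background_mark_phase). Whether or not it is directly related to the CPU spikes, the excessive number of gen2 objects needs to be addressed, because it poses a serious risk to the program's healthy operation.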

 

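Based on this analysis, we recommended that the customer consider, in light of its own business logic, whether this situation is expected. If it is not, the program should be optimized accordingly.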
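Suppose the business logic confirms that disconnected clients should not stay in the Endpoint's m_clientActiveness map. In that case, the general fix is to make sure each client's entry is removed when its connection ends. The sketch below illustrates that pattern in Python with a hypothetical `Endpoint` class. We have not seen NetSharp's source, so the class shape and method names are assumptions. In the real C# code the equivalent step would be a ConcurrentDictionary.TryRemove call on the disconnect path.

```python
import threading


class Endpoint:
    """Hypothetical stand-in for ShowHand.NetSharp.Endpoint; only client tracking is modelled."""

    def __init__(self):
        self._lock = threading.Lock()
        self._client_activeness = {}  # client -> bool, like m_clientActiveness

    def on_connected(self, client):
        with self._lock:
            self._client_activeness[client] = True

    def on_disconnected(self, client):
        # The key step: drop the entry so the client, and everything it references
        # (player context, mission data, ...), becomes unreachable and collectible.
        with self._lock:
            self._client_activeness.pop(client, None)  # no error if already removed

    def active_count(self):
        with self._lock:
            return len(self._client_activeness)


ep = Endpoint()
client = object()
ep.on_connected(client)
ep.on_disconnected(client)
print(ep.active_count())  # 0
```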

 

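We are the SRE team of Alibaba Cloud Intelligence Global Technical Services. We aim to be a technology-driven, service-oriented engineering team that safeguards the high availability of business systems. We provide professional, systematic SRE services that help customers make better use of the cloud, build more stable and reliable business systems on it, and improve business stability. We hope to share more techniques that help enterprise customers move to the cloud, use it well, and run their cloud workloads more stably and reliably.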
