https://www.tessferrandez.com/blog/2008/02/11/hang-caused-by-gc-xml-deadlock.html
Tess Ferrandez
Principal Software Engineer at Microsoft. Solving the problems of the world with a debugger, a sprinkle of ML and a bag of chips
Hang caused by GC - XML Deadlock
In December I blogged about a little tool that I wrote to analyze hangs in dumps, and i showed the following output, but didn’t really get into the details of why the process was stuck in here.
____________________________________________________________________________________________________________________
GC RELATED INFORMATION
____________________________________________________________________________________________________________________
The following threads are GC threads:
18 19
The following threads are waiting for the GC to finish:
14 16 24 26 27 28 30 31 36 37 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 57 58 59 60 62 63 64 65 66 67 68 69 70 71 72 73 74 77 78
The GC was triggered by thread: 75
The GC is working on suspending threads to continue with garbage collection
The following threads can't be suspended because preemptive GC is disabled:
23 25 33 34 35 38 56 61
The Finalizer (Thread 20) is not blocked
The issue the customer is running into is a hang during heavy load. The only way to get out of the hang is to recycle the process (IISReset).
Debugging the issuePermalink
I have seen this issue before on a few occasions and although as you will see later it was currently fixed in the framework, but in my earlier cases on this we ended up not needing a fix since the customers I worked with made code changes that made it such that they no longer were subject to the issue.
So what is going on here?
Thread 75 triggered a garbage collection by making an allocation that would have made Gen 0 go over its allocation budget.
0:075> kb 2000
ChildEBP RetAddr Args to Child
1124de6c 7c822124 77e6bad8 000002e8 00000000 ntdll!KiFastSystemCallRet
1124de70 77e6bad8 000002e8 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
1124dee0 79e718fd 000002e8 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
1124df24 79e718c6 000002e8 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
1124df74 79e7187c ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
1124df84 7a0d0d0f ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
1124dfa8 7a0d5289 ffffffff 000d4558 106cb970 mscorwks!SVR::gc_heap::wait_for_gc_done+0x99
1124dfcc 7a0d5fa2 00000000 00000000 00000020 mscorwks!SVR::GCHeap::GarbageCollectGeneration+0x267
1124e058 7a0d691f 106cb970 00000020 00000000 mscorwks!SVR::gc_heap::try_allocate_more_space+0x1c0
1124e078 7a0d7ecc 106cb970 00000020 00000000 mscorwks!SVR::gc_heap::allocate_more_space+0x2f
1124e098 7a08bd32 106cb970 00000020 00000002 mscorwks!SVR::GCHeap::Alloc+0x74
1124e0b4 79e7b43e 00000020 00000000 00080000 mscorwks!Alloc+0x60
1124e180 79e8f41c 79157f42 1124e230 00000001 mscorwks!AllocateArrayEx+0x1d1
1124e244 7937f5c2 064f60b8 064f60b8 064f60b8 mscorwks!JIT_NewArr1+0x167
1124e27c 5088a509 00000000 00000000 00000000 mscorlib_ni!System.Reflection.RuntimeMethodInfo.GetParameters()+0x4a
...
Nothing strange there, allocation and garbage collection happens all the time, however a GC is usually extremely fast and in this case we have 45 threads waiting for the GC to finish.
0:014> kb
ChildEBP RetAddr Args to Child
01a0fc74 7c822124 77e6bad8 000002e4 00000000 ntdll!KiFastSystemCallRet
01a0fc78 77e6bad8 000002e4 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
01a0fce8 79e718fd 000002e4 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
01a0fd2c 79e718c6 000002e4 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
01a0fd7c 79e7187c ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
01a0fd8c 7a0851cb ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
01a0fd9c 79f40e96 00000000 13d43bb8 0e874858 mscorwks!SVR::GCHeap::WaitUntilGCComplete+0x32
01a0fdd8 79e7385b 00000001 7a0e607b 00000001 mscorwks!Thread::RareDisablePreemptiveGC+0x1a1
01a0fde0 7a0e607b 00000001 13d43838 00000102 mscorwks!GCHolder<1,0,0>::GCHolder<1,0,0>+0x2d
01a0fe2c 7a0e673e 00000000 7a393704 7a114dea mscorwks!Thread::OnThreadTerminate+0x53
01a0fe38 7a114dea 0e874858 13d4385c 00000000 mscorwks!DestroyThread+0x43
01a0fe94 79f79c4f 00000000 00000000 00000000 mscorwks!ThreadpoolMgr::CompletionPortThreadStart+0x33d
01a0ffb8 77e6608b 000c4c00 00000000 00000000 mscorwks!ThreadpoolMgr::intermediateThreadProc+0x49
01a0ffec 00000000 79f79c09 000c4c00 00000000 kernel32!BaseThreadStart+0x34
so for some reason the GC appears to be taking some time.
There are 2 GC threads (one per logical processor), thread 18 and 19 in this case… Thread 19 is simply waiting for work
0:019> kb
ChildEBP RetAddr Args to Child
01e2fd68 7c822124 77e6bad8 000002d0 00000000 ntdll!KiFastSystemCallRet
01e2fd6c 77e6bad8 000002d0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
01e2fddc 79e718fd 000002d0 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
01e2fe20 79e718c6 000002d0 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
01e2fe70 79e7187c ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
01e2fe80 7a0d8898 ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
01e2fea8 7a0d8987 01e2ff00 77e60eb5 01e2fec8 mscorwks!SVR::gc_heap::gc_thread_function+0x58
01e2ffb8 77e6608b 000d5050 00000000 00000000 mscorwks!SVR::gc_heap::gc_thread_stub+0x9b
01e2ffec 00000000 7a0d88eb 000d5050 00000000 kernel32!BaseThreadStart+0x34
But interestingly enough thread 18 is waiting to suspend all managed threads in order to continue the GC (to avoid for anyone to allocate any more data while it is performing the GC)
0:018> kb
ChildEBP RetAddr Args to Child
01defb10 7c821524 77e98ef4 00000f64 01defb34 ntdll!KiFastSystemCallRet
01defb14 77e98ef4 00000f64 01defb34 01defe04 ntdll!NtGetContextThread+0xc
01defb24 7a0de046 00000f64 01defb34 00010002 kernel32!GetThreadContext+0x11
01defe04 7a0defc1 00000f64 1069d328 13aa3858 mscorwks!EnsureThreadIsSuspended+0x3f
01defe4c 7a0e290a 00000000 00000000 13aa3874 mscorwks!Thread::SuspendThread+0xd0
01defe9c 7a086e76 00000000 13aa399c 00000000 mscorwks!Thread::SysSuspendForGC+0x5a6
01deff88 7a0d867b 00000001 00000000 000d4368 mscorwks!SVR::GCHeap::SuspendEE+0x16c
01deffa8 7a0d8987 00000000 13aa39ac 01deffec mscorwks!SVR::gc_heap::gc_thread_function+0x3b
01deffb8 77e6608b 000d4368 00000000 00000000 mscorwks!SVR::gc_heap::gc_thread_stub+0x9b
01deffec 00000000 7a0d88eb 000d4368 00000000 kernel32!BaseThreadStart+0x34
Normally suspending all managed threads happens in a matter of nanoseconds, and pretty much the only thing that could cause the process to be stuck while suspending is if some thread has disabled preemptive GC, i.e. told the GC that it is in a state where it can’t be disturbed…
Yun Jin describes PreemptiveGC like this in one of his posts
Preemptive GC: also very important. In Rotor, this is m_fPreemptiveGCDisabled field of C++ Thread class. It indicates what GC mode the thread is in: “enabled” in the table means the thread is in preemptive mode where GC could preempt this thread at any time; “disabled” means the thread is in cooperative mode where GC has to wait the thread to give up its current work (the work is related to GC objects so it can’t allow GC to move the objects around). When the thread is executing managed code (the current IP is in managed code), it is always in cooperative mode; when the thread is in Execution Engine (unmanaged code), EE code could choose to stay in either mode and could switch mode at any time; when a thread are outside of CLR (e.g, calling into native code using interop), it is always in preemptive mode.
In our case, as we can see from the output from my tool the following threads have preemptive GC disabled (which is very uncommon)
23 25 33 34 35 38 56 61
In the !threads output it looks like this (notice the PreEmptive GC column)
0:018> !threads
ThreadCount: 59
UnstartedThread: 0
BackgroundThread: 59
PendingThread: 0
DeadThread: 0
Hosted Runtime: yes
PreEmptive GC Alloc Lock
ID OSID ThreadOBJ State GC Context Domain Count APT Exception
16 1 1008 000d0828 1808220 Enabled 06b79310:06b79db0 000f0728 1 MTA (Threadpool Worker)
20 2 1e7c 000d62d0 b220 Enabled 00000000:00000000 000cd190 0 MTA (Finalizer)
21 3 1d3c 000db120 1220 Enabled 00000000:00000000 000cd190 0 Ukn
22 4 1c38 000ed3c8 80a220 Enabled 00000000:00000000 000cd190 0 MTA (Threadpool Completion Port)
23 5 19ac 0013a520 180b222 Disabled 06c17754:06c18bd8 00158e70 2 MTA (Threadpool Worker)
24 6 1fd4 0014d1a0 b220 Enabled 00000000:00000000 000f0728 1 MTA
25 7 1568 0e81aae0 180b222 Disabled 02cea570:02cec4a8 00158e70 2 MTA (Threadpool Worker)
26 8 1ad0 0e82bd58 b220 Enabled 00000000:00000000 00158e70 0 MTA
27 9 1f04 0e829310 b220 Enabled 032b0cb8:032b2938 00158e70 0 MTA
28 a ffc 0e820d18 b220 Enabled 070173e4:070193c0 00158e70 1 MTA
14 b 1ee8 0e874858 1800220 Enabled 02b34750:02b34c78 000cd190 0 Ukn (Threadpool Worker)
30 d 1080 0e8aff40 b220 Enabled 00000000:00000000 0e87c310 0 MTA
31 e 11ec 0e8d1f28 8801220 Enabled 02b418e4:02b42c78 000cd190 0 MTA (Threadpool Completion Port)
33 f 1f1c 0e8e6420 180b222 Disabled 06bc1fa0:06bc3dcc 00158e70 2 MTA (Threadpool Worker)
34 10 1a6c 0e8e8b20 180b222 Disabled 02c449b8:02c45d74 00158e70 2 MTA (Threadpool Worker)
35 11 710 0e8ee550 180b222 Disabled 06cdde68:06cdea48 00158e70 2 MTA (Threadpool Worker)
36 c 1ae0 0e875228 180b220 Enabled 072bd88c:072bf508 000cd190 0 MTA (Threadpool Worker)
37 12 19c8 00147290 180b220 Enabled 06bb4b70:06bb5dcc 000f0728 1 MTA (Threadpool Worker)
38 13 18f0 0e8901d8 180b222 Disabled 02c49ce8:02c49d74 00158e70 2 MTA (Threadpool Worker)
39 14 19b8 0e8f7338 180b220 Enabled 073c6fbc:073c6fe8 00158e70 1 MTA (Threadpool Worker)
40 15 1308 0e8f8610 180b220 Enabled 02af10a4:02af10ac 00158e70 3 MTA (Threadpool Worker)
41 16 1f38 0e8f99e8 180b220 Enabled 070c7154:070c761c 000cd190 0 MTA (Threadpool Worker)
42 17 1be0 0e8facc0 180b220 Enabled 06ad3a74:06ad4cc8 000cd190 0 MTA (Threadpool Worker)
43 18 1efc 0e8fbf98 180b220 Enabled 0321b11c:0321c938 000cd190 0 MTA (Threadpool Worker)
44 19 1470 0e8fd160 1801220 Enabled 06ef6770:06ef7178 000cd190 0 MTA (Threadpool Worker)
45 1a 150 0e8fe438 180b220 Enabled 02a15a34:02a161bc 000cd190 0 MTA (Threadpool Worker)
46 1b 1b10 0e8ff710 180b220 Enabled 06b216b8:06b22d00 000f0728 1 MTA (Threadpool Worker)
47 1c 1c8c 10690ac0 180b220 Enabled 02be2430:02be3b00 000f0728 1 MTA (Threadpool Worker)
48 1d 16c8 10691ad8 180b220 Enabled 06b31520:06b3189c 000f0728 1 MTA (Threadpool Worker)
49 1e 1a5c 10692e20 180b220 Enabled 02ab3a48:02ab50ac 000f0728 1 MTA (Threadpool Worker)
50 1f 1908 10694168 180b220 Enabled 06b686fc:06b69db0 000f0728 1 MTA (Threadpool Worker)
51 20 284 106954d0 180b220 Enabled 0319d1e4:0319e900 000cd190 0 MTA (Threadpool Worker)
52 21 1d74 10696708 180b220 Enabled 06bac634:06baddcc 000f0728 1 MTA (Threadpool Worker)
53 22 58c 106979c8 180b220 Enabled 02c14784:02c15d58 000f0728 1 MTA (Threadpool Worker)
54 23 1860 10698d10 180b220 Enabled 072e1984:072e3508 00158e70 1 MTA (Threadpool Worker)
55 24 1c9c 1069ae00 180b220 Enabled 071a4784:071a5508 000cd190 0 MTA (Threadpool Worker)
56 25 15c4 1069d328 180b222 Disabled 06bc7174:06bc7dcc 00158e70 2 MTA (Threadpool Worker)
57 26 1968 106a09e0 180b220 Enabled 0321da48:0321e938 000cd190 0 MTA (Threadpool Worker)
58 27 1e10 106a3608 180b220 Enabled 070172c8:070173c0 000cd190 0 MTA (Threadpool Worker)
59 28 1a18 106a6418 180b220 Enabled 02ac036c:02ac10ac 000f0728 1 MTA (Threadpool Worker)
60 29 1d2c 106a9148 180b220 Enabled 06a9b5a4:06a9cc90 000cd190 0 MTA (Threadpool Worker)
61 2a 1b50 106ac0a8 180b222 Disabled 06cdaf94:06cdbda8 00158e70 2 MTA (Threadpool Worker)
62 2b 96c 106ae8d0 8801220 Enabled 02c00980:02c01d3c 000cd190 0 MTA (Threadpool Completion Port)
63 2c 1b18 106afa50 180b220 Enabled 0322214c:03222938 000cd190 0 MTA (Threadpool Worker)
64 2d 1d78 106b2d48 180b220 Enabled 06f26e78:06f26eb4 000cd190 0 MTA (Threadpool Worker)
65 2e 198c 106b5cb8 180b220 Enabled 06ed8418:06ed8c98 000cd190 0 MTA (Threadpool Worker)
66 2f 1fcc 106b8b28 180b220 Enabled 0728d400:0728d508 00158e70 1 MTA (Threadpool Worker)
67 30 1958 106bb998 180b220 Enabled 0304e7e8:03050420 000cd190 0 MTA (Threadpool Worker)
68 31 1b48 106beb58 180b220 Enabled 031a5b98:031a691c 000cd190 0 MTA (Threadpool Worker)
69 32 1d80 106c19c8 180b220 Enabled 06b13444:06b14d00 000f0728 1 MTA (Threadpool Worker)
70 33 17cc 106c4838 180b220 Enabled 0700d468:0700f3a4 000cd190 0 MTA (Threadpool Worker)
71 34 1424 106c8890 180b220 Enabled 06b6f8bc:06b6fdb0 000f0728 1 MTA (Threadpool Worker)
72 35 1e00 106bd870 180b220 Enabled 06b8dcbc:06b8ddb0 000f0728 1 MTA (Threadpool Worker)
73 36 1d08 106c95e0 180b220 Enabled 071a081c:071a1508 000cd190 0 MTA (Threadpool Worker)
74 37 1fdc 106ca680 180b220 Enabled 02c05810:02c05d3c 000cd190 0 MTA (Threadpool Worker)
75 38 1b8c 106cb930 180b220 Enabled 0335f888:0335f88c 00158e70 2 MTA (Threadpool Worker)
76 39 1928 106cc900 880b220 Enabled 072bc2f0:072bd508 000cd190 0 MTA (Threadpool Completion Port)
77 3a 1fd0 106d7270 8801220 Enabled 02c18ca4:02c19d58 000cd190 0 MTA (Threadpool Completion Port)
78 3b 1640 106d7908 180b220 Enabled 0294474c:02945a7c 000cd190 0 MTA (Threadpool Worker)
Why have these threads disabled preemptive GC, not allowing the GC to suspend them, and ultimately blocking our process?
All the blocked threads are sitting in this type of call stack, trying to enter a lock (JIT_MonTryEnter
). Normally you would see a thread either owning the lock or waiting in an aware lock like in this post, but here it is just spinning trying to enter the lock.
0:056> kb 2000
ChildEBP RetAddr Args to Child
10b0f4d8 0eb65afc 06bc5448 06bc53f8 22a9c796 mscorwks!JIT_MonTryEnter+0xad
WARNING: Frame IP not in any known module. Following frames may be wrong.
10b0f504 69918f30 10b0f560 06bc5448 06bc5448 0xeb65afc
00000000 00000000 00000000 00000000 00000000 System_Web_Services_ni+0x28f30
unfortunately the managed stack is not very helpful in telling us where the lock was taken
0:056> !clrstack
OS Thread Id: 0x15c4 (56)
ESP EIP
10b0f8f8 79e73eac [ContextTransitionFrame: 10b0f8f8]
10b0f948 79e73eac [GCFrame: 10b0f948]
10b0faa0 79e73eac [ComMethodFrame: 10b0faa0]
and if we run !syncblk
we have no active sync blocks, so no help there when it comes to finding out who owns this lock.
0:056> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner
-----------------------------
Total 165
CCW 4
RCW 1
ComClassFactory 0
Free 8
seems like we are stuck between a rock and a hard place here.
I have shown the command !dumpstack
before. It shows a mixture of a managed and native call stack but it is a raw stack, meaning that it will pretty much just display any addresses on the stack that happen to be pointing to code. This means that !dumpstack
will not give a true stack trace (i.e. like !clrstack and kb where if functionA calls functionB we will see functionA directly below functionB in the stack) and anything shown by !dumpstack
may or may not be correct. Anyways, with that little warning, here comes !dumpstack
:)
0:056> !dumpstack
OS Thread Id: 0x15c4 (56)
Current frame: mscorwks!JIT_MonTryEnter+0xad
ChildEBP RetAddr Caller,Callee
10b0f47c 638c7552 (MethodDesc 0x63a539a0 +0x62 System.Xml.Schema.XmlSchemaSet.Add(System.Xml.Schema.XmlSchemaSet)),
calling mscorwks!JIT_MonTryEnter
10b0f480 793463bb (MethodDesc 0x7923bbc8 +0xdb System.Collections.Hashtable..ctor(Int32, Single)),
calling mscorwks!JIT_Dbl2IntSSE2
10b0f498 638b36f6 (MethodDesc 0x63a505f0 +0xa6 System.Xml.Schema.SchemaInfo..ctor()),
calling (MethodDesc 0x7923bbc8 +0 System.Collections.Hashtable..ctor(Int32, Single))
10b0f49c 638b3719 (MethodDesc 0x63a505f0 +0xc9 System.Xml.Schema.SchemaInfo..ctor()),
calling mscorwks!JIT_Writeable_Thunks_Buf
10b0f4d8 0eb65afc (MethodDesc 0xeb756f0 +0x5c CustomComponent.Tools.Web.Services.Extensions.ValidationExtension.ProcessMessage(System.Web.Services.Protocols.SoapMessage)),
calling (MethodDesc 0x63a539a0 +0 System.Xml.Schema.XmlSchemaSet.Add(System.Xml.Schema.XmlSchemaSet))
10b0f504 69918f30 (MethodDesc 0x699c4798 +0x3c System.Web.Services.Protocols.SoapMessage.RunExtensions(System.Web.Services.Protocols.SoapExtension[], Boolean))
10b0f518 699227b3 (MethodDesc 0x699c5250 +0x2cb System.Web.Services.Protocols.SoapServerProtocol.Initialize()),
calling (MethodDesc 0x699c4798 +0 System.Web.Services.Protocols.SoapMessage.RunExtensions(System.Web.Services.Protocols.SoapExtension[], Boolean))
...
!dumpstack
is a bit tricky to deal with, but in the stack above if we start from the bottom of the part of the stack that I’ve shown we have SoapServerProtocol.Initialize()
which calls SoapMessage.RunExtensions
.
This we can trust because !dumpstack
is actually telling us that Initialize
is calling RunExtensions
in this case (assuming that we believe that Initialize
was called).
RunExtensions
doesn’t tell us what it is calling but that is because it is calling into a custom component CustomComponent.Tools.Web.Services.Extensions.ValidationExtension.ProcessMessage
.
This is in turn calling into XmlSchemaSet.Add
which is calling into JIT_MonTryEnter
. So long story short, the real stack trace should look like this
mscorwks!JIT_MonTryEnter+0xad
System.Xml.Schema.XmlSchemaSet.Add(System.Xml.Schema.XmlSchemaSet)
CustomComponent.Tools.Web.Services.Extensions.ValidationExtension.ProcessMessage(System.Web.Services.Protocols.SoapMessage)System.Web.Services.Protocols.SoapMessage.RunExtensions(System.Web.Services.Protocols.SoapExtension[], Boolean)
System.Web.Services.Protocols.SoapServerProtocol.Initialize()
...
!sos.clrstack
just wasn’t able to rebuild it because of the state it was in
Looking through the output from ~* e !clrstack
I find one stack that is currently in XmlSchemaSet.Add
and thus is probably the one holding the lock we are trying to enter here
OS Thread Id: 0x1308 (40)
ESP EIP
104ce878 7c82ed54 [HelperMethodFrame: 104ce878]
104ce8d0 638b46bf System.Xml.Schema.SchemaNames..ctor(System.Xml.XmlNameTable)
104cebd4 638ca6fb System.Xml.Schema.XmlSchemaSet.GetSchemaNames(System.Xml.XmlNameTable)
104cebe0 638c96fc System.Xml.Schema.XmlSchemaSet.PreprocessSchema(System.Xml.Schema.XmlSchema ByRef, System.String)
104cebf8 638c86b7 System.Xml.Schema.XmlSchemaSet.Add(System.String, System.Xml.Schema.XmlSchema)
104cec04 638c7667 System.Xml.Schema.XmlSchemaSet.Add(System.Xml.Schema.XmlSchemaSet)
104cec60 0eb65afc CustomComponent.Tools.Web.Services.Extensions.ValidationExtension.ProcessMessage(System.Web.Services.Protocols.SoapMessage)
104cec8c 69918f30 System.Web.Services.Protocols.SoapMessage.RunExtensions(System.Web.Services.Protocols.SoapExtension[], Boolean)
104ceca4 699227b3 System.Web.Services.Protocols.SoapServerProtocol.Initialize()
104cece8 6990d904 System.Web.Services.Protocols.ServerProtocolFactory.Create(System.Type, System.Web.HttpContext, System.Web.HttpRequest, System.Web.HttpResponse, Boolean ByRef)
104ced28 699263ab System.Web.Services.Protocols.WebServiceHandlerFactory.CoreGetHandler(System.Type, System.Web.HttpContext, System.Web.HttpRequest, System.Web.HttpResponse)
104ced64 69926329 System.Web.Services.Protocols.WebServiceHandlerFactory.GetHandler(System.Web.HttpContext, System.String, System.String, System.String)
104ced88 65fc057c System.Web.HttpApplication.MapHttpHandler(System.Web.HttpContext, System.String, System.Web.VirtualPath, System.String, Boolean)
104cedcc 65fd58cd System.Web.HttpApplication+MapHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
104ceddc 65fc1610 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)
104cee1c 65fd32e0 System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)
104cee6c 65fc0225 System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)
104cee88 65fc550b System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)
104ceebc 65fc5212 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)
104ceec8 65fc3587 System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)
104cf078 79f35ee8 [ContextTransitionFrame: 104cf078]
104cf0c8 79f35ee8 [GCFrame: 104cf0c8]
104cf220 79f35ee8 [ComMethodFrame: 104cf220]
0:040> kb
ChildEBP RetAddr Args to Child
104ce5d4 7c822124 77e6bad8 000002e8 00000000 ntdll!KiFastSystemCallRet
104ce5d8 77e6bad8 000002e8 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
104ce648 79e718fd 000002e8 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
104ce68c 79e718c6 000002e8 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
104ce6dc 79e7187c ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
104ce6ec 7a0d0d0f ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
104ce710 7a0d5dfa ffffffff 00001037 000d4368 mscorwks!SVR::gc_heap::wait_for_gc_done+0x99
104ce788 7a0d691f 0e8f8650 00000014 00000000 mscorwks!SVR::gc_heap::try_allocate_more_space+0x17
104ce7a8 7a0d7ecc 0e8f8650 00000014 00000000 mscorwks!SVR::gc_heap::allocate_more_space+0x2f
104ce7c8 7a08bd32 0e8f8650 00000014 00000002 mscorwks!SVR::GCHeap::Alloc+0x74
104ce7e4 79e754ff 00000014 00000000 00080000 mscorwks!Alloc+0x60
104ce824 79e755c1 639f59e0 02382edc 02af0090 mscorwks!FastAllocateObject+0x38
104ce8c8 638b46bf 02af0e60 00000000 00000000 mscorwks!JIT_NewFast+0x9e
104ce8cc 02af0e60 00000000 00000000 00000000 System_Xml_ni+0x1146bf
WARNING: Frame IP not in any known module. Following frames may be wrong.
104ce8d0 00000000 00000000 00000000 00000000 0x2af0e60
But unfortunately it can’t give up this lock and finish what it is doing because it is waiting for the GC to finish, so effectively we are in a deadlock.
Thread 40 holds a lock but to release it it needs the GC to complete. The GC can’t complete because it can’t suspend thread 56 (and other threads) that have preemptive GC disabled. Thread 56 can’t enable preemptive GC until it gets the lock owned by thread 40.
I should add, that the only times I have seen this, it has been in the lock in XmlSchemaSet.Add
, and under heavy load when multiple threads were trying to access the same XmlSchemaSet.
SolutionPermalink
So what can we do about this? Well, a hotfix just came out (KB946644) that will fix this problem so if you are running into this issue you can call into support and ask for that hotfix.
In my earlier cases, the customers have locked around the XmlSchemaSet
in the custom component since according to the msdn documentation any instance members of XmlSchemaSet
are not guaranteed to be thread safe. This resolved the issue.
Laters,
Tess
Categories: blog
Updated: February 11, 2008