13 minute read

In December I blogged about a little tool that I wrote to analyze hangs in dumps, and i showed the following output, but didn’t really get into the details of why the process was stuck in here.

____________________________________________________________________________________________________________________

 GC RELATED INFORMATION
____________________________________________________________________________________________________________________

The following threads are GC threads:
  18 19

The following threads are waiting for the GC to finish:
  14 16 24 26 27 28 30 31 36 37 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 57 58 59 60 62 63 64 65 66 67 68 69 70 71 72 73 74 77 78

The GC was triggered by thread: 75

The GC is working on suspending threads to continue with garbage collection
The following threads can't be suspended because preemptive GC is disabled:
  23 25 33 34 35 38 56 61

The Finalizer (Thread 20) is not blocked

The issue the customer is running into is a hang during heavy load. The only way to get out of the hang is to recycle the process (IISReset).

Debugging the issue

I have seen this issue before on a few occasions and although as you will see later it was currently fixed in the framework, but in my earlier cases on this we ended up not needing a fix since the customers I worked with made code changes that made it such that they no longer were subject to the issue.

So what is going on here?

Thread 75 triggered a garbage collection by making an allocation that would have made Gen 0 go over its allocation budget.

0:075> kb 2000
ChildEBP RetAddr  Args to Child
1124de6c 7c822124 77e6bad8 000002e8 00000000 ntdll!KiFastSystemCallRet
1124de70 77e6bad8 000002e8 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
1124dee0 79e718fd 000002e8 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
1124df24 79e718c6 000002e8 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
1124df74 79e7187c ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
1124df84 7a0d0d0f ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
1124dfa8 7a0d5289 ffffffff 000d4558 106cb970 mscorwks!SVR::gc_heap::wait_for_gc_done+0x99
1124dfcc 7a0d5fa2 00000000 00000000 00000020 mscorwks!SVR::GCHeap::GarbageCollectGeneration+0x267
1124e058 7a0d691f 106cb970 00000020 00000000 mscorwks!SVR::gc_heap::try_allocate_more_space+0x1c0
1124e078 7a0d7ecc 106cb970 00000020 00000000 mscorwks!SVR::gc_heap::allocate_more_space+0x2f
1124e098 7a08bd32 106cb970 00000020 00000002 mscorwks!SVR::GCHeap::Alloc+0x74
1124e0b4 79e7b43e 00000020 00000000 00080000 mscorwks!Alloc+0x60
1124e180 79e8f41c 79157f42 1124e230 00000001 mscorwks!AllocateArrayEx+0x1d1
1124e244 7937f5c2 064f60b8 064f60b8 064f60b8 mscorwks!JIT_NewArr1+0x167
1124e27c 5088a509 00000000 00000000 00000000 mscorlib_ni!System.Reflection.RuntimeMethodInfo.GetParameters()+0x4a
...

Nothing strange there, allocation and garbage collection happens all the time, however a GC is usually extremely fast and in this case we have 45 threads waiting for the GC to finish.

0:014> kb
ChildEBP RetAddr  Args to Child
01a0fc74 7c822124 77e6bad8 000002e4 00000000 ntdll!KiFastSystemCallRet
01a0fc78 77e6bad8 000002e4 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
01a0fce8 79e718fd 000002e4 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
01a0fd2c 79e718c6 000002e4 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
01a0fd7c 79e7187c ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
01a0fd8c 7a0851cb ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
01a0fd9c 79f40e96 00000000 13d43bb8 0e874858 mscorwks!SVR::GCHeap::WaitUntilGCComplete+0x32
01a0fdd8 79e7385b 00000001 7a0e607b 00000001 mscorwks!Thread::RareDisablePreemptiveGC+0x1a1
01a0fde0 7a0e607b 00000001 13d43838 00000102 mscorwks!GCHolder<1,0,0>::GCHolder<1,0,0>+0x2d
01a0fe2c 7a0e673e 00000000 7a393704 7a114dea mscorwks!Thread::OnThreadTerminate+0x53
01a0fe38 7a114dea 0e874858 13d4385c 00000000 mscorwks!DestroyThread+0x43
01a0fe94 79f79c4f 00000000 00000000 00000000 mscorwks!ThreadpoolMgr::CompletionPortThreadStart+0x33d
01a0ffb8 77e6608b 000c4c00 00000000 00000000 mscorwks!ThreadpoolMgr::intermediateThreadProc+0x49
01a0ffec 00000000 79f79c09 000c4c00 00000000 kernel32!BaseThreadStart+0x34

so for some reason the GC appears to be taking some time.

There are 2 GC threads (one per logical processor), thread 18 and 19 in this case… Thread 19 is simply waiting for work

0:019> kb
ChildEBP RetAddr  Args to Child
01e2fd68 7c822124 77e6bad8 000002d0 00000000 ntdll!KiFastSystemCallRet
01e2fd6c 77e6bad8 000002d0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
01e2fddc 79e718fd 000002d0 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
01e2fe20 79e718c6 000002d0 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
01e2fe70 79e7187c ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
01e2fe80 7a0d8898 ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
01e2fea8 7a0d8987 01e2ff00 77e60eb5 01e2fec8 mscorwks!SVR::gc_heap::gc_thread_function+0x58
01e2ffb8 77e6608b 000d5050 00000000 00000000 mscorwks!SVR::gc_heap::gc_thread_stub+0x9b
01e2ffec 00000000 7a0d88eb 000d5050 00000000 kernel32!BaseThreadStart+0x34

But interestingly enough thread 18 is waiting to suspend all managed threads in order to continue the GC (to avoid for anyone to allocate any more data while it is performing the GC)

0:018> kb
ChildEBP RetAddr  Args to Child
01defb10 7c821524 77e98ef4 00000f64 01defb34 ntdll!KiFastSystemCallRet
01defb14 77e98ef4 00000f64 01defb34 01defe04 ntdll!NtGetContextThread+0xc
01defb24 7a0de046 00000f64 01defb34 00010002 kernel32!GetThreadContext+0x11
01defe04 7a0defc1 00000f64 1069d328 13aa3858 mscorwks!EnsureThreadIsSuspended+0x3f
01defe4c 7a0e290a 00000000 00000000 13aa3874 mscorwks!Thread::SuspendThread+0xd0
01defe9c 7a086e76 00000000 13aa399c 00000000 mscorwks!Thread::SysSuspendForGC+0x5a6
01deff88 7a0d867b 00000001 00000000 000d4368 mscorwks!SVR::GCHeap::SuspendEE+0x16c
01deffa8 7a0d8987 00000000 13aa39ac 01deffec mscorwks!SVR::gc_heap::gc_thread_function+0x3b
01deffb8 77e6608b 000d4368 00000000 00000000 mscorwks!SVR::gc_heap::gc_thread_stub+0x9b
01deffec 00000000 7a0d88eb 000d4368 00000000 kernel32!BaseThreadStart+0x34

Normally suspending all managed threads happens in a matter of nanoseconds, and pretty much the only thing that could cause the process to be stuck while suspending is if some thread has disabled preemptive GC, i.e. told the GC that it is in a state where it can’t be disturbed…

Yun Jin describes PreemptiveGC like this in one of his posts

Preemptive GC: also very important. In Rotor, this is m_fPreemptiveGCDisabled field of C++ Thread class. It indicates what GC mode the thread is in: “enabled” in the table means the thread is in preemptive mode where GC could preempt this thread at any time; “disabled” means the thread is in cooperative mode where GC has to wait the thread to give up its current work (the work is related to GC objects so it can’t allow GC to move the objects around). When the thread is executing managed code (the current IP is in managed code), it is always in cooperative mode; when the thread is in Execution Engine (unmanaged code), EE code could choose to stay in either mode and could switch mode at any time; when a thread are outside of CLR (e.g, calling into native code using interop), it is always in preemptive mode.

In our case, as we can see from the output from my tool the following threads have preemptive GC disabled (which is very uncommon)

23 25 33 34 35 38 56 61

In the !threads output it looks like this (notice the PreEmptive GC column)

0:018> !threads
ThreadCount: 59
UnstartedThread: 0
BackgroundThread: 59
PendingThread: 0
DeadThread: 0
Hosted Runtime: yes
                                      PreEmptive   GC Alloc           Lock
       ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
  16    1 1008 000d0828   1808220 Enabled  06b79310:06b79db0 000f0728     1 MTA (Threadpool Worker)
  20    2 1e7c 000d62d0      b220 Enabled  00000000:00000000 000cd190     0 MTA (Finalizer)
  21    3 1d3c 000db120      1220 Enabled  00000000:00000000 000cd190     0 Ukn
  22    4 1c38 000ed3c8    80a220 Enabled  00000000:00000000 000cd190     0 MTA (Threadpool Completion Port)
  23    5 19ac 0013a520   180b222 Disabled 06c17754:06c18bd8 00158e70     2 MTA (Threadpool Worker)
  24    6 1fd4 0014d1a0      b220 Enabled  00000000:00000000 000f0728     1 MTA
  25    7 1568 0e81aae0   180b222 Disabled 02cea570:02cec4a8 00158e70     2 MTA (Threadpool Worker)
  26    8 1ad0 0e82bd58      b220 Enabled  00000000:00000000 00158e70     0 MTA
  27    9 1f04 0e829310      b220 Enabled  032b0cb8:032b2938 00158e70     0 MTA
  28    a  ffc 0e820d18      b220 Enabled  070173e4:070193c0 00158e70     1 MTA
  14    b 1ee8 0e874858   1800220 Enabled  02b34750:02b34c78 000cd190     0 Ukn (Threadpool Worker)
  30    d 1080 0e8aff40      b220 Enabled  00000000:00000000 0e87c310     0 MTA
  31    e 11ec 0e8d1f28   8801220 Enabled  02b418e4:02b42c78 000cd190     0 MTA (Threadpool Completion Port)
  33    f 1f1c 0e8e6420   180b222 Disabled 06bc1fa0:06bc3dcc 00158e70     2 MTA (Threadpool Worker)
  34   10 1a6c 0e8e8b20   180b222 Disabled 02c449b8:02c45d74 00158e70     2 MTA (Threadpool Worker)
  35   11  710 0e8ee550   180b222 Disabled 06cdde68:06cdea48 00158e70     2 MTA (Threadpool Worker)
  36    c 1ae0 0e875228   180b220 Enabled  072bd88c:072bf508 000cd190     0 MTA (Threadpool Worker)
  37   12 19c8 00147290   180b220 Enabled  06bb4b70:06bb5dcc 000f0728     1 MTA (Threadpool Worker)
  38   13 18f0 0e8901d8   180b222 Disabled 02c49ce8:02c49d74 00158e70     2 MTA (Threadpool Worker)
  39   14 19b8 0e8f7338   180b220 Enabled  073c6fbc:073c6fe8 00158e70     1 MTA (Threadpool Worker)
  40   15 1308 0e8f8610   180b220 Enabled  02af10a4:02af10ac 00158e70     3 MTA (Threadpool Worker)
  41   16 1f38 0e8f99e8   180b220 Enabled  070c7154:070c761c 000cd190     0 MTA (Threadpool Worker)
  42   17 1be0 0e8facc0   180b220 Enabled  06ad3a74:06ad4cc8 000cd190     0 MTA (Threadpool Worker)
  43   18 1efc 0e8fbf98   180b220 Enabled  0321b11c:0321c938 000cd190     0 MTA (Threadpool Worker)
  44   19 1470 0e8fd160   1801220 Enabled  06ef6770:06ef7178 000cd190     0 MTA (Threadpool Worker)
  45   1a  150 0e8fe438   180b220 Enabled  02a15a34:02a161bc 000cd190     0 MTA (Threadpool Worker)
  46   1b 1b10 0e8ff710   180b220 Enabled  06b216b8:06b22d00 000f0728     1 MTA (Threadpool Worker)
  47   1c 1c8c 10690ac0   180b220 Enabled  02be2430:02be3b00 000f0728     1 MTA (Threadpool Worker)
  48   1d 16c8 10691ad8   180b220 Enabled  06b31520:06b3189c 000f0728     1 MTA (Threadpool Worker)
  49   1e 1a5c 10692e20   180b220 Enabled  02ab3a48:02ab50ac 000f0728     1 MTA (Threadpool Worker)
  50   1f 1908 10694168   180b220 Enabled  06b686fc:06b69db0 000f0728     1 MTA (Threadpool Worker)
  51   20  284 106954d0   180b220 Enabled  0319d1e4:0319e900 000cd190     0 MTA (Threadpool Worker)
  52   21 1d74 10696708   180b220 Enabled  06bac634:06baddcc 000f0728     1 MTA (Threadpool Worker)
  53   22  58c 106979c8   180b220 Enabled  02c14784:02c15d58 000f0728     1 MTA (Threadpool Worker)
  54   23 1860 10698d10   180b220 Enabled  072e1984:072e3508 00158e70     1 MTA (Threadpool Worker)
  55   24 1c9c 1069ae00   180b220 Enabled  071a4784:071a5508 000cd190     0 MTA (Threadpool Worker)
  56   25 15c4 1069d328   180b222 Disabled 06bc7174:06bc7dcc 00158e70     2 MTA (Threadpool Worker)
  57   26 1968 106a09e0   180b220 Enabled  0321da48:0321e938 000cd190     0 MTA (Threadpool Worker)
  58   27 1e10 106a3608   180b220 Enabled  070172c8:070173c0 000cd190     0 MTA (Threadpool Worker)
  59   28 1a18 106a6418   180b220 Enabled  02ac036c:02ac10ac 000f0728     1 MTA (Threadpool Worker)
  60   29 1d2c 106a9148   180b220 Enabled  06a9b5a4:06a9cc90 000cd190     0 MTA (Threadpool Worker)
  61   2a 1b50 106ac0a8   180b222 Disabled 06cdaf94:06cdbda8 00158e70     2 MTA (Threadpool Worker)
  62   2b  96c 106ae8d0   8801220 Enabled  02c00980:02c01d3c 000cd190     0 MTA (Threadpool Completion Port)
  63   2c 1b18 106afa50   180b220 Enabled  0322214c:03222938 000cd190     0 MTA (Threadpool Worker)
  64   2d 1d78 106b2d48   180b220 Enabled  06f26e78:06f26eb4 000cd190     0 MTA (Threadpool Worker)
  65   2e 198c 106b5cb8   180b220 Enabled  06ed8418:06ed8c98 000cd190     0 MTA (Threadpool Worker)
  66   2f 1fcc 106b8b28   180b220 Enabled  0728d400:0728d508 00158e70     1 MTA (Threadpool Worker)
  67   30 1958 106bb998   180b220 Enabled  0304e7e8:03050420 000cd190     0 MTA (Threadpool Worker)
  68   31 1b48 106beb58   180b220 Enabled  031a5b98:031a691c 000cd190     0 MTA (Threadpool Worker)
  69   32 1d80 106c19c8   180b220 Enabled  06b13444:06b14d00 000f0728     1 MTA (Threadpool Worker)
  70   33 17cc 106c4838   180b220 Enabled  0700d468:0700f3a4 000cd190     0 MTA (Threadpool Worker)
  71   34 1424 106c8890   180b220 Enabled  06b6f8bc:06b6fdb0 000f0728     1 MTA (Threadpool Worker)
  72   35 1e00 106bd870   180b220 Enabled  06b8dcbc:06b8ddb0 000f0728     1 MTA (Threadpool Worker)
  73   36 1d08 106c95e0   180b220 Enabled  071a081c:071a1508 000cd190     0 MTA (Threadpool Worker)
  74   37 1fdc 106ca680   180b220 Enabled  02c05810:02c05d3c 000cd190     0 MTA (Threadpool Worker)
  75   38 1b8c 106cb930   180b220 Enabled  0335f888:0335f88c 00158e70     2 MTA (Threadpool Worker)
  76   39 1928 106cc900   880b220 Enabled  072bc2f0:072bd508 000cd190     0 MTA (Threadpool Completion Port)
  77   3a 1fd0 106d7270   8801220 Enabled  02c18ca4:02c19d58 000cd190     0 MTA (Threadpool Completion Port)
  78   3b 1640 106d7908   180b220 Enabled  0294474c:02945a7c 000cd190     0 MTA (Threadpool Worker)

Why have these threads disabled preemptive GC, not allowing the GC to suspend them, and ultimately blocking our process?

All the blocked threads are sitting in this type of call stack, trying to enter a lock (JIT_MonTryEnter). Normally you would see a thread either owning the lock or waiting in an aware lock like in this post, but here it is just spinning trying to enter the lock.

0:056> kb 2000
ChildEBP RetAddr  Args to Child
10b0f4d8 0eb65afc 06bc5448 06bc53f8 22a9c796 mscorwks!JIT_MonTryEnter+0xad
WARNING: Frame IP not in any known module. Following frames may be wrong.
10b0f504 69918f30 10b0f560 06bc5448 06bc5448 0xeb65afc
00000000 00000000 00000000 00000000 00000000 System_Web_Services_ni+0x28f30

unfortunately the managed stack is not very helpful in telling us where the lock was taken

0:056> !clrstack
OS Thread Id: 0x15c4 (56)
ESP       EIP
10b0f8f8 79e73eac [ContextTransitionFrame: 10b0f8f8]
10b0f948 79e73eac [GCFrame: 10b0f948]
10b0faa0 79e73eac [ComMethodFrame: 10b0faa0]

and if we run !syncblk we have no active sync blocks, so no help there when it comes to finding out who owns this lock.

0:056> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
-----------------------------
Total           165
CCW             4
RCW             1
ComClassFactory 0
Free            8

seems like we are stuck between a rock and a hard place here.

I have shown the command !dumpstack before. It shows a mixture of a managed and native call stack but it is a raw stack, meaning that it will pretty much just display any addresses on the stack that happen to be pointing to code. This means that !dumpstack will not give a true stack trace (i.e. like !clrstack and kb where if functionA calls functionB we will see functionA directly below functionB in the stack) and anything shown by !dumpstack may or may not be correct. Anyways, with that little warning, here comes !dumpstack :)

0:056> !dumpstack
OS Thread Id: 0x15c4 (56)
Current frame: mscorwks!JIT_MonTryEnter+0xad
ChildEBP RetAddr  Caller,Callee
10b0f47c 638c7552 (MethodDesc 0x63a539a0 +0x62 System.Xml.Schema.XmlSchemaSet.Add(System.Xml.Schema.XmlSchemaSet)),
    calling mscorwks!JIT_MonTryEnter
10b0f480 793463bb (MethodDesc 0x7923bbc8 +0xdb System.Collections.Hashtable..ctor(Int32, Single)),
    calling mscorwks!JIT_Dbl2IntSSE2
10b0f498 638b36f6 (MethodDesc 0x63a505f0 +0xa6 System.Xml.Schema.SchemaInfo..ctor()), 
    calling (MethodDesc 0x7923bbc8 +0 System.Collections.Hashtable..ctor(Int32, Single))
10b0f49c 638b3719 (MethodDesc 0x63a505f0 +0xc9 System.Xml.Schema.SchemaInfo..ctor()), 
    calling mscorwks!JIT_Writeable_Thunks_Buf
10b0f4d8 0eb65afc (MethodDesc 0xeb756f0 +0x5c CustomComponent.Tools.Web.Services.Extensions.ValidationExtension.ProcessMessage(System.Web.Services.Protocols.SoapMessage)), 
    calling (MethodDesc 0x63a539a0 +0 System.Xml.Schema.XmlSchemaSet.Add(System.Xml.Schema.XmlSchemaSet))
10b0f504 69918f30 (MethodDesc 0x699c4798 +0x3c System.Web.Services.Protocols.SoapMessage.RunExtensions(System.Web.Services.Protocols.SoapExtension[], Boolean))
10b0f518 699227b3 (MethodDesc 0x699c5250 +0x2cb System.Web.Services.Protocols.SoapServerProtocol.Initialize()),
    calling (MethodDesc 0x699c4798 +0 System.Web.Services.Protocols.SoapMessage.RunExtensions(System.Web.Services.Protocols.SoapExtension[], Boolean))
...

!dumpstack is a bit tricky to deal with, but in the stack above if we start from the bottom of the part of the stack that I’ve shown we have SoapServerProtocol.Initialize() which calls SoapMessage.RunExtensions.

This we can trust because !dumpstack is actually telling us that Initialize is calling RunExtensions in this case (assuming that we believe that Initialize was called).

RunExtensions doesn’t tell us what it is calling but that is because it is calling into a custom component CustomComponent.Tools.Web.Services.Extensions.ValidationExtension.ProcessMessage.

This is in turn calling into XmlSchemaSet.Add which is calling into JIT_MonTryEnter. So long story short, the real stack trace should look like this

mscorwks!JIT_MonTryEnter+0xad
System.Xml.Schema.XmlSchemaSet.Add(System.Xml.Schema.XmlSchemaSet)
CustomComponent.Tools.Web.Services.Extensions.ValidationExtension.ProcessMessage(System.Web.Services.Protocols.SoapMessage)System.Web.Services.Protocols.SoapMessage.RunExtensions(System.Web.Services.Protocols.SoapExtension[], Boolean)
System.Web.Services.Protocols.SoapServerProtocol.Initialize()
...

!sos.clrstack just wasn’t able to rebuild it because of the state it was in

Looking through the output from ~* e !clrstack I find one stack that is currently in XmlSchemaSet.Add and thus is probably the one holding the lock we are trying to enter here

OS Thread Id: 0x1308 (40)
ESP       EIP
104ce878 7c82ed54 [HelperMethodFrame: 104ce878]
104ce8d0 638b46bf System.Xml.Schema.SchemaNames..ctor(System.Xml.XmlNameTable)
104cebd4 638ca6fb System.Xml.Schema.XmlSchemaSet.GetSchemaNames(System.Xml.XmlNameTable)
104cebe0 638c96fc System.Xml.Schema.XmlSchemaSet.PreprocessSchema(System.Xml.Schema.XmlSchema ByRef, System.String)
104cebf8 638c86b7 System.Xml.Schema.XmlSchemaSet.Add(System.String, System.Xml.Schema.XmlSchema)
104cec04 638c7667 System.Xml.Schema.XmlSchemaSet.Add(System.Xml.Schema.XmlSchemaSet)
104cec60 0eb65afc CustomComponent.Tools.Web.Services.Extensions.ValidationExtension.ProcessMessage(System.Web.Services.Protocols.SoapMessage)
104cec8c 69918f30 System.Web.Services.Protocols.SoapMessage.RunExtensions(System.Web.Services.Protocols.SoapExtension[], Boolean)
104ceca4 699227b3 System.Web.Services.Protocols.SoapServerProtocol.Initialize()
104cece8 6990d904 System.Web.Services.Protocols.ServerProtocolFactory.Create(System.Type, System.Web.HttpContext, System.Web.HttpRequest, System.Web.HttpResponse, Boolean ByRef)
104ced28 699263ab System.Web.Services.Protocols.WebServiceHandlerFactory.CoreGetHandler(System.Type, System.Web.HttpContext, System.Web.HttpRequest, System.Web.HttpResponse)
104ced64 69926329 System.Web.Services.Protocols.WebServiceHandlerFactory.GetHandler(System.Web.HttpContext, System.String, System.String, System.String)
104ced88 65fc057c System.Web.HttpApplication.MapHttpHandler(System.Web.HttpContext, System.String, System.Web.VirtualPath, System.String, Boolean)
104cedcc 65fd58cd System.Web.HttpApplication+MapHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
104ceddc 65fc1610 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)
104cee1c 65fd32e0 System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)
104cee6c 65fc0225 System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)
104cee88 65fc550b System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)
104ceebc 65fc5212 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)
104ceec8 65fc3587 System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)
104cf078 79f35ee8 [ContextTransitionFrame: 104cf078]
104cf0c8 79f35ee8 [GCFrame: 104cf0c8]
104cf220 79f35ee8 [ComMethodFrame: 104cf220]
0:040> kb
ChildEBP RetAddr  Args to Child
104ce5d4 7c822124 77e6bad8 000002e8 00000000 ntdll!KiFastSystemCallRet
104ce5d8 77e6bad8 000002e8 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
104ce648 79e718fd 000002e8 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
104ce68c 79e718c6 000002e8 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
104ce6dc 79e7187c ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
104ce6ec 7a0d0d0f ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
104ce710 7a0d5dfa ffffffff 00001037 000d4368 mscorwks!SVR::gc_heap::wait_for_gc_done+0x99
104ce788 7a0d691f 0e8f8650 00000014 00000000 mscorwks!SVR::gc_heap::try_allocate_more_space+0x17
104ce7a8 7a0d7ecc 0e8f8650 00000014 00000000 mscorwks!SVR::gc_heap::allocate_more_space+0x2f
104ce7c8 7a08bd32 0e8f8650 00000014 00000002 mscorwks!SVR::GCHeap::Alloc+0x74
104ce7e4 79e754ff 00000014 00000000 00080000 mscorwks!Alloc+0x60
104ce824 79e755c1 639f59e0 02382edc 02af0090 mscorwks!FastAllocateObject+0x38
104ce8c8 638b46bf 02af0e60 00000000 00000000 mscorwks!JIT_NewFast+0x9e
104ce8cc 02af0e60 00000000 00000000 00000000 System_Xml_ni+0x1146bf
WARNING: Frame IP not in any known module. Following frames may be wrong.
104ce8d0 00000000 00000000 00000000 00000000 0x2af0e60

But unfortunately it can’t give up this lock and finish what it is doing because it is waiting for the GC to finish, so effectively we are in a deadlock.

Thread 40 holds a lock but to release it it needs the GC to complete. The GC can’t complete because it can’t suspend thread 56 (and other threads) that have preemptive GC disabled. Thread 56 can’t enable preemptive GC until it gets the lock owned by thread 40.

I should add, that the only times I have seen this, it has been in the lock in XmlSchemaSet.Add, and under heavy load when multiple threads were trying to access the same XmlSchemaSet.

Solution

So what can we do about this? Well, a hotfix just came out (KB946644) that will fix this problem so if you are running into this issue you can call into support and ask for that hotfix.

In my earlier cases, the customers have locked around the XmlSchemaSet in the custom component since according to the msdn documentation any instance members of XmlSchemaSet are not guaranteed to be thread safe. This resolved the issue.

Laters,

Tess