A Hang Scenario, Locks and Critical Sections
The situation here is that the server at times slows down considerably, or even hangs completely. Requests start timing out and no new requests seem to be getting through. The scenario below is a bit simplified, but it shows the technique for finding the owners, waiters, and locations of locks.
A quick look with ~* kb (listing the native stack for all threads) shows us that most threads are sitting in this type of stack…
30 Id: d78.c68 Suspend: 1 Teb: 7ff5a000 Unfrozen
ChildEBP RetAddr Args to Child
0c6becb4 7c822114 77e6711b 00000001 0c6bed04 ntdll!KiFastSystemCallRet
0c6becb8 77e6711b 00000001 0c6bed04 00000000 ntdll!NtWaitForMultipleObjects+0xc
0c6bed60 791e0b3b 00000001 00134f7c 00000001 kernel32!WaitForMultipleObjectsEx+0x11a
0c6bed90 791e0bdd 00000001 00134f7c 00000001 mscorsvr!Thread::DoAppropriateWaitWorker+0xc1
0c6bede4 791fccfe 00000001 00134f7c 00000001 mscorsvr!Thread::DoAppropriateWait+0x46
0c6bee68 791fcc17 01fcc738 ffffffff 00000000 mscorsvr!AwareLock::EnterEpilog+0x9d
0c6bee84 791fd43e 0c6bef7c 064aecf4 064ad210 mscorsvr!AwareLock::Enter+0x78
0c6bef18 0c1c0744 064ad210 064aecf4 0c6bef7c mscorsvr!JITutil_MonContention+0x124
WARNING: Frame IP not in any known module. Following frames may be wrong.
0c6bef40 0c1c0697 064aeca0 064ae094 064ae094 0xc1c0744
0c6bef84 0c525abd 064aeca0 064ad210 064ae094 0xc1c0697
0c6bf0cc 799a67a2 064ac400 064ac704 799b4168 0xc525abd
0c6bf0d8 799b4168 00000002 00000000 0210177c mscorlib_79990000+0x167a2
0c6bf160 79217188 00000000 00da9a60 061bc490 mscorlib_79990000+0x24168
0c6bf1d4 7923c206 01eb2f60 0c6bf2dc 0015d990 mscorsvr!ComCallMLStubCache::CompileMLStub+0x429
0c6bf214 791cc3c8 0c6bf284 791b3d64 0c6bf300 mscorsvr!ComCallMLStubCache::CompileMLStub+0x1af
0c6bf25c 7923c1c3 0015d990 7923c1ce 0c6bf284 mscorsvr!Thread::DoADCallBack+0x5c
0c6bf2c4 01afa0bd 01fcc738 0c6bf300 0c6bffdc mscorsvr!ComCallMLStubCache::CompileMLStub+0x2ba
0c6bf3b8 791cc310 000de250 00000000 791cc32b 0x1afa0bd
0c6bf41c 791b3e6f 791cc131 0010f14c 00000000 mscorsvr!Thread::SafeSetStaticData+0x63
0c6bf4f8 7c822124 77e6baa8 000002e4 00000000 mscorsvr!Frame::Pop+0x9
So from managed code at the bottom of the stack, we go into something called JITutil_MonContention, which in turn calls AwareLock::Enter. This means that the thread is sitting in some type of synchronization statement, most likely a lock(){} block, waiting for that lock to become free so it can enter the critical section.
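As a reminder of why a lock statement ends up in these frames: the C# compiler expands a lock(){} block into a Monitor.Enter/Monitor.Exit pair, and the frames we see above (JITutil_MonContention and AwareLock::Enter) are simply the contended path of Monitor.Enter, taken when another thread already owns the lock. A minimal sketch of that expansion, with made-up names purely for illustration:

using System.Threading;

class LockExpansionSample
{
    // made-up field, used only as a lock target
    private static object someObject = new object();

    public static void DoProtectedWork()
    {
        // lock(someObject) { ... } compiles down to roughly this;
        // Monitor.Enter is the call that blocks when the lock is already taken
        Monitor.Enter(someObject);
        try
        {
            // ... critical section ...
        }
        finally
        {
            Monitor.Exit(someObject);
        }
    }
}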
There are two questions to answer here:
- Where are we locking?
- Who owns the lock?
For the answer to the second question, “who owns the lock?”, we can turn to !syncblk. Running !syncblk without any parameters will list the SyncBlocks corresponding to objects whose locks are held by active threads.
0:018> !syncblk
Index SyncBlock MonitorHeld Recursion Thread ThreadID Object Waiting
987 0x00134f68 19 1 0x10eb28 0xcc8 17 0x022841fc System.Object
Waiting threads: 18 19 23 24 25 26 27 29 30
-----------------------------
Total 1039
ComCallWrapper 4
ComPlusWrapper 1
ComClassFactory 0
Free 12
MonitorHeld shows 19. The owner counts for 1 and each waiter counts for 2, so this means we have one owner and nine waiters. We are synchronizing on a System.Object (0x022841fc) and the owning thread is thread 17.
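If you want to double-check that reading of MonitorHeld, the arithmetic is simple; a tiny sketch of the calculation (the only value taken from the dump is 19):

// the owner counts for 1 and each waiter counts for 2,
// so MonitorHeld = 1 + 2 * waiters while the lock is owned
int monitorHeld = 19;                 // from the !syncblk output
int waiters = (monitorHeld - 1) / 2;  // (19 - 1) / 2 = 9 waiting threads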
The waiting threads are 18, 19, 23, 24… Thread 18, for example, looks like this…
0:018> !clrstack
Thread 18
ESP EIP
0x0c3ef490 0x7c82ed54 [FRAME: GCFrame]
0x0c3ef54c 0x7c82ed54 [FRAME: HelperMethodFrame]
0x0c3ef5a0 0x0c1c0744 [DEFAULT] Void SuperSlowWebApp.CacheUpdater.Update()
0x0c3ef5c8 0x0c1c0697 [DEFAULT] [hasThis] Void SuperSlowWebApp.MyPage.btnUpdateCache_Click(Object,Class System.EventArgs)
0x0c3ef610 0x0c525abd [DEFAULT] [hasThis] Void System.Web.UI.WebControls.Button.OnClick(Class System.EventArgs)
0x0c3ef624 0x0c52589a [DEFAULT] [hasThis] Void System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(String)
0x0c3ef634 0x0c52584b [DEFAULT] [hasThis] Void System.Web.UI.Page.RaisePostBackEvent(Class System.Web.UI.IPostBackEventHandler,String)
0x0c3ef63c 0x0c5257aa [DEFAULT] [hasThis] Void System.Web.UI.Page.RaisePostBackEvent(Class System.Collections.Specialized.NameValueCollection)
0x0c3ef64c 0x0c4c6ead [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x0c3ef690 0x0c4c5937 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x0c3ef6cc 0x0c4c53a3 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)
0x0c3ef6d4 0x0c4c537c [DEFAULT] [hasThis] Void System.Web.HttpApplication/CallHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
0x0c3ef6e4 0x0c1da288 [DEFAULT] [hasThis] Class System.Exception System.Web.HttpApplication.ExecuteStep(Class IExecutionStep,ByRef Boolean)
0x0c3ef72c 0x0c1d9322 [DEFAULT] [hasThis] Void System.Web.HttpApplication.ResumeSteps(Class System.Exception)
0x0c3ef774 0x0c1d91eb [DEFAULT] [hasThis] Class System.IAsyncResult System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(Class System.Web.HttpContext,Class System.AsyncCallback,Object)
0x0c3ef790 0x01eb6897 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)
0x0c3ef7cc 0x01eb6448 [DEFAULT] Void System.Web.HttpRuntime.ProcessRequest(Class System.Web.HttpWorkerRequest)
0x0c3ef7d8 0x01eb2fc5 [DEFAULT] [hasThis] I4 System.Web.Hosting.ISAPIRuntime.ProcessRequest(I,I4)
0x0c3ef8a0 0x79217188 [FRAME: ContextTransitionFrame]
0x0c3ef980 0x79217188 [FRAME: ComMethodFrame]
It is waiting in CacheUpdater.Update(), and if we disassemble the method we can see this…
0:018> !u 0x0c1c0744
Will print '>>> ' at address: 0x0c1c0744
Normal JIT generated code
[DEFAULT] Void SuperSlowWebApp.CacheUpdater.Update()
Begin 0x0c1c0718, size 0x68
0c1c0718 55 push ebp
0c1c0719 8bec mov ebp,esp
0c1c071b 83ec14 sub esp,0x14
0c1c071e 57 push edi
0c1c071f 56 push esi
0c1c0720 53 push ebx
0c1c0721 33c0 xor eax,eax
0c1c0723 8945ec mov [ebp-0x14],eax
0c1c0726 c745f800000000 mov dword ptr [ebp-0x8],0x0
0c1c072d c745ec00000000 mov dword ptr [ebp-0x14],0x0
0c1c0734 a12c66170a mov eax,[0a17662c] (Object: System.Object)
0c1c0739 8945ec mov [ebp-0x14],eax
0c1c073c 8b4dec mov ecx,[ebp-0x14]
0c1c073f e8e712ff6c call mscorsvr!JIT_MonEnter (791b1a2b)
>>> 0c1c0744 b988130000 mov ecx,0x1388
0c1c0749 e85da79f6d call mscorlib_79990000+0x22aeab (79bbaeab) (System.Threading.Thread.Sleep)
0c1c074e 90 nop
0c1c074f c745f400000000 mov dword ptr [ebp-0xc],0x0
0c1c0756 c745f8fc000000 mov dword ptr [ebp-0x8],0xfc
0c1c075d 686f071c0c push 0xc1c076f
0c1c0762 eb00 jmp 0c1c0764
0c1c0764 8b4dec mov ecx,[ebp-0x14]
0c1c0767 e8e114ff6c call mscorsvr!JIT_MonExit (791b1c4d)
0c1c076c 58 pop eax
0c1c076d ffe0 jmp eax
0c1c076f c745f800000000 mov dword ptr [ebp-0x8],0x0
0c1c0776 eb00 jmp 0c1c0778
0c1c0778 90 nop
0c1c0779 5b pop ebx
0c1c077a 5e pop esi
0c1c077b 5f pop edi
0c1c077c 8be5 mov esp,ebp
0c1c077e 5d pop ebp
0c1c077f c3 ret
So we are still inside the JIT_MonEnter call (the JIT helper that implements the lock statement), with the return address pointing just before a call to Sleep(0x1388), i.e. Sleep(5000). In fact, the function looks like this… and we are stuck on the lock(_syncRoot) line
public static void Update()
{
    lock(_syncRoot)
    {
        System.Threading.Thread.Sleep(5000);
    }
}
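For context, this is roughly what the surrounding class presumably looks like. Only the Update() body is confirmed by the disassembly above; the _syncRoot declaration is an assumption (it is presumably the System.Object, 0x022841fc, that !syncblk reported):

namespace SuperSlowWebApp
{
    public class CacheUpdater
    {
        // assumed: a private static object used only as the lock target
        private static object _syncRoot = new object();

        public static void Update()
        {
            lock(_syncRoot)
            {
                System.Threading.Thread.Sleep(5000);
            }
        }
    }
}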
And thread 17 (the lock owner) looks like this
0:017> !clrstack
Thread 17
ESP EIP
0x009ff610 0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Void System.Threading.Thread.Sleep(I4)
0x009ff620 0x0c1c074e [DEFAULT] Void SuperSlowWebApp.CacheUpdater.Update()
0x009ff648 0x0c1c0697 [DEFAULT] [hasThis] Void SuperSlowWebApp.MyPage.btnUpdateCache_Click(Object,Class System.EventArgs)
0x009ff690 0x0c525abd [DEFAULT] [hasThis] Void System.Web.UI.WebControls.Button.OnClick(Class System.EventArgs)
0x009ff6a4 0x0c52589a [DEFAULT] [hasThis] Void System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(String)
0x009ff6b4 0x0c52584b [DEFAULT] [hasThis] Void System.Web.UI.Page.RaisePostBackEvent(Class System.Web.UI.IPostBackEventHandler,String)
0x009ff6bc 0x0c5257aa [DEFAULT] [hasThis] Void System.Web.UI.Page.RaisePostBackEvent(Class System.Collections.Specialized.NameValueCollection)
0x009ff6cc 0x0c4c6ead [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x009ff710 0x0c4c5937 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x009ff74c 0x0c4c53a3 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)
0x009ff754 0x0c4c537c [DEFAULT] [hasThis] Void System.Web.HttpApplication/CallHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
0x009ff764 0x0c1da288 [DEFAULT] [hasThis] Class System.Exception System.Web.HttpApplication.ExecuteStep(Class IExecutionStep,ByRef Boolean)
0x009ff7ac 0x0c1d9322 [DEFAULT] [hasThis] Void System.Web.HttpApplication.ResumeSteps(Class System.Exception)
0x009ff7f4 0x0c1d91eb [DEFAULT] [hasThis] Class System.IAsyncResult System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(Class System.Web.HttpContext,Class System.AsyncCallback,Object)
0x009ff810 0x01eb6897 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)
0x009ff84c 0x01eb6448 [DEFAULT] Void System.Web.HttpRuntime.ProcessRequest(Class System.Web.HttpWorkerRequest)
0x009ff858 0x01eb2fc5 [DEFAULT] [hasThis] I4 System.Web.Hosting.ISAPIRuntime.ProcessRequest(I,I4)
0x009ff920 0x79217188 [FRAME: ContextTransitionFrame]
0x009ffa00 0x79217188 [FRAME: ComMethodFrame]
So, in summary, all the requests are waiting for thread 17 to come out of its sleep and release the lock.
If we grab the System.Web.Hosting.ISAPIWorkerRequestInProcForIIS6 object from !dso (dumpstackobjects)…
0:017> !dso
Thread 17
ESP/REG Object Name
…
0x009ff820 0x024961f4 System.Web.HttpContext
0x009ff824 0x02495ed8 System.Web.Hosting.ISAPIWorkerRequestInProcForIIS6
0x009ff828 0x061c3460 System.Web.HttpRuntime
0x009ff84c 0x061ab370 System.String c:\inetpub\wwwroot\SuperSlowWebApp\
0x009ff850 0x02495ff0 System.String c:\inetpub\wwwroot\SuperSlowWebApp\
0x009ff870 0x061bc490 System.Web.Hosting.ISAPIRuntime
0x009ff934 0x021a0770 System.Runtime.Remoting.Messaging.LogicalCallContext
And look at the _startTime …
0:017> !do 0x02495ed8
Name: System.Web.Hosting.ISAPIWorkerRequestInProcForIIS6
MethodTable 0x02061fec
EEClass 0x01ee8460
Size 200(0xc8) bytes
GC Generation: 0
mdToken: 0x02000115 (c:\windows\assembly\gac\system.web\1.0.5000.0__b03f5f7f11d50a3a\system.web.dll)
FieldDesc*: 0x02061e78
MT Field Offset Type Attr Value Name
0x01b28a80 0x4000798 0x4 VALUETYPE instance start at 02495edc _startTime
0x01b28a80 0x4000799 0 CLASS shared static s_HTTPStatusDescriptions
>> Domain:Value 0x000de250:NotInit 0x0010ee48:0x061c0e38 <<
0x01b28a80 0x400079a 0x4 CLASS shared static s_requestHeaderNames
>> Domain:Value 0x000de250:NotInit 0x0010ee48:0x061c0f80 <<
0x01b28a80 0x400079b 0x8 CLASS shared static s_responseHeaderNames
>> Domain:Value 0x000de250:NotInit 0x0010ee48:0x061c1030 <<
0x01b28a80 0x400079c 0xc CLASS shared static s_requestHeadersLoookupTable
>> Domain:Value 0x000de250:NotInit 0x0010ee48:0x061c10b8 <<
0x01b28a80 0x400079d 0x10 CLASS shared static s_responseHeadersLoookupTable
>> Domain:Value 0x000de250:NotInit 0x0010ee48:0x061c1194 <<
0x01b2938c 0x400096f 0x5c System.Int32 instance 14358560 _ecb
0x01b2938c 0x4000970 0x60 System.Int32 instance 0 _token
…we can use !convertvtdatetodate to find out when the request started…
0:017> !convertvtdatetodate 02495edc
01/09/2006 11:38:35
…and compare this to the current time given by .time
0:017> .time
Debug session time: Mon Jan 9 12:40:15.000 2006 (GMT+1)
System Uptime: 2 days 11:36:47.637
Process Uptime: 0 days 0:19:48.000
Kernel time: 0 days 0:00:01.000
User time: 0 days 0:00:00.000
Note: the time for the request is one hour off since it is given in GMT, and I am located in the GMT+1 timezone.
So we can see that this request has been executing for 1 minute and 40 seconds so far (11:38:35 -> 11:40:15 GMT), probably because it has been blocked waiting to get the lock in the first place.
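A quick sanity check on that arithmetic; the timestamps are the ones from the dump above, the code itself is just illustration:

using System;

class RequestDuration
{
    static void Main()
    {
        // !convertvtdatetodate reported the start time in GMT, while .time
        // reported local time in GMT+1, so shift the session time back one hour
        DateTime startedGmt   = new DateTime(2006, 1, 9, 11, 38, 35);
        DateTime sessionLocal = new DateTime(2006, 1, 9, 12, 40, 15);
        DateTime sessionGmt   = sessionLocal.AddHours(-1);   // 11:40:15 GMT

        Console.WriteLine(sessionGmt - startedGmt);          // 00:01:40
    }
}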
You can use this method to find out how long any request has been executing. If there is no ISAPIWorkerRequestInProcForIIS6 on the stack, you can get to it through HttpContext._request._wr (using !dumpobj).
As a side note, when you use .time in a live debug session it will give you the current time, even if you broke into the debugger a while back, so this technique for finding out how long a request has been executing works best in dumps.
A short note on Critical Sections
Similarly, if you have threads stuck in a native synchronization block like ntdll!RtlEnterCriticalSection…
0:021> kb
ChildEBP RetAddr Args to Child
0210e7cc 7c822124 7c83970f 000009f8 00000000 ntdll!KiFastSystemCallRet
0210e7d0 7c83970f 000009f8 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0210e80c 7c839620 00000000 00000004 00000001 ntdll!RtlpWaitOnCriticalSection+0x19c
0210e82c 4bfa4c16 027c0714 0277a5b4 4bf727af ntdll!RtlEnterCriticalSection+0xa8
0210e838 4bf727af 0277a5b4 0277a3a0 0210e868 odbc32!CCriticalSection::Enter+0xf
0210e848 4bf79ae6 0277a5b4 021c2d60 0210e9e8 odbc32!ODBCEnterCriticalSection+0xf
0210e868 4bf923bb 0277a3a0 00000001 000003ee odbc32!FreeStmt+0xe4
0210e884 026f0a35 00000003 4bfa0000 0210e8dc odbc32!SQLFreeHandle+0x1f3
WARNING: Frame IP not in any known module. Following frames may be wrong.
0210e928 791b33e4 00000000 0260aa80 0210e948 0x26f0a35
0210e938 03720601 0210e960 791b3208 0210eb14 mscorsvr!EEClass::GetFixedUpSlot+0x1d
0210e948 791f4512 0210e994 00000000 0210e96c 0x3720601
0210eb14 021824ee 0210eb20 10311208 00000001 mscorsvr!COMMember::InvokeMethod+0x95a
0210eb64 799dd5a2 00000001 00000000 00000000 0x21824ee
0210eba8 799dd3e0 00000001 00000000 1525e724 mscorlib_79990000+0x4d5a2
00000000 00000000 00000000 00000000 00000000 mscorlib_79990000+0x4d3e0
…you can take the first parameter to the RtlEnterCriticalSection function (0x027c0714) and pass it to !critsec to find out who owns the critical section
0:021> !critsec 027c0714
CritSec +27c0714 at 027C0714
WaiterWoken No
LockCount 1
RecursionCount 1
OwningThread 1490
EntryCount 0
ContentionCount 4
*** Locked
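The fields that !critsec prints come straight from the native RTL_CRITICAL_SECTION structure. For reference, here is a rough C# mirror of that layout (as declared in winnt.h); the field comments are my interpretation rather than something the dump confirms:

using System;
using System.Runtime.InteropServices;

// layout of the native RTL_CRITICAL_SECTION; !critsec essentially formats these fields
[StructLayout(LayoutKind.Sequential)]
struct RTL_CRITICAL_SECTION
{
    public IntPtr  DebugInfo;      // RTL_CRITICAL_SECTION_DEBUG* holding contention statistics
    public int     LockCount;
    public int     RecursionCount; // how many times the owner has re-entered
    public IntPtr  OwningThread;   // thread ID of the current owner, 0 when unowned
    public IntPtr  LockSemaphore;  // event used to wake up waiters
    public UIntPtr SpinCount;
}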
In this case the owning thread ID is 1490, and you can find the corresponding logical thread number by listing the threads with ~
0:021> ~
# 0 Id: 1748.558 Suspend: 1 Teb: bffdf000 Unfrozen
1 Id: 1748.15d4 Suspend: 1 Teb: bffdd000 Unfrozen
2 Id: 1748.1628 Suspend: 1 Teb: bffdc000 Unfrozen
3 Id: 1748.1524 Suspend: 1 Teb: bffd8000 Unfrozen
4 Id: 1748.17d0 Suspend: 1 Teb: bffd7000 Unfrozen
5 Id: 1748.12b4 Suspend: 1 Teb: bffd6000 Unfrozen
6 Id: 1748.17e8 Suspend: 1 Teb: bffd5000 Unfrozen
7 Id: 1748.1220 Suspend: 1 Teb: bffd4000 Unfrozen
8 Id: 1748.1148 Suspend: 1 Teb: bffaf000 Unfrozen
9 Id: 1748.16a8 Suspend: 1 Teb: bffae000 Unfrozen
10 Id: 1748.12e0 Suspend: 1 Teb: bffad000 Unfrozen
11 Id: 1748.1278 Suspend: 1 Teb: bffac000 Unfrozen
12 Id: 1748.1700 Suspend: 1 Teb: bffab000 Unfrozen
13 Id: 1748.9f0 Suspend: 1 Teb: bffa8000 Unfrozen
14 Id: 1748.1690 Suspend: 1 Teb: bffa7000 Unfrozen
15 Id: 1748.11f8 Suspend: 1 Teb: bffa6000 Unfrozen
16 Id: 1748.2c0 Suspend: 1 Teb: bffa5000 Unfrozen
17 Id: 1748.134c Suspend: 1 Teb: bffa4000 Unfrozen
18 Id: 1748.1680 Suspend: 1 Teb: bffa3000 Unfrozen
19 Id: 1748.a7c Suspend: 1 Teb: bffa2000 Unfrozen
20 Id: 1748.1490 Suspend: 1 Teb: bff9e000 Unfrozen
. 21 Id: 1748.ab0 Suspend: 1 Teb: bff9d000 Unfrozen
22 Id: 1748.c70 Suspend: 1 Teb: bff9c000 Unfrozen
23 Id: 1748.170c Suspend: 1 Teb: bff9b000 Unfrozen
24 Id: 1748.1538 Suspend: 1 Teb: bff4a000 Unfrozen
25 Id: 1748.1470 Suspend: 1 Teb: bff47000 Unfrozen
26 Id: 1748.3c8 Suspend: 1 Teb: bff43000 Unfrozen
27 Id: 1748.ce8 Suspend: 1 Teb: bffa9000 Unfrozen
28 Id: 1748.16c4 Suspend: 1 Teb: bff99000 Unfrozen
29 Id: 1748.11e4 Suspend: 1 Teb: bff9a000 Unfrozen
30 Id: 1748.131c Suspend: 1 Teb: bffaa000 Unfrozen
31 Id: 1748.12ac Suspend: 1 Teb: bffdb000 Unfrozen
32 Id: 1748.bfc Suspend: 1 Teb: bffda000 Unfrozen
33 Id: 1748.1238 Suspend: 1 Teb: bffde000 Unfrozen
Or go to it directly with ~~[1490]s
Normally this thread would be executing some code inside the critical section protected by this critsec, but in this particular case thread 20 (ID 1490) is a regular worker thread just waiting for work…
0:020> kb 2000
ChildEBP RetAddr Args to Child
020bfaf8 7c822124 77e6baa8 00000250 00000000 ntdll!KiFastSystemCallRet
020bfafc 77e6baa8 00000250 00000000 020bfb40 ntdll!NtWaitForSingleObject+0xc
020bfb6c 77e6ba12 00000250 00009c40 00000000 kernel32!WaitForSingleObjectEx+0xac
020bfb80 791d401f 00000250 00009c40 00000000 kernel32!WaitForSingleObject+0x12
020bfba4 791fdacc 00000000 79172262 79172270 mscorsvr!ThreadpoolMgr::WorkerThreadStart+0x3a
020bffb8 77e66063 0012c938 00000000 00000000 mscorsvr!ThreadpoolMgr::intermediateThreadProc+0x44
020bffec 00000000 791fda8b 0012c938 00000000 kernel32!BaseThreadStart+0x34
Since there is nothing on this thread that should be using a critical section in odbc32, most likely the critical section is either transitioning (moving to a different thread) or orphaned, which means it will never be released because the thread that owned it hit an access violation or similar while holding it.
In a case like this there are two avenues. The first is to take two dumps a short time apart (about 1-2 minutes in between) to see if the critical section is transitioning; if it is, the first thread should no longer be waiting for the critical section in dump #2. The second avenue is to take crash dumps instead of hang dumps, so that we record any access violations or similar that occur prior to the hang and can find out what caused the critical section to be orphaned.
Just so you don’t lie sleepless over this :) I will tell you the cause of this particular orphaning… the dump itself was about 1.4 GB, so the process had suffered a number of OutOfMemoryExceptions, leaving threads in unstable states like this one…
Happy hunting!