Getting Started with Windbg
I’m back from Oredev which turned out to be a really cool conference.
Meanwhile my colleague Johan wrote a very nice introduction to debugging with Windbg and sos, outlining some of the most commonly used commands etc. that is well worth reading if you are just starting up with post-mortem debugging.
EDIT (30 Sept 2020): The original post from Johan is no longer available, but since I think it is very useful content - I am replicating the post below - but the credit goes to Johan Straarup
Laters,
Tess
Okay, I’ve previously written some random posts about how to set up windbg and how to troubleshoot OutOfMemoryExceptions. I thought I’d take a few steps back and review some of the basics in order to help you get started on using this fantastic tool.
Basic Configuration
- Copy sos.dll from the framework directory to the folder where you installed windbg. Make sure you copy it from the same Framework version as the one you wish to investigate. If you’ll be working with both 1.1 and 2.0 you can rename the SOS.dlls to SOS11.dll and SOS20.dll or put them in separate folders.
- Create a folder where you want to cache all the symbol files. For example: “C:\Symbols”.
- Start windbg and open the dialogue to configure the symbol path by clicking File -> Symbol File Path.
- Enter the path, as well as the address from which you’ll want to download missing symbols using the following syntax:
srv*[cache path]*[symbols path]
I’d recommend the following path:srv*c:\symbols\public*http://msdl.microsoft.com/download/symbols
You should now be set to go. You’re now ready to open up a saved dump, or attach to a process
Useful commands
I’ll be using a dump from an IIS6-server to demonstrate some useful commands.
The first thing you’ll want to do is load SOS. You’ll do this using the .load
command. The syntax is simple. .load [extension filename]
So if you want to load sos and haven’t renamed the .dll you’d simply write:
.load sos
You’ll now have all the cool commands from the SOS-extension at your disposal as well as the default windbg ones. Commands from extensions are always preceded by a “!”, so if you want to run the help-command for sos you’d write
!help
If you should happen to have two extensions with an identically named command you can always separate them by typing ![extension name].[command]
Example:
!sos.help
Okay, now that we know how to run the commands from the extension, try running !help. It should give you the following result.
0:000> !help
-------------------------------------------------------------------------------
SOS is a debugger extension DLL designed to aid in the debugging of managed
programs. Functions are listed by category, then roughly in order of
importance. Shortcut names for popular functions are listed in parenthesis.
Type "!help " for detailed info on that function.
Object Inspection Examining code and stacks
----------------------------- -----------------------------
DumpObj (do) Threads
DumpArray (da) CLRStack
DumpStackObjects (dso) IP2MD
DumpHeap U
DumpVC DumpStack
GCRoot EEStack
ObjSize GCInfo
FinalizeQueue EHInfo
PrintException (pe) COMState
TraverseHeap BPMD
Examining CLR data structures Diagnostic Utilities
----------------------------- -----------------------------
DumpDomain VerifyHeap
EEHeap DumpLog
Name2EE FindAppDomain
SyncBlk SaveModule
DumpMT GCHandles
DumpClass GCHandleLeaks
DumpMD VMMap
Token2EE VMStat
EEVersion ProcInfo
DumpModule StopOnException (soe)
ThreadPool MinidumpMode
DumpAssembly
DumpMethodSig Other
DumpRuntimeTypes -----------------------------
DumpSig FAQ
RCWCleanupList
DumpIL
For more documentation on a specific command, type !help [name of command]
.time
This is not an SOS-command, which is evident by the command not beginning with a “!”. Running the .Time command will show you relevant info about the time, as well as system uptime, process uptime and the amount of time spent in kernel & user mode.
0:000> .time
Debug session time: Tue Oct 23 08:38:35.000 2007 (GMT+1)
System Uptime: 4 days 17:48:01.906
Process Uptime: 0 days 0:24:37.000
Kernel time: 0 days 0:04:23.000
User time: 0 days 0:03:28.000
As you can see the system has been up for over 4 days. The process has been running for 24½ minutes and has an accumulated CPU-time of about 8 minutes total. This would give us an average CPU-usage for the process of around 32,5%
!threadpool
We can then use the !threadpool
-command to find out exactly what the CPU-usage was at the time the dump was taken. We’ll also get some useful information like the number of work requests in the queue, completion port threads and timers.
0:000> !threadpool
CPU utilization 100%
Worker Thread: Total: 5 Running: 4 Idle: 1 MaxLimit: 200 MinLimit: 2
Work Request in Queue: 16
Unknown Function: 6a2d945d Context: 023ede30
Unknown Function: 6a2d945d Context: 023ee1e8
AsyncTimerCallbackCompletion TimerInfo@11b53760
Unknown Function: 6a2d945d Context: 023ee3a8
Unknown Function: 6a2d945d Context: 023e3040
Unknown Function: 6a2d945d Context: 023ee178
Unknown Function: 6a2d945d Context: 023edfb0
AsyncTimerCallbackCompletion TimerInfo@11b36428
AsyncTimerCallbackCompletion TimerInfo@11b53868
Unknown Function: 6a2d945d Context: 023ee060
Unknown Function: 6a2d945d Context: 023ee290
Unknown Function: 6a2d945d Context: 023eded0
Unknown Function: 6a2d945d Context: 023edd88
Unknown Function: 6a2d945d Context: 023ede98
Unknown Function: 6a2d945d Context: 023ee258
Unknown Function: 6a2d945d Context: 023edfe8
--------------------------------------
Number of Timers: 9
--------------------------------------
Completion Port Thread:Total: 3 Free: 3 MaxFree: 4 CurrentLimit: 2 MaxLimit: 200 MinLimit: 2
So we can see that currently we’re using 100% of the CPU, which leads us to the next command.
!runaway
This is a nice command that will list all running threads and their CPU-usage. It’s your best friend when troubleshooting a high CPU hang issue.
0:000> !runaway
User Mode Time
Thread Time
25:1a94 0 days 0:00:39.937
16:1bc0 0 days 0:00:38.390
50:1e8c 0 days 0:00:08.859
52:1e40 0 days 0:00:08.687
20:1c2c 0 days 0:00:08.234
51:1340 0 days 0:00:08.171
21:1bcc 0 days 0:00:06.953
26:13ec 0 days 0:00:06.671
44:131c 0 days 0:00:03.906
22:d8c 0 days 0:00:03.375
33:78c 0 days 0:00:02.656
34:1a8c 0 days 0:00:00.906
29:1f5c 0 days 0:00:00.828
6:e28 0 days 0:00:00.625
5:1c78 0 days 0:00:00.546
23:14a4 0 days 0:00:00.484
4:5ac 0 days 0:00:00.437
45:5dc 0 days 0:00:00.421
3:13b4 0 days 0:00:00.421
47:19c8 0 days 0:00:00.375
28:1b6c 0 days 0:00:00.250
46:1dac 0 days 0:00:00.156
7:1dd8 0 days 0:00:00.109
48:cdc 0 days 0:00:00.093
49:1eac 0 days 0:00:00.062
15:1a64 0 days 0:00:00.062
0:1804 0 days 0:00:00.046
36:4a4 0 days 0:00:00.031
11:1eb4 0 days 0:00:00.031
1:10b4 0 days 0:00:00.031
31:16ac 0 days 0:00:00.015
14:4ac 0 days 0:00:00.015
2:186c 0 days 0:00:00.015
59:590 0 days 0:00:00.000
58:294 0 days 0:00:00.000
57:16d0 0 days 0:00:00.000
56:1578 0 days 0:00:00.000
55:1428 0 days 0:00:00.000
54:16d8 0 days 0:00:00.000
53:fd8 0 days 0:00:00.000
43:1b8c 0 days 0:00:00.000
42:1c24 0 days 0:00:00.000
41:1e2c 0 days 0:00:00.000
40:11b0 0 days 0:00:00.000
39:edc 0 days 0:00:00.000
38:1a08 0 days 0:00:00.000
37:171c 0 days 0:00:00.000
35:1254 0 days 0:00:00.000
32:1f9c 0 days 0:00:00.000
30:1ae8 0 days 0:00:00.000
27:190c 0 days 0:00:00.000
24:1d2c 0 days 0:00:00.000
19:1e38 0 days 0:00:00.000
18:ee4 0 days 0:00:00.000
17:fb8 0 days 0:00:00.000
13:1b54 0 days 0:00:00.000
12:1a48 0 days 0:00:00.000
10:f64 0 days 0:00:00.000
9:1024 0 days 0:00:00.000
8:1b78 0 days 0:00:00.000
As you can see the total amount of time does not match the total CPU utilization time that we got from the .time command. That’s simply because threads get reused and recycled. This means that the total amount of CPU time used by a thread may have been split up over several page requests.
!threads
To get more information about the running threads we can run the !threads
-command. This will list all managed threads in the application, what application domain the thread is currently executing under, etc. The output will look like this:
0:000> !threads
ThreadCount: 48
UnstartedThread: 0
BackgroundThread: 29
PendingThread: 0
DeadThread: 19
Hosted Runtime: no
PreEmptive GC Alloc Lock
ID OSID ThreadOBJ State GC Context Domain Count APT Exception
16 1 1bc0 001fccd0 1808220 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
22 2 d8c 002016f0 b220 Enabled 00000000:00000000 0019daf0 0 MTA (Finalizer)
14 4 4ac 00242e58 880a220 Enabled 00000000:00000000 0019daf0 0 MTA (Threadpool Completion Port)
23 5 14a4 11b39f18 80a220 Enabled 00000000:00000000 0019daf0 0 MTA (Threadpool Completion Port)
24 6 1d2c 11b41ad8 1220 Enabled 00000000:00000000 0019daf0 0 Ukn
25 7 1a94 11b46c70 180b220 Enabled 27240c98:27241fd8 11b42540 1 MTA (Threadpool Worker)
26 9 13ec 12ce2888 200b220 Enabled 2a9f1434:2a9f33c0 11b42540 0 MTA
27 a 190c 12d85eb8 200b220 Enabled 00000000:00000000 11b42540 0 MTA
29 b 1f5c 13df6a50 200b220 Enabled 2ab1da6c:2ab1f1c0 11b42540 0 MTA
30 c 1ae8 12d44a58 b220 Enabled 00000000:00000000 11b42540 0 MTA
31 d 16ac 12e2e008 200b220 Enabled 2a81348c:2a8153c0 11b42540 1 MTA
5 e 1c78 12da2160 220 Enabled 00000000:00000000 0019daf0 0 Ukn
33 8 78c 11b674c8 200b220 Enabled 2707b818:2707c1d8 11b42540 0 MTA
34 12 1a8c 13f163c8 220 Enabled 00000000:00000000 0019daf0 0 Ukn
36 13 4a4 13eef718 200b220 Enabled 2a7db4a4:2a7dd3c0 11b42540 0 MTA
4 14 5ac 13ef2008 220 Enabled 00000000:00000000 0019daf0 0 Ukn
42 10 1c24 13f0e950 880b220 Enabled 00000000:00000000 0019daf0 0 MTA (Threadpool Completion Port)
6 11 e28 13f16008 220 Enabled 00000000:00000000 0019daf0 0 Ukn
3 f 13b4 13eba008 220 Enabled 00000000:00000000 0019daf0 0 Ukn
43 15 1b8c 140db008 880b220 Enabled 00000000:00000000 0019daf0 0 MTA (Threadpool Completion Port)
44 17 131c 140ceb28 200b220 Enabled 272288c8:27229fd8 11b42540 0 MTA
45 1d 5dc 140cd0a0 220 Enabled 00000000:00000000 0019daf0 0 Ukn
47 20 19c8 1651a008 220 Enabled 00000000:00000000 0019daf0 0 Ukn
XXXX 24 0 16468880 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
46 1f 1dac 1650ab48 220 Enabled 00000000:00000000 0019daf0 0 Ukn
XXXX 1a 0 140d5008 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
XXXX 16 0 140c5008 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
50 3 1e8c 14064420 180b220 Enabled 27246f54:27247fd8 11b42540 1 MTA (Threadpool Worker)
XXXX 35 0 1406e800 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
51 36 1340 140df008 180b220 Enabled 2adec9cc:2aded1c0 11b42540 1 MTA (Threadpool Worker)
XXXX 37 0 16566868 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
48 38 cdc 16578840 220 Enabled 00000000:00000000 0019daf0 0 Ukn
XXXX 39 0 16566c28 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
XXXX 3b 0 1646b8b0 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
XXXX 3c 0 16674008 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
XXXX 3d 0 16676418 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
XXXX 3e 0 16676fb8 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
XXXX 3f 0 16674d48 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
XXXX 40 0 1667de10 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
XXXX 41 0 16680050 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
XXXX 42 0 166812e8 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
XXXX 43 0 16683e60 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
52 44 1e40 165259e8 180b220 Enabled 2adf126c:2adf31c0 11b42540 1 MTA (Threadpool Worker)
XXXX 45 0 165b7c08 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
XXXX 46 0 165aa3d8 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
XXXX 47 0 165242c8 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
XXXX 48 0 165e9500 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
49 3a 1eac 165676f0 220 Enabled 00000000:00000000 0019daf0 0 Ukn
The threads with an ID of XXXX have ended and are waiting to be recycled. We can also see that the finalizer thread has an ID of 22. So if we’d seen an unhealthy amount of activity on thread 22 when we ran the !runaway-command we would now have known that we had a finalizer-issue on our hands.
Switching to a specific thread
To go to a specific thread we use the ~-command
. The syntax is as follows: ~[number of thread]s
. So to switch to thread 50 we would type the following:
0:000> ~50s
We have then switched to thread 50, and can use a lot of other useful commands.
!clrstack
This great command will list the call stack for the current thread. If you want additional information you can add the -p
switch which will show you parameters and local variables as well.
Below is a sample listing of the clrstack for thread 50.
0:050> !clrstack
OS Thread Id: 0x1e8c (50)
ESP EIP
17a9e750 7d61c828 [NDirectMethodFrameSlim: 17a9e750] System.DirectoryServices.Protocols.Wldap32.ldap_bind_s(IntPtr, System.String, System.DirectoryServices.Protocols.SEC_WINNT_AUTH_IDENTITY_EX, System.DirectoryServices.Protocols.BindMethod)
17a9e768 14df70f9 System.DirectoryServices.Protocols.LdapConnection.BindHelper(System.Net.NetworkCredential, Boolean)
17a9e794 14df6de0 System.DirectoryServices.Protocols.LdapConnection.Bind()
17a9e79c 14df59e9 System.DirectoryServices.Protocols.LdapConnection.SendRequestHelper(System.DirectoryServices.Protocols.DirectoryRequest, Int32 ByRef)
17a9e8b8 14df56e8 System.DirectoryServices.Protocols.LdapConnection.SendRequest(System.DirectoryServices.Protocols.DirectoryRequest, System.TimeSpan)
17a9e8bc 14df5657 [InlinedCallFrame: 17a9e8bc]
So, reading from the bottom up we can see that an LdapConnection called the SendRequest method, which in turn called the SendRequestHelper method, which called the Bind method, and so on.
If we run !clrstack -p
we get the following:
0:050> !clrstack -p
OS Thread Id: 0x1e8c (50)
ESP EIP
17a9e750 7d61c828 [NDirectMethodFrameSlim: 17a9e750] System.DirectoryServices.Protocols.Wldap32.ldap_bind_s(IntPtr, System.String, System.DirectoryServices.Protocols.SEC_WINNT_AUTH_IDENTITY_EX, System.DirectoryServices.Protocols.BindMethod)
17a9e768 14df70f9 System.DirectoryServices.Protocols.LdapConnection.BindHelper(System.Net.NetworkCredential, Boolean)
PARAMETERS:
this = 0x271fdfe0
newCredential =
needSetCredential =
17a9e794 14df6de0 System.DirectoryServices.Protocols.LdapConnection.Bind()
PARAMETERS:
this =
17a9e79c 14df59e9 System.DirectoryServices.Protocols.LdapConnection.SendRequestHelper(System.DirectoryServices.Protocols.DirectoryRequest, Int32 ByRef)
PARAMETERS:
this = 0x271fdfe0
request = 0x27246e38
messageID = 0x17a9e8ec
17a9e8b8 14df56e8 System.DirectoryServices.Protocols.LdapConnection.SendRequest(System.DirectoryServices.Protocols.DirectoryRequest, System.TimeSpan)
PARAMETERS:
this = 0x271fdfe0
request = 0x27246e38
requestTimeout =
17a9e8bc 14df5657 [InlinedCallFrame: 17a9e8bc]
We can now look at the parameters, like the DirectoryRequest that was sent to the SendRequest and SendRequestHelper methods. To do this we simply copy the address of the request, (0x27246e38) and use it as an argument for our next command.
!dumpobject (!do)
This is another crucial command. !dumpobject
will dump the object at the specified address, so if we send the address of the request as a parameter we will get the request dumped to screen.:
0:050> !do 0x27246e38
Name: System.DirectoryServices.Protocols.SearchRequest
MethodTable: 14b394c4
EEClass: 14d97ce0
Size: 52(0x34) bytes
GC Generation: 0
(C:\WINDOWS\assembly\GAC_MSIL\System.DirectoryServices.Protocols\2.0.0.0__b03f5f7f11d50a3a\System.DirectoryServices.Protocols.dll)
Fields:
MT Field Offset Type VT Attr Value Name
02c39310 4000102 4 System.String 0 instance 00000000 directoryRequestID
14b398bc 4000103 8 ...ControlCollection 0 instance 27246e90 directoryControlCollection
02c39310 4000111 c System.String 0 instance 27246d00 dn
12579f5c 4000112 10 ....StringCollection 0 instance 27246eb4 directoryAttributes
02c36ca0 4000113 14 System.Object 0 instance 27246ddc directoryFilter
14b39344 4000114 18 System.Int32 1 instance 1 directoryScope
14b393fc 4000115 1c System.Int32 1 instance 0 directoryRefAlias
0fd3da00 4000116 20 System.Int32 1 instance 0 directorySizeLimit
1202af88 4000117 28 System.TimeSpan 1 instance 27246e60 directoryTimeLimit
120261c8 4000118 24 System.Boolean 1 instance 0 directoryTypesOnly
Okay, so what’s this? Well, it’s a System.DirectoryServices.Protocols.SearchRequest object. This means that it has various properties defined by the System.DirectoryServices.Protocols.SearchRequest class. If you want to know more about these properties I suggest you look up the SearchRequest class in msdn. We have the RequestId, the Scope, the DistinguishedName, etc.
So, let’s say we want to know what the distinguished name is for this particular request. We boldly assume that the dn-property we see in the listing above is what is called DistinguishedName by MSDN. Simply copy the address of the dn-property (27246d00) and use !dumpobject
again. We can see that the property is a System.String, so the output should be pretty clear.
0:050> !do 27246d00
Name: System.String
MethodTable: 02c39310
EEClass: 0fb610ac
Size: 112(0x70) bytes
GC Generation: 0
(C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: CN=Dummy,CN=Accounts,CN=useradm,DC=dummy,DC=net
Fields:
MT Field Offset Type VT Attr Value Name
0fd3da00 4000096 4 System.Int32 1 instance 48 m_arrayLength
0fd3da00 4000097 8 System.Int32 1 instance 47 m_stringLength
0fb80010 4000098 c System.Char 1 instance 43 m_firstChar
02c39310 4000099 10 System.String 0 shared static Empty
>> Domain:Value 0019daf0:03380310 11b42540:03380310 <<
0fb86d44 400009a 14 System.Char[] 0 shared static WhitespaceChars
>> Domain:Value 0019daf0:03380324 11b42540:033855bc <<
Apparently, the distinguished name used was “CN=Dummy,CN=Accounts,CN=useradm,DC=dummy,DC=net”. If we want to find out more we simply continue using !dumpobject
to examine the objects and their respective values.
In my next post I thought I’d continue using the !dumpobject
-command to probe through the w3wp-process, and introduce a couple of other great commands.
To be continued…
/ Johan