.NET Debugging Demos Lab 5: Crash
Last week I published a debugging challenge for Lab 5. It was really interesting to see the results, and I was very happy with the excellent answers from the people who commented on the challenge (sounds like my work here is done :)).
Quick poll: for the next one (a memory leak), do you want a debugging challenge, or do you want the lab steps right away?
As usual, this lab uses the Buggy Bits site, and this time we are dealing with a crash.
Problem description
We get lots of customer reports that our site randomly hangs for a while and then displays a "browser cannot display the webpage" page.
The page suggests the following likely causes:
- You are not connected to the Internet.
- The website is encountering problems.
- There might be a typing error in the address.
If they refresh the page, the site works just fine, but the fact that the error page is displayed at all gives customers very little confidence in our site, especially when it appears just as they are getting ready to order items. We’re losing millions of dollars because of this issue, so it is extremely urgent that we resolve it. March is usually the peak season for our site, so we definitely need the site to work smoothly by then.
Previous labs and setup instructions
If you are new to the debugging labs, here you can find information on how to set up the labs as well as links to the previous labs in the series.
- Information and setup instructions
- Lab 1: Hang
- Lab 2: Crash
- Lab 3: Memory
- Lab 4: High CPU hang
- Lab 5: Debugging Challenge
Reproduce the issue
- Start the application and browse to the Company Information page
- Type a message and click the Send button
- What do you see in the browser, or what is the experience?
Explore the event logs
- Open the Windows System and Application event logs (Event Viewer)
- What events do you notice relating to the crash? (Note: you may see different events on different operating systems)
- What does exception code 0xc00000fd mean?
- Based on the event logs, what caused the crash?
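Windows exception codes such as the one in the event log follow the NTSTATUS layout, which packs several fields into 32 bits. As a minimal sketch (assuming the field layout documented in Microsoft's MS-ERREF specification; `decode_ntstatus` is a hypothetical helper, not an existing API), the Python below splits a code into its severity, customer flag, facility and code fields. The symbolic name for the resulting code can then be looked up in the Windows SDK header ntstatus.h.

```python
# Bit layout of a 32-bit NTSTATUS value (per MS-ERREF):
#   31-30 severity | 29 customer | 28 reserved | 27-16 facility | 15-0 code
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(value: int) -> dict:
    """Split an NTSTATUS-style exception code into its bit fields."""
    value &= 0xFFFFFFFF  # treat the code as an unsigned 32-bit value
    return {
        "severity": SEVERITY_NAMES[value >> 30],
        "customer": bool(value & 0x20000000),  # set for non-Microsoft codes
        "facility": (value >> 16) & 0x0FFF,
        "code": value & 0xFFFF,
    }


if __name__ == "__main__":
    # The code from the event log: look up the low 16 bits in ntstatus.h.
    print(decode_ntstatus(0xC00000FD))
```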
Reproduce the issue again and gather a memory dump
Note: If the process is stuck at the exception, you can simply take a hang dump with procdump -ma iisexpress.exe, since the process is already stopped at the exception. If you want to capture the dump at the moment the exception occurs, follow the steps below.
- Kill the iisexpress.exe process in Task Manager
- Start the application and browse to the Company Information page, but do not click Send
- Set up a debugger to capture a dump on crash
  - Option 1: Follow the instructions here to capture a dump on crash with the COMPlus_DbgEnableMiniDump=1 environment variable
  - Option 2: Open Debug Diag 2 Collection and add a crash rule to capture a dump for iisexpress.exe on exception code C00000FD
- Type a message on the Company Information page, click Send, and wait for the dump to be captured
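Before opening a dump in WinDbg, it can be worth a quick check that the file really is a dump and that it includes full memory, since the SOS commands used later typically need the managed heap, not just the thread stacks. The sketch below assumes the 32-byte MINIDUMP_HEADER layout described in the DbgHelp documentation and tests the MiniDumpWithFullMemory flag; `inspect_dump_header` and `inspect_dump_file` are hypothetical helpers, and this is a sanity check rather than a dump parser.

```python
import struct

# MINIDUMP_HEADER: Signature, Version, NumberOfStreams, StreamDirectoryRva,
# CheckSum, TimeDateStamp (six 32-bit fields), then Flags (64-bit) = 32 bytes.
HEADER_FORMAT = "<4sIIIIIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

MINIDUMP_SIGNATURE = b"MDMP"     # 'PMDM' stored little-endian
MINIDUMP_WITH_FULL_MEMORY = 0x2  # MINIDUMP_TYPE flag MiniDumpWithFullMemory


def inspect_dump_header(header: bytes) -> dict:
    """Validate the signature and report stream count and full-memory flag."""
    if len(header) < HEADER_SIZE:
        raise ValueError("file is too short to hold a MINIDUMP_HEADER")
    (signature, _version, streams, _dir_rva,
     _checksum, timestamp, flags) = struct.unpack_from(HEADER_FORMAT, header)
    if signature != MINIDUMP_SIGNATURE:
        raise ValueError(f"not a minidump: signature is {signature!r}")
    return {
        "streams": streams,
        "timestamp": timestamp,  # time_t: seconds since 1970-01-01 UTC
        "full_memory": bool(flags & MINIDUMP_WITH_FULL_MEMORY),
    }


def inspect_dump_file(path: str) -> dict:
    """Read just the header bytes of a .dmp file and inspect them."""
    with open(path, "rb") as dump:
        return inspect_dump_header(dump.read(HEADER_SIZE))
```

For example, `inspect_dump_file(r"C:\dumps\iisexpress.dmp")` (a hypothetical path) returns the stream count, the capture timestamp and whether full memory was written. A dump taken with procdump -ma would be expected to report `full_memory` as True.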
Open the dump to figure out what it is doing
- Open the memory dump in WinDbg
- Load the symbols and SOS
What can cause a crash?
One of a few things can cause the process to shut down.
- An unhandled exception (2nd chance exception). In most cases, if this is what causes the shutdown, you will also get a 2nd chance exception dump, and that is the one to look at to figure out what caused the crash.
- An external shutdown, e.g. an iisreset, a preemptive recycle (based on the recycling options) or someone killing the process from Task Manager. If this is the case, the active thread in the process shutdown dump is usually the main thread, and you will see a stack similar to the following:
0:000> kL
ChildEBP RetAddr
0014fbfc 7d503faf ntdll!ZwTerminateProcess+0x12
0014fc38 7d503f5a kernel32!_ExitProcess+0x4b
0014fc4c 79fd9e8f kernel32!ExitProcess+0x14
0014fe74 79f7479c mscorwks!SafeExitProcess+0x157
0014ff10 79004fab mscorwks!HandleExitProcessHelper+0x27
0014ff10 79004fab mscoree!CorExitProcess+0x46
0014ff20 77bcaddb mscoree!CorExitProcess+0x46
0014ff2c 77bcaefb msvcrt!__crtExitProcess+0x29
0014ff5c 77bcaf52 msvcrt!doexit+0x81
0014ff70 01001a3c msvcrt!exit+0x11
0014ffc0 7d4e7d2a w3wp!wmainCRTStartup+0x144
0014fff0 00000000 kernel32!BaseProcessStart+0x28
If this is the case, look at the event log, as it usually contains the recycling reason. If you are troubleshooting a real crash, consider disabling recycling while you troubleshoot so that you don’t get these types of “red herring” dumps.
- Something in the process called process exit. This is usually caused by an exception that is considered fatal, e.g. heap corruption, StackOverflowExceptions, OutOfMemoryExceptions or FatalExecutionEngineExceptions. In this case the active thread in the process shutdown dump will look (and probably is) completely unrelated to the crash, so look at the logs to see what happened just before the crash, and use that to decide how to proceed and whether to gather new dumps.
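When you are sorting through several shutdown dumps, a rough first check is whether the active thread is simply the process tearing itself down, as in the kL output above. The heuristic sketch below (not a WinDbg feature; the function names and the `EXIT_MARKERS` list are assumptions) pulls the module!function symbols out of 32-bit kL text, where each frame is a ChildEBP and a RetAddr column followed by the symbol, and reports the thread number from the prompt. A True result suggests the active thread says little about the cause, so the event log is the next stop.

```python
import re

# One frame of 32-bit kL output: ChildEBP, RetAddr, then module!function+offset.
FRAME_RE = re.compile(r"\b[0-9a-f]{8} +[0-9a-f]{8} +(\S+!\S+)")
# The WinDbg prompt, e.g. "0:000>", ends with the current thread number.
PROMPT_RE = re.compile(r"\b\d+:(\d+)>")
# Substrings of frames that suggest process teardown (heuristic list).
EXIT_MARKERS = ("ExitProcess", "TerminateProcess")


def parse_kl(text: str) -> list:
    """Return module!function+offset for each frame, innermost first."""
    return [m.group(1) for m in FRAME_RE.finditer(text)]


def active_thread(text: str):
    """Thread number from the pasted prompt, or None if there is no prompt."""
    m = PROMPT_RE.search(text)
    return int(m.group(1)) if m else None


def looks_like_process_exit(text: str) -> bool:
    """True if any frame symbol contains a process-termination marker."""
    return any(marker in frame
               for frame in parse_kl(text)
               for marker in EXIT_MARKERS)
```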
Examine the threads
Note: In a dump file, the active thread is the thread that raised the event that made the debugger dump the process, e.g. a process exit or another exception.
- Run kb 200 and !clrstack to see what the active thread is doing
- Can you tell what the issue is from here?
- Fix the problem
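A stack overflow usually shows up as the same few frames repeated down the whole stack, so when kb 200 or !clrstack prints hundreds of lines it can help to count call sites rather than read them one by one. The sketch below assumes each !clrstack frame line starts with two hexadecimal columns (Child SP and IP) followed by the call site; the exact column layout varies between SOS versions, and `summarize_recursion` and its `min_repeats` threshold are hypothetical.

```python
import re
from collections import Counter

# Assumed !clrstack frame line: two hex columns (Child SP, IP), then the call site.
# Backticks are allowed in case addresses are printed as 00000000`0014fbfc.
FRAME_RE = re.compile(
    r"^[ \t]*[0-9A-Fa-f`]{8,}[ \t]+[0-9A-Fa-f`]{8,}[ \t]+(\S.*?)[ \t]*$",
    re.MULTILINE,
)


def call_sites(stack_text: str) -> list:
    """Call-site text of every frame line, innermost first."""
    return [m.group(1).strip() for m in FRAME_RE.finditer(stack_text)]


def summarize_recursion(stack_text: str, min_repeats: int = 10):
    """Report the most frequent call site and whether it repeats suspiciously often."""
    sites = call_sites(stack_text)
    if not sites:
        return None  # nothing that looked like a frame line
    site, count = Counter(sites).most_common(1)[0]
    return {
        "frames": len(sites),
        "top_site": site,
        "repeats": count,
        "recursive": count >= min_repeats,
    }
```

Unbounded recursion would typically show one call site accounting for most of the frames. The threshold of 10 repeats is an arbitrary choice, and mutual recursion between several methods would instead show up as a few call sites with similar counts.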
Have fun, Tess