ASP.NET 2.0 Crash case study: Unhandled exceptions
For a long time all my case studies have been on 1.1. it’s time to venture out in 2.0 land and look at what may seem like a 2.0 specific issue.
I say “may seem” because this case study will only directly crash if you are using 2.0, but as you’ll learn later the problem existed in 1.1 and 1.0, it was just way harder to track down.
Problem description
Once in a while ASP.NET crashes and we see events in the system event log like this one
Event Type: Warning
Event Source: W3SVC
Event Category: None
Event ID: 1009
Date: 2006-04-25
Time: 09:41:22
PM User: N/A
Computer: SUBSPACE1
Description:
A process serving application pool 'ASP.NET V2.0' terminated unexpectedly. The process id was ‘1732’. The process exit code was ‘0xe0434f4d’.
Or this one
Event Type: Warning
Event Source: W3SVC
Event Category: None
Event ID: 1011
Date: 2006-04-25
Time: 09:41:22
User: N/A
Computer: SUBSPACE1
Description:
A process serving application pool 'ASP.NET V2.0' suffered a fatal communication error with the World Wide Web Publishing Service. The process id was '6256'. The data field contains the error number.
And in the application event log we get a pretty cryptic error message like this one
Event Type: Error
Event Source: .NET Runtime 2.0 Error Reporting
Event Category: None
Event ID: 5000
Date: 2006-04-25
Time: 09:41:20
User: N/A
Computer: SUBSPACE1
Description:
EventType clr20r3, P1 w3wp.exe, P2 6.0.3790.1830, P3 42435be1, P4 app_code.pn5mfdcr, P5 0.0.0.0, P6 444dcf44, P7 5, P8 5, P9 system.dividebyzeroexception, P10 NIL.
Initial thoughts
Ok, so what do we know about the issue? We know that asp.net terminated unexpectedly, and that right before this we got a System.DivideByZeroException. We also know that the process exit code was 0xe0434f4d whatever that means, hmm…
Usually when you get a stopped unexpectedly error message the exit code will be the type of exception that caused the crash. For example a 0xC0000005 means you got a second chance access violation, 0x800703e9 means you suffered a StackOverflowException but what about 0xe0434f4d?
0xe0434f4d is the exception code for CLR (.net) exceptions, so any managed exception like a NullReferenceException or InvalidOperationException or SQLException, basically all managed exception are natively referred to as 0xe0434f4d. In this case, if we look closer at the application event log entry we can see that it is in fact a System.DivideByZero exception.
Trivia: Just a piece of info of no particular value that you might want to pull out of pocket on your next date 0xe0434f4d or at least 43 4f 4d are the ASCII values for the letters COM.
But hey now… should a .net exception cause the asp.net process to crash??? If you divide by zero in your page and don’t have a try catch block around it, surely you will get one of those “nice” white and yellow error pages saying that an exception occurred on your page, but the process doesn’t just exit.
The answer is yes, you will get one of those pages because the asp.net global error handler will eventually catch your exception, format it for you and print it out on the screen. But what happens if it is not on an asp.net request, so there is no-one to feedback the exception to? It’s the old paradox: If a tree falls in the forest and nobody is there, does it still make a sound?
In 1.0 and 1.1 it didn’t. For example if you throw an exception in a piece of code called on a timer, or use QueueUserWorkItem and throw an exception in code executing there, or otherwise throw exceptions in code that is not running inside the context of an asp.net request, the framework will swallow the exception and continue. Or rather it will stop that thread of execution but it won’t die.
Doesn’t sound all that bad right? Really???
That thread could have been doing anything, and we will never be the wiser that it died. It could have been holding a lock of some sort, or it could have been in the middle of cleaning up resources, or really a number of different things that will now never happen, but that may immediately or eventually have really bad side effects like hangs or crashes or memory issues, but the exception will be long gone so we cant figure out what it was.
The policy for unhandled exceptions was changed in ASP.NET 2.0 to the default for .net which is a process exit. This can be changed back by adding the following to the aspnet.config in the frameworks directory, but I wouldn’t recommend it without putting in some preventive measures to take care of potential unhandled exceptions on non ASP.NET threads.
<configuration>
<runtime>
<legacyUnhandledExceptionPolicy enabled="true" />
</runtime>
</configuration>
Troubleshooting the issue
The main task here is to find out where this DivideByZeroException is coming from and why it occurred so there are two ways to figure this out (short of complete code inspection).
Strategy #1 – logging the exception
The first way, and this is the way I would probably recommend, is to create an UnhandledExceptionHandler to log the exception along with its stack trace in the event log as shown in this KB article
You add the handler like this to the web.config:
<system.web>
<httpModules>
<add type="WebMonitor.UnhandledExceptionModule, <strong name>" name="UnhandledExceptionModule"/>
</httpModules>
...
</system.web>
And it hooks an event handler up to the UnhandledException event of the current app domain.
You don’t actually need to strong name it and add it to the GAC, however if you plan it in multiple applications you should to avoid for the dll being loaded multiple times.
Now the next time you get one of these unhandled exceptions, the process will still exit (unless you change the unhandled exception policy), but you have a very good chance of fixing the issue.
The event for the exception in this particular sample looks like this…
Event Type: Error
Event Source: ASP.NET 2.0.50727.0
Event Category: None
Event ID: 0
Date: 2006-04-25
Time: 09:41:20
User: N/A
Computer: SUBSPACE1
Description:
UnhandledException logged by UnhandledExceptionModule.dll:
appId=/LM/w3svc/1/ROOT/CrashMe
type=System.DivideByZeroException
message=Attempted to divide by zero.
stack=
at MyFinalizerClass.Finalize()
.
Bingo!!! So the exception occurs in MyFinalizerClass.Finalize() in the CrashMe application.
In fact the code for the finalizer for this class looks like this, so it is pretty obvious what caused it, and our work here is done…
~MyFinalizerClass()
{
int i = 0;
int j = 9;
i = j / i;
}
Setting up an UnhandledException handler like this is not limited to 2.0. You can absolutely do this in 1.1 as well to determine if you are throwing any unhandled exceptions. The only modification you need to do to the code outlined in the article is to choose a different dll to gather the version from. A prime candidate for this would be mscorlib.dll.
Strategy #2 – debugging with windbg
If you have read any of my previous posts you know I have a special place in my heart for debugging. If you don’t have an unhandled exception event handler you can still find out what the exception was by running adplus in crash mode to generate dumps when the process exits.
In this particular case doing this will result in the following files being generated.
C:\debuggers\Crash_Mode__Date_04-25-2006__Time_10-11-3030>dir
Volume in drive C has no label.
Volume Serial Number is 30D7-F806
Directory of C:\debuggers\Crash_Mode__Date_04-25-2006__Time_10-11-3030
2006-04-25 10:12 <DIR> .
2006-04-25 10:12 <DIR> ..
2006-04-25 10:11 4 404 ADPlus_report.txt
2006-04-25 10:11 <DIR> CDBScripts
2006-04-25 10:11 9 420 059 PID-3368__W3WP.EXE__1st_chance_IntegerDivide__mini_12D8_2006-04-25_10-11-51-487_0D28.dmp
2006-04-25 10:12 108 142 158 PID-3368__W3WP.EXE__1st_chance_Process_Shut_Down__full_12D8_2006-04-25_10-12-02-440_0D28.dmp
2006-04-25 10:12 108 294 606 PID-3368__W3WP.EXE__2nd_chance_NET_CLR__full_12D8_2006-04-25_10-11-54-690_0D28.dmp
2006-04-25 10:12 35 324 PID-3368__W3WP.EXE__Date_04-25-2006__Time_10-11-3030.log
2006-04-25 10:11 9 014 Process_List.txt
6 File(s) 225 905 565 bytes
3 Dir(s) 4 592 435 200 bytes free
So at 10-11-51 a 1st chance IntegerDivide exception occurred, followed by a 2nd chance .NET exception (2nd chance meaning that it wasn’t handled), and finally this was followed by a process shutdown.
Concentrating on the 2nd chance exception dump and loading up sos.dll from the framework directory
0:020> .load C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll
We can run !threads to give us a list of the threads along with info about which thread the exception occurred on (in this case thread 20, the finalizer thread).
0:020> !threads
ThreadCount: 7
UnstartedThread: 0
BackgroundThread: 7
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
PreEmptive GC Alloc Lock
ID OSID ThreadOBJ State GC Context Domain Count APT Exception
16 1 c20 000b9778 3808220 Enabled 00000000:00000000 000fddc0 1 Ukn (Threadpool Worker)
20 2 16d8 000e3bd0 b220 Enabled 0242ab08:0242bf94 000caad8 0 MTA (Finalizer) System.DivideByZeroException (02428f60)
21 3 b2c 000fa588 80a220 Enabled 00000000:00000000 000caad8 0 MTA (Threadpool Completion Port)
22 4 16c0 000fd690 1220 Enabled 00000000:00000000 000caad8 0 Ukn
14 6 1570 0014a970 880a220 Enabled 00000000:00000000 000caad8 0 MTA (Threadpool Completion Port)
23 7 1b44 00145178 180b220 Enabled 00000000:00000000 000caad8 0 MTA (Threadpool Worker)
24 5 bac 0014a5b0 880b220 Enabled 00000000:00000000 000caad8 0 MTA (Threadpool Completion Port)
So from this we know that we are looking at an exception happening during finalization.
Since this dump was triggered by the exception this will also be the active thread when you load up the dump.
If we look at the native and managed callstack it doesn’t really bring us any closer to the solution…
0:020> kb 2000
ChildEBP RetAddr Args to Child
01eaf9fc 79f97065 e0434f4d 00000001 00000001 kernel32!RaiseException+0x53
01eafa5c 7a05b941 02428f60 00000000 00000000 mscorwks!RaiseTheExceptionInternalOnly+0x226
01eafa70 7a05b9b2 02428f60 00000000 01eafb88 mscorwks!RaiseTheException+0x4c
01eafa98 7a05b9ea 00000000 000e3bd0 4914714b mscorwks!RaiseTheException+0xbe
01eafac4 7a05b9f5 02428f60 00000000 7a0e0701 mscorwks!RealCOMPlusThrow+0x37
01eafad0 7a0e0701 02428f60 7740e5f5 7a36a738 mscorwks!RealCOMPlusThrow+0xa
01eafc00 7a0e16e4 00000000 01eafc38 01eafca8 mscorwks!Thread::RaiseCrossContextException+0x63
01eafcb4 79f3a09c 000fddc0 79f3a0a1 01eafeb4 mscorwks!Thread::DoADCallBack+0x25a
01eafcd0 79f3581e 01eafeb4 79f3ab82 000fddc0 mscorwks!Thread::UserResumeThread+0xf3
01eafcd8 79f3ab82 000fddc0 79f3abb0 01eafd08 mscorwks!ManagedThreadBase::FinalizerAppDomain+0x26
01eafd34 79f350ea 000e3bd0 00000000 01eafd5b mscorwks!SVR::CreateGCHeap+0x163
01eafdb0 79f3500e 00000000 00000000 01eafe6c mscorwks!SVR::CreateGCHeap+0x204
01eafdc8 79ecb4a4 01eafeb4 00000000 00000001 mscorwks!SVR::GCHeap::FinalizerThreadWorker+0xe7
01eafdd8 79ecb442 01eafeb4 01eafe60 79f93fe6 mscorwks!Thread::UserResumeThread+0xfb
01eafe6c 79ecb364 01eafeb4 7740e0c9 00000000 mscorwks!Thread::DoADCallBack+0x355
01eafea8 79ed5e8b 01eafeb4 00000000 000fddc0 mscorwks!Thread::DoADCallBack+0x541
01eafed0 79ed5e56 79f34f4a 00000008 79f6fd87 mscorwks!ManagedThreadBase_NoADTransition+0x32
01eafedc 79f6fd87 79f34f4a 7740e175 00000000 mscorwks!ManagedThreadBase::FinalizerBase+0xb
01eaff14 79ecb00b 00000000 00000000 00000000 mscorwks!SVR::GCHeap::FinalizerThreadStart+0xbb
01eaffb8 77e66063 000e4390 00000000 00000000 mscorwks!Thread::intermediateThreadProc+0x49
01eaffec 00000000 79ecafc5 000e4390 00000000 kernel32!BaseThreadStart+0x34
0:020> !clrstack
OS Thread Id: 0x16d8 (20)
ESP EIP
01eafaac 77e55dea [GCFrame: 01eafaac]
01eafb14 77e55dea [GCFrame: 01eafb14]
01eafd14 77e55dea [GCFrame: 01eafd14]
But from the !threads output we get the address of the exception (02428f60) and we can take a deeper look…
0:020> !do 02428f60
Name: System.DivideByZeroException
MethodTable: 7915e020
EEClass: 791e97f8
Size: 72(0x48) bytes
(C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
MT Field Offset Type VT Attr Value Name
790fa3e0 40000b5 4 System.String 0 instance 02429e78 _className
79109208 40000b6 8 ...ection.MethodBase 0 instance 00000000 _exceptionMethod
790fa3e0 40000b7 c System.String 0 instance 0242a18c _exceptionMethodString
790fa3e0 40000b8 10 System.String 0 instance 02429fb4 _message
79113dfc 40000b9 14 ...tions.IDictionary 0 instance 00000000 _data
790fa9e8 40000ba 18 System.Exception 0 instance 00000000 _innerException
790fa3e0 40000bb 1c System.String 0 instance 00000000 _helpURL
790f9c18 40000bc 20 System.Object 0 instance 00000000 _stackTrace
790fa3e0 40000bd 24 System.String 0 instance 00000000 _stackTraceString
790fa3e0 40000be 28 System.String 0 instance 0242aab0 _remoteStackTraceString
790fed1c 40000bf 34 System.Int32 0 instance 0 _remoteStackIndex
790f9c18 40000c0 2c System.Object 0 instance 00000000 _dynamicMethods
790fed1c 40000c1 38 System.Int32 0 instance -2147352558 _HResult
790fa3e0 40000c2 30 System.String 0 instance 0242a3c4 _source
790fe160 40000c3 3c System.IntPtr 0 instance 0 _xptrs
790fed1c 40000c4 40 System.Int32 0 instance 0 _xcode
In particular we can dump out the _remoteStackTraceString and find out where exactly the error occurred.
0:020> !do 0242aab0
Name: System.String
MethodTable: 790fa3e0
EEClass: 790fa340
Size: 88(0x58) bytes
(C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: at MyFinalizerClass.Finalize()
Fields:
MT Field Offset Type VT Attr Value Name
790fed1c 4000096 4 System.Int32 0 instance 36 m_arrayLength
790fed1c 4000097 8 System.Int32 0 instance 35 m_stringLength
790fbefc 4000098 c System.Char 0 instance 20 m_firstChar
790fa3e0 4000099 10 System.String 0 shared static Empty
>> Domain:Value 000caad8:790d6584 000fddc0:790d6584 <<
79124670 400009a 14 System.Char[] 0 shared static WhitespaceChars
>> Domain:Value 000caad8:023d03f0 000fddc0:023d4374 <<
Voila, there we have it again… our MyFinalizerClass.Finalize()
If you want to take away just one thing from this post, it should be to be careful with the code running on non ASP.NET threads, making sure that you have proper try / catch blocks around any code that could cause an exception.
Until next time…