ASP.NET Case Study: Hang on WaitOne, WaitAny or WaitMultiple
One of the synchronization mechanisms in .NET is the reset event. It comes in two flavors: the AutoResetEvent, which resets itself automatically once it has released a waiting thread, and the ManualResetEvent, which, as the name suggests, you have to reset manually.
Let’s say you have a team of developers who can implement different parts of an application simultaneously, without interaction. The work order might then look something like this:
- Ask Bob to implement X
- Ask Belinda to implement Y
- Ask Ben to implement Z
- Integrate X, Y and Z when you get a notification that they are done with their work
In code (using a reset event) this would look something like this:
ImplementApp(){
    ImplementX(); //spawns off implementation of X on another thread, signals when ready
    ImplementY(); //spawns off implementation of Y on another thread, signals when ready
    ImplementZ(); //spawns off implementation of Z on another thread, signals when ready
    WaitHandle.WaitAll(autoEvents);
    IntegrateXYandZ(); //uses the results of the Implement methods
}
The ImplementX, ImplementY and ImplementZ methods would then use QueueUserWorkItem to get the work scheduled on other threads, and when done they would call autoEvents[i].Set() to signal that they are ready.
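To make the pseudocode above concrete, here is a minimal runnable sketch. The TeamWork class, the Implement helper and the parts array are names I made up for illustration, and the "work" is just storing a string; only QueueUserWorkItem, AutoResetEvent.Set and WaitHandle.WaitAll are the real APIs discussed here.

```csharp
using System;
using System.Threading;

public static class TeamWork
{
    // One event per piece of work; each starts out unsignaled.
    static readonly AutoResetEvent[] autoEvents =
    {
        new AutoResetEvent(false),
        new AutoResetEvent(false),
        new AutoResetEvent(false)
    };

    // Results produced by the worker threads (hypothetical stand-in for real work).
    static readonly string[] parts = new string[3];

    // Hypothetical helper: queues one piece of work and signals its event when done.
    static void Implement(int index, string name)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            parts[index] = name;        // stands in for the real implementation work
            autoEvents[index].Set();    // tell the coordinator "I am done"
        });
    }

    public static string ImplementApp()
    {
        Implement(0, "X");
        Implement(1, "Y");
        Implement(2, "Z");

        // Blocks until all three events have been set.
        WaitHandle.WaitAll(autoEvents);

        // Integrate X, Y and Z.
        return string.Join("+", parts);
    }
}
```

Calling TeamWork.ImplementApp() returns once every worker has called Set(). If one of the workers never does, the call never returns, which is exactly the hang this case study is about.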
The framework uses the same pattern internally. When you call a web service, for example, it will spin up a thread that sits and waits for the results of the web service call, and when the call is done the original thread is signaled and can continue with its work.
Another common use for the AutoResetEvent and ManualResetEvent is to spawn a thread that just sits around for the lifetime of the process, waiting for certain events to happen and acting on them when they occur. If you look at a dump you will often see threads sitting in WaitOne, waiting for some event to happen, like this one:
ESP EIP
0x0109fb74 0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Boolean System.Threading.WaitHandle.WaitOneNative(I,UI4,Boolean)
0x0109fb88 0x799e4bb1 [DEFAULT] [hasThis] Boolean System.Threading.WaitHandle.WaitOne(I4,Boolean)
0x0109fbbc 0x01040fcf [DEFAULT] Void System.EnterpriseServices.ServicedComponentProxy.QueueCleaner()
0x0109fdc4 0x791b3208 [FRAME: GCFrame]
This is perfectly normal. The QueueCleaner here just sits there waiting for someone to signal that the queue needs cleaning, so it isn’t hanging by any means; it is just waiting on an event.
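As a rough illustration of such a long-lived waiter, here is a small sketch. This is not the actual System.EnterpriseServices code; the QueueCleanerSketch class, its fields and the RequestCleanAndWait method are all made up for the example, and the "cleanup" is just a counter.

```csharp
using System;
using System.Threading;

public sealed class QueueCleanerSketch
{
    // Signaled when someone wants the queue cleaned (hypothetical name).
    readonly AutoResetEvent needsCleaning = new AutoResetEvent(false);
    // Signaled by the worker when one cleanup pass has finished (hypothetical name).
    readonly AutoResetEvent cleaned = new AutoResetEvent(false);
    int cleanCount;

    public QueueCleanerSketch()
    {
        // Background thread so it does not keep the process alive on exit.
        var worker = new Thread(Run) { IsBackground = true };
        worker.Start();
    }

    void Run()
    {
        while (true)
        {
            // The thread idles here; in a dump this shows up as a WaitOne frame.
            needsCleaning.WaitOne();
            Interlocked.Increment(ref cleanCount);   // stands in for real cleanup work
            cleaned.Set();
        }
    }

    // Asks the worker for one cleanup pass and returns how many have run so far.
    public int RequestCleanAndWait()
    {
        needsCleaning.Set();
        cleaned.WaitOne();
        return cleanCount;
    }
}
```

A thread like Run spends nearly all of its life blocked in WaitOne, so seeing it there in a dump says nothing about a hang.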
Going back to the initial example, what would happen if Ben quit work without telling anyone, before he was done with his implementation of Z? In real life you would probably get worried if he didn’t come to work for a few days and assign the work to someone else, but in an application no one would be the wiser and the app would hang, waiting indefinitely in WaitHandle.WaitAll(autoEvents).
Debugging the issue
For demo purposes I have implemented the Calculate example shown in the MSDN help files for AutoResetEvent, but added a little bit of a twist to it (as you’ll see later) so that my application hung. I then grabbed a hang dump with adplus -hang -pn w3wp.exe, loaded up sos and ran ~* e !clrstack.
Most threads were sitting in this stack:
OS Thread Id: 0x1e58 (26)
ESP EIP
0f2cefb0 7d61d051 [HelperMethodFrame_1OBJ: 0f2cefb0] System.Threading.WaitHandle.WaitMultiple(System.Threading.WaitHandle[], Int32, Boolean, Boolean)
0f2cf07c 7940332b System.Threading.WaitHandle.WaitAll(System.Threading.WaitHandle[], Int32, Boolean)
0f2cf098 0f1005a1 Calculate.Result(Int32)
0f2cf0a8 0f10034d _Default.Page_Load(System.Object, System.EventArgs)
0f2cf0d8 66f12980 System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr, System.Object, System.Object, System.EventArgs)
0f2cf0e8 6628efd2 System.Web.Util.CalliEventHandlerDelegateProxy.Callback(System.Object, System.EventArgs)
0f2cf0f8 6613cb04 System.Web.UI.Control.OnLoad(System.EventArgs)
0f2cf108 6613cb50 System.Web.UI.Control.LoadRecursive()
0f2cf11c 6614e12d System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean)
0f2cf318 6614d8c3 System.Web.UI.Page.ProcessRequest(Boolean, Boolean)
0f2cf350 6614d80f System.Web.UI.Page.ProcessRequest()
0f2cf388 6614d72f System.Web.UI.Page.ProcessRequestWithNoAssert(System.Web.HttpContext)
0f2cf390 6614d6c2 System.Web.UI.Page.ProcessRequest(System.Web.HttpContext)
0f2cf3a4 0f100125 ASP.default_aspx.ProcessRequest(System.Web.HttpContext)
0f2cf3a8 65fe6bfb System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
0f2cf3dc 65fe3f51 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)
0f2cf41c 65fe7733 System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)
0f2cf46c 65fccbfe System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)
0f2cf488 65fd19c5 System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)
0f2cf4bc 65fd16b2 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)
0f2cf4c8 65fcfa6d System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)
0f2cf6d8 79f047fd [ContextTransitionFrame: 0f2cf6d8]
0f2cf70c 79f047fd [GCFrame: 0f2cf70c]
0f2cf868 79f047fd [ComMethodFrame: 0f2cf868]
So here we can see that we are in _Default.Page_Load, calling Calculate.Result, and this is sitting in a WaitAll, waiting for someone to signal a reset event.
Here is an excerpt from the code; we are stuck on the WaitHandle.WaitAll(autoEvents) line:
public double Result(int seed)
{
    randomGenerator = new Random(seed);
    ThreadPool.QueueUserWorkItem(new WaitCallback(CalculateBase));
    ThreadPool.QueueUserWorkItem(new WaitCallback(CalculateFirstTerm));
    ThreadPool.QueueUserWorkItem(new WaitCallback(CalculateSecondTerm));
    ThreadPool.QueueUserWorkItem(new WaitCallback(CalculateThirdTerm));
    WaitHandle.WaitAll(autoEvents);
    manualEvent.Reset();
    return firstTerm + secondTerm + thirdTerm;
}
...
void CalculateThirdTerm(object stateInfo)
{
    double preCalc = randomGenerator.NextDouble();
    manualEvent.WaitOne();
    try
    {
        thirdTerm = GetTerm(preCalc);
        autoEvents[2].Set();
    }
    catch { }
}
For some reason one of the autoEvents has not been signaled. If the reason were that we were still working on the calculation in CalculateThirdTerm, for example, then we would have seen a thread in the ~* e !clrstack output that was stuck somewhere in CalculateThirdTerm. This was not the case here, which means that the thread must have exited without setting the event, much like our teammate Ben.
From the code we can see that one way this could happen would be if some exception occurred in GetTerm, such that we exit the try block without setting the event and the empty catch block quietly swallows the exception.
Knowing this, I dump out all the recent exceptions in the dump using this command:
.foreach (ex {!dumpheap -type Exception -short}){!pe ${ex}}
This goes through all objects on the heap whose type name contains Exception and runs !pe (print exception) on each of them.
Note: If you do this, don’t be alarmed if you see an OutOfMemoryException, a StackOverflowException and an ExecutionEngineException. They will always be there, because the exception objects for these exceptions are created at startup; they can’t be created at the point where they need to be thrown.
With the command above I find a number of these exceptions:
Exception object: 06f392b4
Exception type: System.ArgumentException
Message: Value can't be less than 1.0
InnerException:
StackTrace (generated):
SP IP Function
0F34F19C 0F1006DE App_Code_klmxs0si!Calculate.GetTerm(Double)+0x6e
0F34F1B4 0F10079B App_Code_klmxs0si!Calculate.CalculateThirdTerm(System.Object)+0x33
So this validates the theory that an exception occurred in GetTerm, which in turn caused us to not signal the event and finally block on the WaitAll.
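One way to avoid this particular hang is to move the Set() call into a finally block, so the event is signaled whether or not GetTerm throws. The sketch below is a cut-down, hypothetical variant of the demo code: SafeCalculate, TryResult and the thirdTermError field are names I invented, and the GetTerm body is a stand-in that only mimics the ArgumentException seen in the dump.

```csharp
using System;
using System.Threading;

public class SafeCalculate
{
    readonly AutoResetEvent[] autoEvents = { new AutoResetEvent(false) };
    double thirdTerm;
    Exception thirdTermError;   // hypothetical: remembers why the worker failed

    // Stand-in for GetTerm; the real implementation is not shown in this post.
    static double GetTerm(double preCalc)
    {
        if (preCalc < 1.0)
            throw new ArgumentException("Value can't be less than 1.0");
        return preCalc * 2;
    }

    void CalculateThirdTerm(object stateInfo)
    {
        try
        {
            thirdTerm = GetTerm((double)stateInfo);
        }
        catch (Exception ex)
        {
            thirdTermError = ex;     // record the failure instead of swallowing it
        }
        finally
        {
            autoEvents[0].Set();     // always signal, even when GetTerm throws
        }
    }

    // Returns true if the term was computed, false if the worker failed.
    public bool TryResult(double input, out double result)
    {
        thirdTermError = null;
        ThreadPool.QueueUserWorkItem(new WaitCallback(CalculateThirdTerm), input);
        WaitHandle.WaitAll(autoEvents);   // no longer hangs when the worker fails
        result = thirdTerm;
        return thirdTermError == null;
    }
}
```

With this shape the waiting thread always wakes up, and it can then decide what to do about the failure instead of sitting in WaitAll forever.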
Final thoughts
If you use a synchronization mechanism, whether it be a Monitor, a ReaderWriterLock or a reset event, you need to make sure that, regardless of what happens, you release the lock or signal the event. With WaitOne, WaitAny or WaitAll there is also the option to provide a timeout, in which case the wait ends when the timeout is reached. WaitOne and WaitAll then return false, and WaitAny returns WaitHandle.WaitTimeout, so you can check whether the wait timed out.
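A small sketch of what checking for a timeout can look like. The TimeoutDemo class and its method names are made up for the example, and the 100 ms timeout is arbitrary; the events that are never set stand in for a worker that died without signaling.

```csharp
using System;
using System.Threading;

public static class TimeoutDemo
{
    // Returns true if WaitAll gave up because of the timeout.
    public static bool WaitAllTimedOut()
    {
        // The second event is never set, like a worker that exited early.
        WaitHandle[] events =
        {
            new ManualResetEvent(true),
            new ManualResetEvent(false)
        };

        // WaitAll returns false when the timeout elapses before all events are set.
        bool allSignaled = WaitHandle.WaitAll(events, 100);
        return !allSignaled;
    }

    // Returns true if WaitAny gave up because of the timeout.
    public static bool WaitAnyTimedOut()
    {
        WaitHandle[] events =
        {
            new ManualResetEvent(false),
            new ManualResetEvent(false)
        };

        // WaitAny returns the index of the signaled handle,
        // or WaitHandle.WaitTimeout if none was signaled in time.
        int index = WaitHandle.WaitAny(events, 100);
        return index == WaitHandle.WaitTimeout;
    }
}
```

A timeout does not fix the missing signal, but it turns an indefinite hang into something the code can detect, log and recover from.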
With a lock(){} statement you will never orphan the lock; the monitor that is used internally will exit even if an exception occurs inside the lock statement. This is similar to using the using(){} statement instead of disposing manually.
If you are manually using Monitor.Enter and Monitor.Exit, or if you use AcquireReaderLock or AcquireWriterLock, you should release the lock in a finally block to avoid orphaning it if an exception is thrown.
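As a sketch of that advice, the hypothetical Totals class below releases both kinds of lock in a finally block. The class, its fields and its methods are invented for the example; the negative-amount check is only there to give the protected code a way to throw.

```csharp
using System;
using System.Threading;

public class Totals
{
    readonly object sync = new object();
    readonly ReaderWriterLock rwLock = new ReaderWriterLock();
    int monitorTotal;   // guarded by sync
    int rwTotal;        // guarded by rwLock

    public void AddWithMonitor(int amount)
    {
        Monitor.Enter(sync);
        try
        {
            if (amount < 0) throw new ArgumentException("amount must not be negative");
            monitorTotal += amount;
        }
        finally
        {
            Monitor.Exit(sync);             // runs even if the body throws
        }
    }

    public void AddWithWriterLock(int amount)
    {
        rwLock.AcquireWriterLock(Timeout.Infinite);
        try
        {
            if (amount < 0) throw new ArgumentException("amount must not be negative");
            rwTotal += amount;
        }
        finally
        {
            rwLock.ReleaseWriterLock();     // never leave the writer lock orphaned
        }
    }

    public int ReadMonitorTotal()
    {
        Monitor.Enter(sync);
        try { return monitorTotal; }
        finally { Monitor.Exit(sync); }
    }

    public int ReadRwTotal()
    {
        rwLock.AcquireReaderLock(Timeout.Infinite);
        try { return rwTotal; }
        finally { rwLock.ReleaseReaderLock(); }
    }
}
```

Even after AddWithMonitor or AddWithWriterLock throws, later callers can still take the lock, because the finally block has already released it.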
Laters, Tess