Following my post about Tuning the ThreadPool a week or two ago, I've been load testing a new web service that my team have been working on. This service sits at the root of our communications with third parties and, as such, requires optimum performance.
To reduce response times we use the CLR ThreadPool to send multiple requests to the provider asynchronously. So one request to our web service results in many more work items being added to the .NET ThreadPool. Fortunately, we were aware of the problems with low thread counts and made sure we had applied our recommended settings.
However, we were still unhappy with the performance. As part of the tests we stubbed out the part of the system that actually communicates with third parties (as the real response time can vary wildly and make the results harder to understand). The stub just sleeps for 4 seconds. So, within reason, and provided you weren't exhausting the ThreadPool, you would expect the response time to be just over 4 seconds.
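To give an idea of the shape of the test, here's a minimal sketch: one request fans out into several pool work items, each of which is the 4-second stub. The names and counts are illustrative, not our production code.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class StubLoadTest
{
    // Stub standing in for the third-party call: it just sleeps 4 seconds.
    static void StubProviderCall(object state)
    {
        Thread.Sleep(TimeSpan.FromSeconds(4));
        ((ManualResetEvent)state).Set();
    }

    // Fan one incoming request out into several pool work items,
    // wait for them all, and return the elapsed time in seconds.
    public static double RunStubbedRequest(int workItems)
    {
        ManualResetEvent[] done = new ManualResetEvent[workItems];
        Stopwatch timer = Stopwatch.StartNew();
        for (int i = 0; i < workItems; i++)
        {
            done[i] = new ManualResetEvent(false);
            ThreadPool.QueueUserWorkItem(StubProviderCall, done[i]);
        }
        WaitHandle.WaitAll(done);
        return timer.Elapsed.TotalSeconds;
    }

    static void Main()
    {
        // If the pool starts every item promptly, this prints just over 4.
        Console.WriteLine("Elapsed: {0:F1}s", RunStubbedRequest(5));
    }
}
```

Since the work items run concurrently and each only sleeps, the whole request should take barely longer than a single 4-second stub call.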
Here's the chart from one of our load tests (red = user load, green = requests/sec (x10), blue = average response time in seconds):
The average response time was 10.4 seconds, with a maximum of 19.8 seconds. The load in this example (we tried many different types and shapes of load) starts at 4 concurrent users and adds another 4 every 45 seconds. The test lasted 8 minutes.
Clearly, these results are unsatisfactory so I set about finding the source of the problem. First I checked that we weren't exhausting the CLR ThreadPool by logging the number of available Worker threads.
No, we weren't even making a dent in the available threads.
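For reference, that check is straightforward - something along these lines (the logging itself is omitted):

```csharp
using System;
using System.Threading;

static class PoolMonitor
{
    // Reports worker-thread usage for the CLR ThreadPool:
    // "in use" is the configured maximum minus what's still available.
    public static void GetWorkerUsage(out int inUse, out int max)
    {
        int availableWorkers, availableIo, maxIo;
        ThreadPool.GetAvailableThreads(out availableWorkers, out availableIo);
        ThreadPool.GetMaxThreads(out max, out maxIo);
        inUse = max - availableWorkers;
    }

    static void Main()
    {
        int inUse, max;
        GetWorkerUsage(out inUse, out max);
        // Logged on every request during the test; if inUse ever
        // approached max we would be exhausting the pool.
        Console.WriteLine("Worker threads in use: {0}/{1}", inUse, max);
    }
}
```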
So I started analysing each request using my TraceProfiler (hopefully more on this soon). The profiler found something weird: at times the original request thread was just sitting there, waiting for up to 12 seconds for work on the ThreadPool to even start. We were confident that it wasn't the 'new thread' problem I discussed in the previous post, so what was going on?
The ThreadPool is a complicated beast tuned for use by ASP.NET and the like, but kindly released by Microsoft into the wild for us mortal developers to use too.
It apparently does all kinds of clever things, like monitoring CPU usage and deciding whether to start the next queued work item or make it wait, in order to keep the number of active threads low. This stops a busy CPU wasting its time on repeated context switches. I could only speculate that this was somehow the cause of our problem.
To test this theory we decided to implement a custom ThreadPool that's a bit more slapdash about things and just processes all the work it's given. After all, our work items aren't CPU intensive - they mostly just sit waiting for a response from the remote server.
We tried Jon Skeet's CustomThreadPool, available in his MiscUtil library. We set its min and max threads to match those of the CLR ThreadPool, and here are the results:
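Matching the limits just means reading them off the CLR pool first. Roughly like this - I've left out the MiscUtil side, as the exact property names on CustomThreadPool are best taken from its source:

```csharp
using System;
using System.Threading;

static class PoolLimits
{
    // Read the CLR pool's configured worker-thread limits, so the
    // same numbers can be applied to the custom pool.
    public static void GetWorkerLimits(out int min, out int max)
    {
        int minIo, maxIo;
        ThreadPool.GetMinThreads(out min, out minIo);
        ThreadPool.GetMaxThreads(out max, out maxIo);
    }

    static void Main()
    {
        int min, max;
        GetWorkerLimits(out min, out max);
        // These two values were then fed into the CustomThreadPool's
        // min/max settings (see the MiscUtil source for the exact API).
        Console.WriteLine("Worker threads: min {0}, max {1}", min, max);
    }
}
```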
The average response time was 4.5 seconds - a ten-fold improvement in performance once you discount the 4-second sleep time. We also monitored CPU levels to make sure the new ThreadPool wasn't overloading the processor(s), but levels were consistent with those observed during the CLR ThreadPool tests.
Have a look at the combined chart
(opens in a new window).
What an improvement! Remember, the only difference between the two systems being loaded is the thread pool used for our custom threading work - that's it. Just look at the difference in throughput (requests/sec, in green) with no degradation in response time.
I'm not saying that Jon's ThreadPool is better than the .NET equivalent (and I'm certainly not saying that Jon's implementation is 'slapdash' either!) but in this case it worked for us.
We were using the ThreadPool manually in this case (through QueueUserWorkItem). Even so, the .NET ThreadPool settings for contention and burst load are as important as ever - remember, this pool is still used by async web service calls, delegates, callbacks and ASP.NET itself.
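Those settings are applied through the ThreadPool's static configuration methods. A sketch - the numbers here are purely illustrative, and the right values come out of the tuning exercise described in the earlier post:

```csharp
using System;
using System.Threading;

static class PoolConfig
{
    static void Main()
    {
        // Arguments are (worker threads, I/O completion threads).
        // Both methods return false if the values were rejected,
        // e.g. a max below the processor count or below the min.
        bool minOk = ThreadPool.SetMinThreads(50, 50);
        bool maxOk = ThreadPool.SetMaxThreads(100, 100);
        Console.WriteLine("min set: {0}, max set: {1}", minOk, maxOk);
    }
}
```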
Lots of people have written at great length about which types of task suit the .NET ThreadPool, and a few writers have suggested that it doesn't work well with long-running tasks. I struggle with this rule because it's never clear just how long a 'long-running' task is: 20 seconds? 4 seconds? 1 second?
In future, I'll continue to abstract our use of the ThreadPool and try different pools during load tests.
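That abstraction can be as small as one interface. A minimal sketch - the names are illustrative, not our production code:

```csharp
using System;
using System.Threading;

// A minimal abstraction over "somewhere to queue work", so the pool
// behind it can be swapped out between load-test runs.
public interface IWorkItemQueue
{
    void Enqueue(WaitCallback work, object state);
}

// Default implementation delegating to the CLR ThreadPool. A second
// implementation could wrap MiscUtil's CustomThreadPool in the same way.
public class ClrWorkItemQueue : IWorkItemQueue
{
    public void Enqueue(WaitCallback work, object state)
    {
        ThreadPool.QueueUserWorkItem(work, state);
    }
}
```

Code that depends on IWorkItemQueue rather than the static ThreadPool class never needs to change when the pool does.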