Thursday 21 August 2008

How to Troubleshoot IIS6 'Server Too Busy' Error


Background:

“Server Too Busy” is returned with an HTTP 5xx status code (503), which means something is wrong with the web server rather than the website.

I am not going to discuss a dying product (IIS7 is knocking at the door) in too much depth. But please look intently at this diagram for 5 minutes before you read further.

Ohhh O, it is not 5 minutes yet!!!!

OK, so it seems that queuing can potentially occur at the following places:

1) HTTP.SYS, the first handler in kernel mode, which validates the incoming request and then routes it to the appropriate kernel-mode queue.

2) The worker process itself, because it hosts multiple AppDomains (in simple terms, roughly multiple applications inside the same worker process).

3) The application itself, with its I/O thread pool.

Let’s examine the properties associated with each queue.

Mode: Kernel Mode
IIS Queue: HTTP.sys
Symptoms: HTTP Error log indicates 503 – Queue Full
Property: AppPoolQueueLength
Location: Metabase
Default: IIS 6 - 1000

Mode: Kernel Mode to User Mode transition
IIS Queue: W3WP or Thread Pool
Symptoms: HTTP Error log indicates Timer Connection Idle
Property: Please ignore it for a while because it is a big post in itself!!!!

Mode: User Mode: Global
IIS Queue: ASP.Net ISAPI Handler or Application Queue
Symptoms: HTTP 503 – Server Too Busy Error
Property: AppRequestQueueLimit
Location: Machine.config in the .Net configuration folder, under <httpRuntime>
Default: 100 for .Net 1.0 or 1.1; 5000 for .Net 2.0 or higher

Mode: User Mode: Application
IIS Queue: .Net Thread Pool
Symptoms: See the details below.
Property: MaxWorkerThreads, MinWorkerThreads, MaxIoThreads, MinFreeThreads, MinLocalRequestFreeThreads, Maxconnection, ExecutionTimeout
Location: Machine.config under the element <processModel>, which affects all applications using that .Net version.[1] OR web.config for application-specific settings
Default: .Net 2.0 is set to auto config, so it generally does not require modification. .Net 1.0 or 1.1 is set to a very minimal number and always requires modification to scale up.
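
As a rough way to see where the user-mode settings above live, the sketch below parses a configuration fragment with Python's standard library and collects the attributes of the two elements. The XML fragment is a made-up sample, not a copy of any real machine.config, and the helper name `read_queue_settings` is hypothetical. It assumes the case-sensitive element spellings `httpRuntime` and `processModel` under `system.web`.

```python
import xml.etree.ElementTree as ET

# Made-up sample fragment for illustration only; a real machine.config is much larger.
SAMPLE_CONFIG = """
<configuration>
  <system.web>
    <httpRuntime appRequestQueueLimit="5000" minFreeThreads="8" minLocalRequestFreeThreads="4" />
    <processModel autoConfig="true" />
  </system.web>
</configuration>
"""

def read_queue_settings(xml_text):
    """Return the attributes of <httpRuntime> and <processModel> as plain dicts."""
    root = ET.fromstring(xml_text)
    section = root.find("system.web")
    settings = {}
    for name in ("httpRuntime", "processModel"):
        element = section.find(name) if section is not None else None
        # A missing element is reported as an empty dict rather than an error.
        settings[name] = dict(element.attrib) if element is not None else {}
    return settings

if __name__ == "__main__":
    for element_name, attrs in read_queue_settings(SAMPLE_CONFIG).items():
        print(element_name, attrs)
```

To inspect a real server, the same function could be given the text of the machine.config for the .Net version in question. That file's location varies by installation and is not assumed here.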

Symptoms For User Mode:Global Contention:

1. Event ID: 1003 – aspnet_isapi.dll reported itself as unhealthy for the following reason: ‘Deadlock Detected’

2. Event ID: 1013 – A Process serving application pool ‘udaypandya.com’ exceeded time limits during shut down.

3. “System.InvalidOperationException: There were not enough free threads in the ThreadPool object to complete the operation.”

4. In the most extreme situation, it gives “HttpException (0x80004005): Request timed out.”

Real world problem:

A client created a support ticket mentioning that the server displays a “Server Too Busy” error. I enabled performance monitoring and got the following output. The customer mentioned that the application uses .Net version 2.0. The error is random and cannot be reproduced on demand.

I went ahead and checked the HTTPErr log and got the following output:

Logparser Query: logparser -i:HTTPERR -o:DATAGRID "SELECT date, time, c-ip, s-ip, cs-uri, cs-method, sc-status, s-siteid, s-reason, s-queuename from C:\IISLogs\HTTPERR\HTTPERR\httperr42.log WHERE sc-status>500"
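
If Log Parser is not at hand, roughly the same filter can be sketched in Python. The snippet below reads the `#Fields:` header, keeps entries whose `sc-status` is greater than 500 (the same condition as the query above) and counts them by `s-reason`. The sample lines are invented for illustration and are not the customer's log, and the function name `count_reasons` is hypothetical.

```python
from collections import Counter

# Invented sample lines for illustration; they are NOT taken from the customer's log.
SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri sc-status s-siteid s-reason s-queuename
2008-08-21 10:15:01 192.0.2.10 1234 192.0.2.1 80 HTTP/1.1 GET /default.aspx 503 1 QueueFull ExamplePool
2008-08-21 10:15:02 192.0.2.11 1235 192.0.2.1 80 HTTP/1.1 GET /default.aspx 503 1 QueueFull ExamplePool
2008-08-21 10:16:40 192.0.2.12 1236 192.0.2.1 80 - - - - - Timer_ConnectionIdle -
"""

def count_reasons(log_text, min_status=500):
    """Count s-reason values for entries whose sc-status is greater than min_status."""
    fields = []
    reasons = Counter()
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            # The header names the space-separated columns that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        status = row.get("sc-status", "-")
        # Entries without a numeric status (logged as "-") are skipped.
        if status.isdigit() and int(status) > min_status:
            reasons[row.get("s-reason", "-")] += 1
    return reasons

if __name__ == "__main__":
    for reason, count in count_reasons(SAMPLE_LOG).most_common():
        print(reason, count)
```

Grouping by `s-reason` is a design choice of this sketch: a pile of QueueFull entries points at the kernel-mode queue, while other reasons point elsewhere.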

Here is a screenshot for Performance Monitor.

Can someone please help me out with what is wrong and how I should troubleshoot it further?

Uday Pandya


Windows 2008 and Terminal Server Stream Disconnected



Problem: A DELL server with a Broadcom NetXtreme Gigabit Ethernet card is running Windows 2008. If you connect to the server via RDP, you lose the connection unexpectedly with an error. You will see something like this on the client side:

The terminal Server has ended the connection.

On the server you will see something like this:

Solution:

As mentioned in the event log, there is a problem with the RDP security layer. There is also a very nice Microsoft support article. Please have a look at the following link:

http://support.microsoft.com/kb/323497

Nice, simple and easy. Happy working, but there is one problem!!! This article is the closest match you can find for the problem description, and it does not solve your problem.

I dumped the RDP packets with Netmon 3.1 and did not see any session termination from the server side. After doing a lot of research (and to save you a lot of time), I found that the problem was with the Broadcom NetXtreme Gigabit Ethernet adapter.

By default, our Windows 2008 kick enables advanced features such as IPv4 Checksum Offload, IPv4 Large Send Offload (LSO) and Receive Side Scaling (RSS). I strongly recommend that you visit the following link for more information:

http://technet.microsoft.com/en-us/network/bb545631.aspx

With Large Send Offload, the TCP stack hands the network card a large buffer, and the card divides it into small chunks and builds the TCP packets itself. It turned out that the issue was due to the LSO feature being enabled: the TS session service detected a problem with the data stream. To stabilize the RDP session, we need to disable LSO in the Broadcom Advanced Control Suite as follows:


Uday Pandya

.Net Service Pack Information

A quick reminder of the .Net version numbers associated with each service pack.

For Version 1.0
------------
1.0.3705.0-Original RTM
1.0.3705.209-SP1
1.0.3705.288-SP2
1.0.3705.6018-SP3


For Version 1.1
--------------
1.1.4322.573-Original RTM
1.1.4322.2032-SP1
1.1.4322.2300-SP1 32 Bit (included with Win Server 2003)

For Version 2.0
-----------

2.0.50727.42-Original RTM
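
For scripted checks, the list above can be kept as a small lookup table. This is only a sketch: the dictionary holds exactly the versions listed in this post, and the function name `describe_version` is hypothetical.

```python
# Version-to-service-pack mapping copied from the list above; nothing else is added.
DOTNET_SERVICE_PACKS = {
    "1.0.3705.0": "1.0 Original RTM",
    "1.0.3705.209": "1.0 SP1",
    "1.0.3705.288": "1.0 SP2",
    "1.0.3705.6018": "1.0 SP3",
    "1.1.4322.573": "1.1 Original RTM",
    "1.1.4322.2032": "1.1 SP1",
    "1.1.4322.2300": "1.1 SP1 32 Bit (included with Win Server 2003)",
    "2.0.50727.42": "2.0 Original RTM",
}

def describe_version(version):
    """Return the service pack label for a version string, if it is in the list."""
    return DOTNET_SERVICE_PACKS.get(version.strip(), "unknown (not in the list)")

if __name__ == "__main__":
    print(describe_version("1.1.4322.2032"))
```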

I hope it helps while troubleshooting ASP.Net performance related problems.

Uday Pandya

Windows Defragmentation Explained!!!!

To understand defragmentation, we need to understand how Windows uses hard drive space. When we format a hard disk, it is divided into sectors of 512 bytes each. To use disk I/O and space efficiently, Windows groups sectors into clusters. A cluster is a group of sectors and is the smallest unit of space available for allocation. NTFS determines the cluster size as follows (KB 314878):

Drive size (logical volume)    Cluster size          Sectors
---------------------------    ------------------    -------
512 MB or less                 512 bytes             1
513 MB - 1,024 MB (1 GB)       1,024 bytes (1 KB)    2
1,025 MB - 2,048 MB (2 GB)     2,048 bytes (2 KB)    4
2,049 MB and larger            4,096 bytes (4 KB)    8
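
The cluster size table above can be written as a small function. This is a sketch that only encodes the four rows shown, assuming 512-byte sectors as stated earlier; the function name `default_ntfs_cluster_size` is hypothetical.

```python
SECTOR_BYTES = 512  # sector size stated in the post

def default_ntfs_cluster_size(volume_mb):
    """Return (cluster_bytes, sectors_per_cluster) for a volume size in MB,
    following the four rows of the table above."""
    if volume_mb <= 512:
        cluster_bytes = 512
    elif volume_mb <= 1024:
        cluster_bytes = 1024
    elif volume_mb <= 2048:
        cluster_bytes = 2048
    else:
        cluster_bytes = 4096
    return cluster_bytes, cluster_bytes // SECTOR_BYTES

if __name__ == "__main__":
    for size_mb in (256, 1024, 2048, 80_000):
        print(size_mb, default_ntfs_cluster_size(size_mb))
```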

Over time, the hard disk gets fragmented, which means a single file is not stored in contiguous clusters. The problem is that the mechanical components of the disk have to do extra work, and neither the hard disk cache nor the Windows disk cache can use read-ahead caching algorithms effectively. Accessing the cache is always faster than a disk seek. This turns into a performance hit, so in general disk defragmentation is recommended. Apart from the performance hit, from the Rackspace point of view, we recommend defrag in every case, whether it is sluggish server response, managed backup failure or what not (I have seen defrag recommended on Rackwatch tickets. Don’t laugh, you have seen this as well, haven’t you?)!!!!

Fun apart, let’s see what the defragmentation process does, and its limitations as well :) The defragmentation utility rearranges files so that they are stored in physically contiguous clusters. Along with used sectors, the defrag process consolidates free space so that new files will not be fragmented when they are created.

Let’s see the limitations:

1) Disk Defragmenter cannot defragment the Recycle Bin. For an efficient defrag, always empty the Recycle Bin.

2) Disk Defragmenter cannot touch the page file unless it is zeroed out. Use PageDefrag from Sysinternals when it is absolutely necessary. On a high-performance server, do not leave the page file to grow automatically. Automatic growth fragments the page file and causes a performance hit whenever new page file space is initialized.

3) Disk Defragmenter will not defragment files that are in use. For best results, shut down all running programs. There was a debate in the past about whether to shut down SQL before defragmenting. Before recommending a SQL shutdown to a customer, see point 4.

4) By default, Disk Defragmenter will not defrag file fragments larger than 16,000 contiguous clusters (~64 MB on volumes greater than 2 GB) because doing so yields negligible performance improvement. It is possible to pin down those files and defrag them. It is safe to assume that a file whose fragments are each larger than 64 MB is not fragmented as far as disk caching and Windows caching go.
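
The “~64 MB” figure in point 4 follows from the cluster size. A quick check, assuming the 4 KB cluster size that applies to volumes larger than 2 GB:

```python
CLUSTER_BYTES = 4096         # 4 KB clusters, assumed for volumes larger than 2 GB
THRESHOLD_CLUSTERS = 16_000  # contiguous-cluster limit quoted in point 4

threshold_bytes = CLUSTER_BYTES * THRESHOLD_CLUSTERS
threshold_mib = threshold_bytes / (1024 * 1024)

print(threshold_bytes)  # 65536000
print(threshold_mib)    # 62.5
```

So 16,000 clusters is 62.5 MB in binary units, which the post rounds to about 64 MB.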

Before suggesting disk defragmentation, answer the following questions:

1) Is the process I/O bound? If so, how many files does it generally access: 10s, 100s or 1000s? For better performance it is ideal that files are not fragmented, but for defrag to make a significant improvement the file count has to be in the 1000s.

2) Is the process capable of High Speed I/O? (Separate post to come.) In general, Microsoft Office products are capable of High Speed I/O. If the process uses High Speed I/O, defragmentation makes very little improvement.

3) Analyze the volume and check the following things:

No.    Fragments    File Size    Most Fragmented Files
---    ---------    ---------    ---------------------
a      2,586        614 MB       \SysBkUp\SystemState.bkf
b      21           1600 MB      \ProgramFiles\DebugDiag\Logs\PerfLogs\PerfLog_Date__05_06_2008__Time_09_05_16PM__161.blg

a. How frequently is the fragmented file used? E.g. in (a), the fragmented file is the system state backup and it is first in the Most Fragmented Files list. Are we troubleshooting a “long system state backup time” problem?

b. Divide File Size by Fragments, e.g. in (b) 1600 / 21 = 76 MB average fragment size. Would defragmentation touch this file by default, and if so would it make any improvement in terms of disk I/O? This is exactly the case with SQL related files. If the average fragment size is greater than 64 MB, the performance hit is negligible, and defrag with or without stopping SQL will not make a noticeable improvement.

4) Do not trust the volume fragmentation report. Make a reasonable guess from the Most Fragmented Files list.
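
The division in (b) can be applied to every row of the Most Fragmented Files list. The sketch below does that for rows (a) and (b), using the 64 MB figure from the limitations as the cut-off. The function names are hypothetical, and the cut-off is this post's rule of thumb rather than an exact tool setting.

```python
SKIP_THRESHOLD_MB = 64  # fragment size above which the post treats defrag as not worthwhile

def average_fragment_mb(file_size_mb, fragments):
    """Average fragment size: file size divided by number of fragments."""
    return file_size_mb / fragments

def worth_defragmenting(file_size_mb, fragments, threshold_mb=SKIP_THRESHOLD_MB):
    """True when the average fragment is smaller than the cut-off."""
    return average_fragment_mb(file_size_mb, fragments) < threshold_mb

# Rows (a) and (b) from the Most Fragmented Files table above.
for label, size_mb, fragments in (("a", 614, 2586), ("b", 1600, 21)):
    avg = average_fragment_mb(size_mb, fragments)
    print(label, round(avg, 2), worth_defragmenting(size_mb, fragments))
```

Row (a) averages roughly a quarter of a megabyte per fragment, so it is a real candidate. Row (b) averages about 76 MB per fragment and falls above the cut-off.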

In the following circumstances, please proceed with defrag:

5) Is your MFT fragmented? If “Total MFT fragments” is greater than 5, proceed with defrag. Check the analysis report and see the MFT fragmentation section:

Master File Table (MFT) fragmentation

Total MFT size = 208 MB
MFT record count = 124,340
Percent MFT in use = 58 %
Total MFT fragments = 2

6) The “Most Fragmented Files” list contains your website files and they are not cached files (e.g. .aspx and .asp are cached most of the time, along with some static files like .jpg and .gif). There are very few circumstances in which fragmentation is the root cause of an IIS performance problem, and most of them are related to new file creation via upload.
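
The MFT check in point 5 is easy to script once the analysis report is saved as text. The sketch below assumes the report uses the exact wording shown in the excerpt above (“Total MFT fragments = N”). The function names are hypothetical, and the limit of 5 is the rule of thumb from this post.

```python
import re

# Report excerpt copied from the analysis output shown above.
SAMPLE_REPORT = """\
Master File Table (MFT) fragmentation
Total MFT size = 208 MB
MFT record count = 124,340
Percent MFT in use = 58 %
Total MFT fragments = 2
"""

def total_mft_fragments(report_text):
    """Extract the 'Total MFT fragments' value, or None if the line is missing."""
    match = re.search(r"Total MFT fragments\s*=\s*([\d,]+)", report_text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))

def mft_needs_defrag(report_text, limit=5):
    """Apply the post's rule of thumb: defrag when fragments exceed the limit."""
    fragments = total_mft_fragments(report_text)
    return fragments is not None and fragments > limit

if __name__ == "__main__":
    print(total_mft_fragments(SAMPLE_REPORT), mft_needs_defrag(SAMPLE_REPORT))
```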

Learning Curve:

While troubleshooting a performance problem, consider this option last rather than first. Think twice about whether you really need to defrag the volume to solve the problem.

Suggestions and comments are welcome!!!

Uday Pandya