Thursday 21 August 2008

How to Troubleshoot IIS6 'Server Too Busy' Error


Background:

“Server Too Busy” corresponds to HTTP status code 503 (Service Unavailable), which means something is wrong with the web server rather than with the website itself.

I am not going to discuss a dying product (IIS7 is already knocking at the door) in too much depth. But please look closely at this diagram for 5 minutes before you read further.

Ohhh O, it is not 5 minutes yet!!!!

OK, so it seems that queuing can potentially occur in the following places:

1) HTTP.SYS, the first handler in kernel mode, which validates the incoming request and then routes it to the appropriate kernel-mode queue.

2) The worker process itself, because it can host multiple AppDomains (roughly speaking, multiple applications inside the same worker process).

3) The application itself, with its I/O thread pool.

Let’s examine the properties associated with each queue.

Mode: Kernel Mode
IIS Queue: HTTP.sys
Symptoms: HTTP Error log indicates 503 – Queue Full
Property: AppPoolQueueLength
Location: Metabase
Default: IIS 6 - 1000

Mode: Kernel Mode to User Mode transition
IIS Queue: W3WP or Thread Pool
Symptoms: HTTP Error log indicates Timer Connection Idle
Property: Please ignore it for a while, because it is a big post in itself!!!!

Mode: User Mode: Global
IIS Queue: ASP.Net ISAPI Handler or Application Queue
Symptoms: HTTP 503 – Server Too Busy Error
Property: AppRequestQueueLimit
Location: Machine.config in the .Net configuration folder, under <HTTPRunTime>
Default: 100 for .Net 1.0 or 1.1; 5000 for .Net 2.0 or higher

Mode: User Mode: Application
IIS Queue: .Net Thread Pool
Symptoms: See the details below.
Property: MaxWorkerThreads, MinWorkerThreads, MaxIoThreads, MinFreeThreads, MinLocalRequestFreeThreads, Maxconnection, ExecutionTimeout
Location: Machine.config under the <ProcessModel> element, which affects all applications using that .Net version [1], or web.config for application-specific settings
Default: .Net 2.0 is set to auto-configuration, so it generally does not require modification. .Net 1.0 and 1.1 are set to very low values and always require modification in order to scale up.

Symptoms of User Mode: Global contention:

1. Event ID: 1003 – aspnet_isapi.dll reported itself as unhealthy for the following reason: ‘Deadlock Detected’

2. Event ID: 1013 – A Process serving application pool ‘udaypandya.com’ exceeded time limits during shut down.

3. “System.InvalidOperationException: There were not enough free threads in the ThreadPool object to complete the operation.”

4. In the most extreme situation, you get “HttpException (0x80004005): Request timed out.”

Real world problem:

A client created a support ticket mentioning that the server displays a “Server Too Busy” error. I enabled performance monitoring and got the following output. The customer mentioned that the application uses .Net version 2.0. The error is random and cannot be reproduced on demand.

I went ahead and checked the HTTPErr log and got the following output:

Logparser Query: logparser -i:HTTPERR -o:DATAGRID "SELECT date, time, c-ip, s-ip, cs-uri, cs-method, sc-status, s-siteid, s-reason, s-queuename from C:\IISLogs\HTTPERR\HTTPERR\httperr42.log WHERE sc-status>500"
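For readers without Log Parser at hand, the same filter can be sketched in a few lines of Python. This is only an illustration, not the method used on the ticket: it assumes the HTTPERR file carries a `#Fields:` header line, and the sample rows, IP addresses and the pool name `ExampleAppPool` are made up.

```python
from collections import Counter


def count_httperr_reasons(lines, min_status=500):
    """Count (status, reason, queue) for rows with sc-status > min_status."""
    fields = []
    counts = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The header names the columns of every following data row.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed rows
        row = dict(zip(fields, values))
        status = row.get("sc-status", "-")
        # Rows without a status (logged as "-") are ignored, as in the query.
        if status.isdigit() and int(status) > min_status:
            key = (int(status), row.get("s-reason", "-"), row.get("s-queuename", "-"))
            counts[key] += 1
    return counts


# Hypothetical sample data, for illustration only.
SAMPLE = [
    "#Software: Microsoft HTTP API 1.0",
    "#Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri sc-status s-siteid s-reason s-queuename",
    "2008-08-21 10:00:01 192.0.2.10 1234 192.0.2.1 80 HTTP/1.1 GET /default.aspx 503 1 QueueFull ExampleAppPool",
    "2008-08-21 10:00:02 192.0.2.11 1235 192.0.2.1 80 HTTP/1.1 GET /default.aspx 503 1 QueueFull ExampleAppPool",
    "2008-08-21 10:00:03 192.0.2.12 1236 192.0.2.1 80 - - - - - Timer_ConnectionIdle -",
]

if __name__ == "__main__":
    for key, hits in count_httperr_reasons(SAMPLE).items():
        print(key, hits)
```

Grouping by reason and queue name makes it easy to see whether the 503s come from a full kernel-mode queue or from somewhere else.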

Here is a screenshot for Performance Monitor.

Can someone please help me work out what is wrong and how I should troubleshoot it further?

Uday Pandya


Windows 2008 and Terminal Server Stream Disconnected



Problem: A DELL server with a Broadcom NetXtreme Gigabit Ethernet card is running Windows 2008. If you connect to the server via RDP, you unexpectedly lose the connection with an error. You will see something like this on the client side:

The terminal Server has ended the connection.

On the server you will see something like this:

Solution:

As mentioned in the event log, there is a problem with the RDP security layer. Microsoft also has a very nice support article on this. Please have a look at the following link:

http://support.microsoft.com/kb/323497

Nice, simple and easy. Happy working! But there is one problem: this article is the closest one you can find for the problem description, and it does not solve your problem.

I dumped the RDP packets with Netmon 3.1 and did not see any session termination from the server side. After a lot of research (which should save you a lot of time), I traced the problem to the Broadcom NetXtreme Gigabit Ethernet adapter.

By default, our Windows 2008 kick enables advanced features such as IPv4 Checksum Offload, IPv4 Large Send Offload (LSO) and Receive Side Scaling (RSS). I strongly recommend that you visit the following link for more information:

http://technet.microsoft.com/en-us/network/bb545631.aspx

With Large Send Offload, the TCP stack hands the network card a large block of data, and the card divides it into small chunks and builds the TCP packets itself. It turned out that the issue occurs when the LSO feature is enabled: the TS session service detects a problem with the data stream. To stabilize the RDP session, we need to disable LSO in the Broadcom Advanced Control Suite as follows:


Uday Pandya

.Net Service Pack Information

A quick reminder of the .Net version numbers associated with each service pack.

For Version 1.0
------------
1.0.3705.0-Original RTM
1.0.3705.209-SP1
1.0.3705.288-SP2
1.0.3705.6018-SP3


For Version 1.1
--------------
1.1.4322.573-Original RTM
1.1.4322.2032-SP1
1.1.4322.2300-SP1, 32-bit (included with Windows Server 2003)

For Version 2.0
-----------

2.0.50727.42-Original RTM

I hope it helps while troubleshooting ASP.Net performance related problems.
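If you look these numbers up often, the list above fits in a small lookup table. The sketch below is only a convenience built from the versions listed in this post; the function name `describe_build` is made up, and any build not in the list is reported as unknown rather than guessed.

```python
# Build numbers copied from the list in this post.
SERVICE_PACKS = {
    "1.0.3705.0": ".Net 1.0 Original RTM",
    "1.0.3705.209": ".Net 1.0 SP1",
    "1.0.3705.288": ".Net 1.0 SP2",
    "1.0.3705.6018": ".Net 1.0 SP3",
    "1.1.4322.573": ".Net 1.1 Original RTM",
    "1.1.4322.2032": ".Net 1.1 SP1",
    "1.1.4322.2300": ".Net 1.1 SP1 32-bit (included with Windows Server 2003)",
    "2.0.50727.42": ".Net 2.0 Original RTM",
}


def describe_build(version):
    """Return the service pack level for a version string, if it is listed."""
    return SERVICE_PACKS.get(version.strip(), "unknown build (not in this list)")


if __name__ == "__main__":
    print(describe_build("1.1.4322.2032"))
    print(describe_build("2.0.50727.42"))
```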

Uday Pandya

Windows Defragmentation Explained!!!!

To understand defragmentation, we need to understand how Windows uses hard drive space. When we format a hard disk, it is divided into sectors of 512 bytes each. To use disk I/O and space efficiently, Windows groups sectors into clusters. A cluster is a group of sectors and is the smallest unit of space available for allocation. NTFS determines the cluster size as follows (KB 314878):

Drive size (logical volume)   Cluster size          Sectors
---------------------------   -------------------   -------
512 MB or less                512 bytes             1
513 MB - 1,024 MB (1 GB)      1,024 bytes (1 KB)    2
1,025 MB - 2,048 MB (2 GB)    2,048 bytes (2 KB)    4
2,049 MB and larger           4,096 bytes (4 KB)    8
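The table translates directly into a small function. This is a sketch of the table as written in this post, not a call into any Windows API; the function names are made up, and the 512-byte sector size is the one assumed above.

```python
SECTOR_BYTES = 512  # sector size assumed in this post


def default_ntfs_cluster_bytes(volume_mb):
    """Return the default NTFS cluster size in bytes for a volume size in MB."""
    if volume_mb <= 0:
        raise ValueError("volume size must be positive")
    if volume_mb <= 512:
        return 512
    if volume_mb <= 1024:
        return 1024
    if volume_mb <= 2048:
        return 2048
    return 4096


def sectors_per_cluster(volume_mb):
    """Number of 512-byte sectors that make up one cluster."""
    return default_ntfs_cluster_bytes(volume_mb) // SECTOR_BYTES


if __name__ == "__main__":
    for size_mb in (256, 1000, 2000, 100_000):
        print(size_mb, default_ntfs_cluster_bytes(size_mb), sectors_per_cluster(size_mb))
```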

Over time, the hard disk gets fragmented, which means that a single file is no longer stored in contiguous clusters. The problem is that the mechanical components of the disk have to do extra work, and neither the hard disk cache nor the Windows disk cache can apply read-ahead caching algorithms. Accessing the cache is always faster than a disk seek. This turns into a performance hit, so in general disk defragmentation is recommended. Apart from the performance hit, from the Rackspace point of view we recommend a defrag in every case, whether it is sluggish server response, a managed backup failure or what not (I have seen defrag recommended on Rackwatch tickets. Don’t laugh, you have seen this as well, haven’t you?)!!!!

Fun apart, let’s see what the defragmentation process does and what its limitations are. The defragmentation utility rearranges files so that they are stored in physically contiguous clusters. Along with the used sectors, the defrag process consolidates free space so that new files will not be fragmented when they are created.

Let’s see the limitations:

1) Disk Defragmenter cannot defragment the Recycle Bin. For an efficient defrag, always empty the Recycle Bin first.

2) Disk Defragmenter cannot touch the page file unless it is zeroed out. Use PageDefrag from Sysinternals when it is absolutely necessary. On a high-performance server, do not leave the page file to grow automatically. Automatic growth fragments the page file and causes a performance hit whenever new page file space is initialized.

3) Disk Defragmenter will not defragment files that are in use. For best results, shut down all running programs. There was a debate in the past about whether to shut down SQL before defragmenting. Before recommending a SQL shutdown to a customer, see point 4.

4) By default, Disk Defragmenter will not defragment fragments larger than 16,000 contiguous clusters (~64 MB on volumes larger than 2 GB), because doing so brings negligible performance improvement. It is possible to pin down those files and defragment them. As far as disk caching and Windows caching go, it is safe to treat a file whose fragments are larger than 64 MB as not fragmented.

Before suggesting disk defragmentation, answer the following questions:

1) Is the process I/O bound? If so, how many files does it generally access: 10s, 100s or 1000s? For better performance, it is ideal that files are not fragmented, but for defragmentation to make a significant improvement the number of files has to be in the 1000s.

2) Is the process capable of High Speed I/O? (Separate post to come.) In general, Microsoft Office products are capable of High Speed I/O. If the process uses High Speed I/O, defragmentation makes very little improvement.

3) Analyze the volume and check the following things:

No.   Fragments   File Size   Most Fragmented Files
---   ---------   ---------   ---------------------
a     2,586       614 MB      \SysBkUp\SystemState.bkf
b     21          1600 MB     \ProgramFiles\DebugDiag\Logs\PerfLogs\PerfLog_Date__05_06_2008__Time_09_05_16PM__161.blg

a. How frequently is the fragmented file used? E.g. in (a), the fragmented file is the system state backup, and it is first in the Most Fragmented Files list. Are we troubleshooting a “long system state backup time” problem?

b. Divide File Size by Fragments. E.g. in (b), 1600 / 21 = 76 MB average fragment size. Would the defragmenter touch this file by default, and if so, would it make any improvement in terms of disk I/O? This is exactly the case with SQL-related files. If the average fragment is larger than 64 MB, the performance hit is negligible, and a defrag with or without stopping SQL will not make a noticeable improvement.

4) Do not trust the volume fragmentation report. Make a reasonable guess from the Most Fragmented Files list.
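The arithmetic in 3(b) is easy to run against every row of the Most Fragmented Files list. The sketch below uses the two rows from this post and the 64 MB rule of thumb from limitation 4. The function names are made up, and the average is only a rough guide because individual fragments can be much larger or smaller.

```python
FRAGMENT_THRESHOLD_MB = 64  # rule of thumb from limitation 4 above


def average_fragment_mb(file_size_mb, fragments):
    """Average fragment size in MB: file size divided by fragment count."""
    if fragments <= 0:
        raise ValueError("fragment count must be positive")
    return file_size_mb / fragments


def worth_defragmenting(file_size_mb, fragments, threshold_mb=FRAGMENT_THRESHOLD_MB):
    """True when the average fragment is smaller than the threshold."""
    return average_fragment_mb(file_size_mb, fragments) < threshold_mb


if __name__ == "__main__":
    rows = [
        ("a", 614, 2586),  # system state backup
        ("b", 1600, 21),   # performance log
    ]
    for label, size_mb, fragments in rows:
        avg = average_fragment_mb(size_mb, fragments)
        print(label, round(avg, 2), worth_defragmenting(size_mb, fragments))
```

Row (a) averages well under 1 MB per fragment, so it is a real candidate; row (b) averages about 76 MB, so defragmenting it is unlikely to help.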

In the following circumstances, please proceed with the defrag:

5) Is your MFT fragmented? If “Total MFT fragments” is greater than 5, proceed with the defrag. Check the analysis report and see the MFT fragmentation section:

Master File Table (MFT) fragmentation

Total MFT size = 208 MB
MFT record count = 124,340
Percent MFT in use = 58 %
Total MFT fragments = 2

6) The “Most Fragmented Files” list contains your website files and they are not cached files (e.g. .aspx and .asp are cached most of the time, along with some static files like .jpg and .gif). There are very few circumstances in which fragmentation is the root cause of an IIS performance problem, and most of them are related to new file creation via upload.
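The MFT rule from item 5 can be checked mechanically if you save the analysis report as text. This is a minimal sketch that assumes the report contains a line in exactly the form shown above (“Total MFT fragments = N”); the function names are made up.

```python
import re

MFT_FRAGMENT_LIMIT = 5  # threshold from item 5 above

# Report section copied from this post.
REPORT = """Master File Table (MFT) fragmentation
Total MFT size = 208 MB
MFT record count = 124,340
Percent MFT in use = 58 %
Total MFT fragments = 2
"""


def total_mft_fragments(report_text):
    """Extract the 'Total MFT fragments' value from an analysis report."""
    match = re.search(r"Total MFT fragments\s*=\s*([\d,]+)", report_text, re.IGNORECASE)
    if match is None:
        raise ValueError("no 'Total MFT fragments' line found")
    return int(match.group(1).replace(",", ""))


def mft_needs_defrag(report_text, limit=MFT_FRAGMENT_LIMIT):
    """True when the MFT has more fragments than the limit."""
    return total_mft_fragments(report_text) > limit


if __name__ == "__main__":
    print(total_mft_fragments(REPORT), mft_needs_defrag(REPORT))
```

For the report in this post the MFT has 2 fragments, so this check alone does not justify a defrag.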

Learning Curve:

While troubleshooting a performance problem, consider this option last rather than first. Think twice about whether you really need to defragment the volume to solve the problem.

Suggestions and comments are welcome!!!

Uday Pandya

Monday 16 June 2008

Case of the Unexplained IIS FTP Problem….

Scenario: IIS FTP is configured in “Isolate User (without Active Directory)” mode, as documented in the following Microsoft support article:

http://support.microsoft.com/kb/555018

Problem: Even after reinstalling and reconfiguring all IIS-related services many times, IIS FTP was not working in isolated mode. Every time a user tried to log in to the system via FTP, the login failed with the following error:

Event Type: Warning

Event Source: MSFTPSVC

Event Category: None

Event ID: 13

Date: 01/06/2008

Time: 10:20:38

User: N/A

Computer: IWISHICOULDCHAN

Description:

User administrator failed to log on, could not access the home directory /.

I decided to dive into the problem and came to the conclusion that neither IIS nor IIS FTP was the source of the problem. Inetinfo.exe was working as expected but was not able to find the home directory for specific users. I verified many times that a directory with the user’s username existed and had the necessary permissions for the Inetinfo.exe process.

I launched ProcessMonitor, created a filter for Inetinfo.exe to investigate what was going on, and captured the following activity.

Sequence: 12145

Date Time: 08:59:25.0538032

Process Name: inetinfo.exe

PID: 1172

Operation: CreateFile

Path: C:\Inetpub\ftproot\LocalUser\IWISHICOULDCHAN\Administrator\

Result: PATH NOT FOUND

Detail: Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Open For Backup, Attributes: n/a, ShareMode: Read, Write, AllocationSize: n/a, Impersonating: IWISHICOULDCHAN\Administrator

I was stumped to see inetinfo.exe trying to read the user’s home directory at the location shown above. According to Microsoft’s support article, you need to create the user’s home directory directly in the “LocalUser” folder, but here that does not seem to be the case. I went ahead and created the user directory at the path inetinfo.exe was looking for, and everything started to work.

Investigation:

The Microsoft support article says the following about how to create the FTP root and user folders:

If users of the local computer log in with their individual account user names, create the subdirectories LocalUser under the FTP site root directory you specified when you creating the FTP site.

For Example -

FTP root directory -> D:\MyFTP\

LocalUser locate at D:\MyFTP\LocalUser

For individual user, you need to create folder in this format - LocalUser\username

User: Susan locate at D:\MyFTP\LocalUser\Susan

If users of different domains log on with their explicit domain\username credentials, create a subdirectory for each domain (by using the name of the domain) under the FTP site root directory you specified when you creating the FTP site.

For Example -

Domain Name: Account

FTP root directory -> D:\MyFTP\

LocalUser locate at D:\MyFTP\Account

For individual domain user, you need to create folder in this format - Domain\username

Domain User: Nancy locate at D:\MyFTP\Account\Nancy[1]

It appeared that my FTP problem followed the pattern of folder creation for a domain environment, but the slight problem was that I was running the server in a workgroup environment.

Inetinfo.exe was trying to read the directory at “FTPRootDirectory\LocalUser\ServerName\%username%”. A careful look revealed that the server name in the path was not the full computer name but the server’s NetBIOS name. My full computer name was “IWISHICOULDCHANGE”, which is longer than 15 characters (I have changed it for clarity), and hence the NetBIOS name was “IWISHICOULDCHAN”.
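The truncation is easy to reproduce. The sketch below derives the 15-character NetBIOS name from the full computer name and builds the home directory path in the form seen in the Process Monitor trace above. The function names are made up, and the path layout is taken from that single trace, so treat it as an observation rather than documented IIS behavior.

```python
from pathlib import PureWindowsPath

NETBIOS_MAX_CHARS = 15  # NetBIOS computer names are cut off at 15 characters


def netbios_name(computer_name):
    """Upper-case the computer name and truncate it to 15 characters."""
    return computer_name.upper()[:NETBIOS_MAX_CHARS]


def observed_home_dir(ftp_root, computer_name, username):
    """Build the path inetinfo.exe tried to open in the trace above."""
    return PureWindowsPath(ftp_root) / "LocalUser" / netbios_name(computer_name) / username


if __name__ == "__main__":
    print(netbios_name("IWISHICOULDCHANGE"))
    print(observed_home_dir(r"C:\Inetpub\ftproot", "IWISHICOULDCHANGE", "Administrator"))
```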

Technical Details:

It seems that when you try to login via FTP, Inetinfo.exe process tries to read computer name. Please see the following event in ProcessMonitor:

Sequence: 11436

Date Time: 09:34:38.3419286

Process Name: inetinfo.exe

PID: 1172

Operation: RegQueryValue

Path: HKLM\System\CurrentControlSet\Control\ComputerName\ActiveComputerName\ComputerName

Result: SUCCESS

Data: IWISHICOULDCHAN

As we can see, Inetinfo.exe reads the computer name from the registry, and the registry returns only the NetBIOS name of the server.

I believe the root of the problem lies in how authentication works and is handed over. IIS FTP uses Windows Authentication by default and relies on NTLM. When you log in, IIS FTP impersonates the user who is logging in to access the file or directory. ProcessMonitor shows the following event:

Sequence: 7265

Date Time: 09:58:57.8573348

Process Name: inetinfo.exe

PID: 1172

Operation: CreateFile

Path: C:\Inetpub\ftproot\LocalUser\IWISHICOULDCHAN\Administrator

Result: SUCCESS

Data: Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Open For Backup, Attributes: n/a, ShareMode: Read, Write, AllocationSize: n/a, Impersonating: IWISHICOULDCHAN\Administrator, OpenResult: Opened

And you will notice following event in Security Event Log:

Event Type: Success Audit

Event Source: Security

Event Category: Logon/Logoff

Event ID: 552

Date: 01/06/2008

Time: 11:22:50

User: NT AUTHORITY\SYSTEM

Computer: IWISHICOULDCHAN

Description:

Logon attempt using explicit credentials:

Logged on user:

User Name: IWISHICOULDCHAN$

Domain: WORKGROUP

Logon ID: (0x0,0x3E7)

Logon GUID: -

User whose credentials were used:

Target User Name: Administrator

Target Domain: IWISHICOULDCHAN

Target Logon GUID: -

Target Server Name: localhost

Target Server Info: localhost

Caller Process ID: 1172

Source Network Address: -

Source Port: -

It seems that during NTLM authentication, when the server name is longer than 15 characters, the Inetinfo.exe process concludes that the isolated FTP site is running in a domain environment and hence tries to access the user’s home directory at “FTPRootDirectory\LocalUser\ServerName\%username%”.

Conclusion:

It seems that if you have a computer name longer than 15 characters and you are trying to create an isolated FTP site, you need to create the user’s home directory in “FTPRootDirectory\LocalUser\NetBIOS Name of Server\” and not directly in “FTPRootDirectory\LocalUser\” as documented in the support article. I have not yet found any support article that mentions this, or any article pointing out that a server hosting IIS should not have a name longer than 15 characters.

Side Problem:

When you change the computer name to one longer than 15 characters, Windows shows an information dialog box like the one below, but it does not do justice to the problems you might run into later:

The saga continues with one more problem, which you hit while creating a website.

My server has the following IP configuration.

When the computer name is longer than 15 characters, IIS Manager shows only the primary IP address as available for configuring a website. Please see the following output:

Technically, I should also see “10.2.105.107” and “10.2.105.108” as available IP addresses. To verify whether the problem was with the IIS Manager GUI or with IIS itself, I modified the server bindings in “Metabase.xml” as follows:

<IIsWebServer Location ="/LM/W3SVC/1"
    AppPoolId="DefaultAppPool"
    DefaultDoc="Default.htm,Default.asp,index.htm,iisstart.htm"
    LogPluginClsid="{FF160663-DE82-11CF-BC0A-00AA006111E0}"
    ServerBindings="10.2.105.107:80:
        10.2.105.108:80:"
    ServerComment="Default Web Site"
    ServerSize="1"
    >

So it seems that IIS Manager is now aware of the additional IPs available on the server (at least because we forced it!). I went ahead and started creating a new website, and was stumped again!

IIS Manager does not show the additional IPs. It shows only the primary IP address of the server. I verified that the problem is related to the GUI and not to IIS itself, because when you create a new website manually (via the command line) and set the server bindings to use only a secondary IP address, IIS functions normally. I believe this happens because of how the drop-down list gets populated!
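For reference, each entry in the ServerBindings value edited in Metabase.xml above has the form IP:Port:HostHeader, with entries separated by whitespace. The sketch below splits such a value into its parts. It is an illustration only: the function name is made up, and it handles IPv4 entries like the ones in this post, not IPv6 addresses.

```python
def parse_server_bindings(value):
    """Split a ServerBindings value into (ip, port, host header) records."""
    bindings = []
    for entry in value.split():
        # Each entry looks like "10.2.105.107:80:" (an empty host header is allowed).
        ip, port, host = entry.split(":", 2)
        bindings.append({"ip": ip, "port": int(port), "host": host})
    return bindings


# The value from the Metabase.xml fragment in this post.
BINDINGS = "10.2.105.107:80:\n\n10.2.105.108:80:"

if __name__ == "__main__":
    for binding in parse_server_bindings(BINDINGS):
        print(binding)
```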



[1] http://support.microsoft.com/kb/555018