Thursday, 8 January 2009

Demystifying .Net CLR Performance Counters

Hi Friends!!!


I am back in business, and apologies for being out of contact for a while. A lot of exciting things have happened over the past few months. I finally landed where I always wanted to be, so thanks for your support so far, and yes, your guess is right.


I was waiting to write my first post with some exciting information. Exciting meaning hidden from the world, something the internet search engines could not find! In my usual style, here it goes:


Scenario:


I was at a customer site and my task was to monitor the IIS system health, which obviously also means monitoring the application health. I decided to set up performance counters for a start, and I opened the Process counter as shown in the following figure:




The server was hosting multiple web applications and had multiple worker processes. In the instance list, all the worker processes appeared as w3wp, w3wp#1, w3wp#2, w3wp#n and so on. The problem was: which one is associated with which application pool?


Resolution:


I executed the following command to get a list of the worker processes running on the server and their associated process IDs.


C:\windows\system32\cscript c:\windows\system32\iisapp.vbs

The output was something like this:
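If you have never run iisapp.vbs, its output is a simple PID-to-application-pool mapping, roughly like this (the PIDs and pool names here are illustrative, not from the actual server):

W3WP.exe PID: 2256   AppPoolId: DefaultAppPool
W3WP.exe PID: 3184   AppPoolId: ShopAppPool
W3WP.exe PID: 3840   AppPoolId: IntranetAppPool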


Then I remembered a blog post from a very dear friend of mine:

http://blogs.technet.com/marcelofartura/archive/2006/09/14/perfmon-s-counters-output-format-tip.aspx
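For reference, the change described there (as far as I recall; treat it as a sketch and back up the key first) is to add a ProcessNameFormat value under the PerfProc performance key so that Process object instances carry the PID:

reg add "HKLM\SYSTEM\CurrentControlSet\Services\PerfProc\Performance" /v ProcessNameFormat /t REG_DWORD /d 2

After this change (and a restart of the monitoring tool), the instances show up as something like w3wp_2256 instead of w3wp#1.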


I applied the registry changes as described and voila!!! I got output like this:



Job done! Hmm... I am sure you are wondering what the point of this blog post is. Such a waste of time, isn't it? If you are thinking that, then wait...

I opened the .Net CLR counters, specifically .Net CLR Exceptions and .Net CLR Memory, and I saw the following:



Hmm!! Something didn't work as expected. I verified the registry entries and they appeared correct to me. I came to the conclusion that this registry modification does not work for the .Net CLR counters. The question was: how do you monitor the .Net CLR counters for specific applications only?

I decided it was time to call the internal help line, and guess what, within hours I got a reply, and the reply was such that I decided to write this blog post.

The guy told me to look under the ".Net CLR Memory" performance object and asked me, "What is the fourth-last counter you see?" I was amazed: what does this have to do with the fourth-last counter? I looked at the counter and it was full of surprise!!! It turned out that "Promoted Finalization Memory from Gen1" is the holy grail!!!! The last value of that counter is actually the worker process ID!! Now you have the relationship between the .Net CLR performance counters and the worker process ID as well.
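If you want to script this lookup instead of eyeballing Perfmon, typeperf can sample the counter directly. A minimal sketch (the exact counter name and hyphenation may differ slightly between framework versions, and w3wp#1 is just an example instance):

typeperf -sc 1 "\.NET CLR Memory(w3wp#1)\Promoted Finalization-Memory from Gen 1"

The value it prints for that instance should be the worker process ID.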

Takeaway:


When you have .Net 1.1 installed, it loads the .Net 1.1 performance counters and there is no easy way to establish this relationship. However, from .Net 2.0 onwards you can unload the .Net 1.1 counters and load the .Net 2.0-specific counters with the following instructions:

1) Go to \Windows\Microsoft.NET\Framework\v2.0.50727


2) Run "unlodctr .NetFramework"


3) Run "lodctr corperfmonsymbols.ini"


This should load the .Net 2.0 counters from corperfmonsymbols.ini. So what is the difference? Well, see the screenshot below:


Now you have the real name of the performance counter along with its real value. Secret demystified! Note that although the counter is listed, it will not work with .Net 1.1.

Well, this solution is provided as-is and you use it at your own risk! Please don't take it as an officially supported way to do this; it is more of a working solution. We should expect a lot of improvement in this area with .Net 4.0, so wait and watch!


PS: Special thanks to Mr. A. Kamath, who assisted me with this problem. By the way, this blog will very soon be moved to the place it deserves!!! Again, you guessed correctly!
















Thursday, 21 August 2008

How to Troubleshoot IIS6 'Server Too Busy' Error


Background:

“Server Too Busy” corresponds to HTTP status code 503, and it means something is wrong with the web server rather than the website.

I am not going to discuss a dying product (IIS7 is knocking on the door) in too much depth. But please look intently at this diagram for 5 minutes before you read further.

Ohhh O, it is not 5 minutes yet!!!!

OK, so it seems that there can potentially be queuing at the following places:

1) HTTP.SYS, the first handler, in kernel mode, which validates the incoming request and routes it to the appropriate kernel-mode queue.

2) The worker process itself, because of multiple AppDomains (roughly speaking, multiple applications inside the same worker process).

3) The application itself, with its I/O thread pool.

Let’s examine the properties associated with each queue.

Mode: Kernel Mode
IIS Queue: HTTP.sys
Symptom: HTTP error log indicates 503 – Queue Full
Property: AppPoolQueueLength
Location: Metabase
Default: IIS 6 – 1000

Mode: Kernel Mode to User Mode transition
IIS Queue: W3WP or thread pool
Symptom: HTTP error log indicates Timer Connection Idle
Property / Location / Default: Please ignore this one for a while, because it is a big post in itself!!!!

Mode: User Mode: Global
IIS Queue: ASP.Net ISAPI handler or application queue
Symptom: HTTP 503 – Server Too Busy error
Property: AppRequestQueueLimit
Location: Machine.config in the .Net configuration folder, under <httpRuntime>
Default: 100 for .Net 1.0 and 1.1; 5000 for .Net 2.0 or higher

Mode: User Mode: Application
IIS Queue: .Net thread pool
Symptom: See the details below.
Properties: MaxWorkerThreads, MinWorkerThreads, MaxIoThreads, MinFreeThreads, MinLocalRequestFreeThreads, MaxConnection, ExecutionTimeout
Location: Machine.config under the <processModel> element (affects all applications using that .Net version), or web.config for application-specific settings
Default: .Net 2.0 is set to auto-config, so it generally does not require modification; .Net 1.0 and 1.1 are set to very minimal numbers and almost always require modification to scale up.
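For orientation, this is roughly where the user-mode settings above live in machine.config. The values below are only a sketch of the KB 821268-style numbers for a 2-CPU .Net 1.1 box (maxconnection = 12*N, minFreeThreads = 88*N, minLocalRequestFreeThreads = 76*N); on .Net 2.0 you would normally leave processModel on autoConfig and not touch these at all:

<configuration>
  <system.net>
    <connectionManagement>
      <!-- Outbound connections allowed per remote host -->
      <add address="*" maxconnection="24" />
    </connectionManagement>
  </system.net>
  <system.web>
    <!-- User Mode: Application - the .Net thread pool -->
    <processModel maxWorkerThreads="100" maxIoThreads="100" minWorkerThreads="40" />
    <!-- User Mode: Global - the ASP.Net application request queue -->
    <httpRuntime appRequestQueueLimit="5000"
                 minFreeThreads="176"
                 minLocalRequestFreeThreads="152"
                 executionTimeout="110" />
  </system.web>
</configuration>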

Symptoms of User Mode: Application contention:

1. Event ID: 1003 – aspnet_isapi.dll reported itself as unhealthy for the following reason: ‘Deadlock Detected’.

2. Event ID: 1013 – A process serving application pool ‘udaypandya.com’ exceeded time limits during shut down.

3. “System.InvalidOperationException: There were not enough free threads in the ThreadPool object to complete the operation.”

4. In the most extreme situations it gives “HttpException (0x80004005): Request timed out.”

Real-world problem:

A client created a support ticket mentioning that the server displays the “Server Too Busy” error. I enabled performance monitoring and got the following output. The customer mentioned that the application uses .Net version 2.0. The error is random and cannot be reproduced on demand.

I went ahead and checked the HTTPERR log and got the following output:

Logparser query: logparser -i:HTTPERR -o:DATAGRID "SELECT date, time, c-ip, s-ip, cs-uri, cs-method, sc-status, s-siteid, s-reason, s-queuename from C:\IISLogs\HTTPERR\HTTPERR\httperr42.log WHERE sc-status>500"

Here is a screenshot from Performance Monitor.

Can someone please help me out with what is wrong and how I should troubleshoot it further?

Uday Pandya


Windows 2008 and Terminal Server Stream Disconnected



Problem: A DELL server with a Broadcom NetXtreme Gigabit Ethernet card running Windows 2008. If you connect to the server via RDP, you unexpectedly lose the connection with an error. You will see something like this on the client side:

The terminal Server has ended the connection.

On the server you will see something like this:

Solution:

As mentioned in the event log, there is a problem with the RDP security layer. There is also a very nice Microsoft support article. Please have a look at the following link:

http://support.microsoft.com/kb/323497

Nice, simple and easy, happy working... but there is one problem!!! This article is the closest article you can find for the problem description, and it does not solve your problem.

I dumped the RDP packets with Netmon 3.1 and I did not see any session termination from the server side. After doing a lot of research (and to save you that time), I found the problem to be with the Broadcom NetXtreme Gigabit Ethernet adapter.

By default in our Windows 2008 kick, advanced features such as IPv4 Checksum Offload, IPv4 Large Send Offload (LSO) and Receive Side Scaling (RSS) are enabled. I strongly recommend visiting the following link for more information:

http://technet.microsoft.com/en-us/network/bb545631.aspx

With Large Send Offload, the TCP packets are created on the network card itself: the OS hands over a large block of data and the card divides it into smaller chunks. It turned out that the issue was caused by the LSO feature being enabled, and the TS session service detected a problem with the data stream. In order to stabilize the RDP session, we need to disable LSO offloading from the Broadcom Advanced Control Suite as follows:
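If you prefer to script this instead of clicking through BACS, the same setting usually maps to the NDIS "Large Send Offload" keyword on the adapter's driver key in the registry. Treat the following as a sketch only: the keyword name (*LsoV1IPv4 here) and the 0001 instance subkey depend on the driver and the adapter, so verify with reg query first and disable/re-enable the NIC (or reboot) afterwards:

reg query "HKLM\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}\0001" /v *LsoV1IPv4

reg add "HKLM\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}\0001" /v *LsoV1IPv4 /t REG_SZ /d 0 /f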


Uday Pandya

.Net Service Pack Information

A quick reminder of the .Net version numbers associated with each service pack.

For Version 1.0
------------
1.0.3705.0-Original RTM
1.0.3705.209-SP1
1.0.3705.288-SP2
1.0.3705.6018-SP3


For Version 1.1
--------------
1.1.4322.573-Original RTM
1.1.4322.2032-SP1
1.1.4322.2300-SP1 32-bit (included in Windows Server 2003)

For Version 2.0
-----------

2.0.50727.42-Original RTM
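To check which of these builds a server is actually running, one option (a sketch; note that wmic wants the backslashes doubled, and I am assuming the file version of mscorwks.dll tracks these build numbers) is:

wmic datafile where name="C:\\Windows\\Microsoft.NET\\Framework\\v2.0.50727\\mscorwks.dll" get version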

I hope it helps while troubleshooting ASP.Net performance related problems.

Uday Pandya

Windows Defragmentation Explained!!!!

To understand defragmentation, we need to understand how Windows uses hard drive space. When we format a hard disk, it is divided into sectors of 512 bytes each. To use disk I/O and space efficiently, Windows groups sectors into clusters; a cluster is the smallest unit of space available for allocation. NTFS determines the cluster size as follows (KB 314878):

Drive size (logical volume)    | Cluster size        | Sectors per cluster
512 MB or less                 | 512 bytes           | 1
513 MB - 1,024 MB (1 GB)       | 1,024 bytes (1 KB)  | 2
1,025 MB - 2,048 MB (2 GB)     | 2,048 bytes (2 KB)  | 4
2,049 MB and larger            | 4,096 bytes (4 KB)  | 8
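To see what cluster and sector size an existing volume actually ended up with, the built-in fsutil tool reports it (run from an elevated command prompt; C: is just an example volume). Look for the "Bytes Per Sector" and "Bytes Per Cluster" lines in the output:

fsutil fsinfo ntfsinfo C: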

Over time the hard disk gets fragmented, which means a single file is no longer stored in contiguous clusters. The problem is that the mechanical components of the disk need to do extra work, and neither the hard disk cache nor the Windows disk cache can use their read-ahead caching algorithms. Accessing the cache is always faster than a disk seek. This turns into a performance hit, and in general disk defragmentation is recommended. Performance hit apart, from a Rackspace point of view we recommend defrag in pretty much every case: sluggish server response, managed backup failure and what not (I have seen recommendations for defrag on Rackwatch tickets; don't laugh, you have seen this as well, haven't you?)!!!!

Fun aside, let's see what the defragmentation process does and what its limitations are. The defragmentation utility rearranges files so that they are stored in physically contiguous clusters. Along with the used sectors, the defrag process consolidates free space so that new files are not fragmented when they are created.

Let’s see the limitation:

1) Disk defragmentation cannot defragment the Recycle Bin. For an efficient defrag, always empty the Recycle Bin first.

2) The disk defragmenter cannot touch the page file unless it is zeroed out. Use PageDefrag from Sysinternals when it is absolutely necessary. On a high-performance server, do not leave the page file to grow automatically; this causes the page file to become fragmented and costs performance when new page file space is initialized.

3) The disk defragmenter will not defragment files that are in use. For best results, shut down all running programs. There was a debate in the past about whether to shut down SQL before defragmenting. Before recommending a SQL shutdown to the customer, see point 4.

4) By default, the disk defragmenter will not defragment file fragments larger than 16,000 contiguous clusters (~64 MB on a volume larger than 2 GB), because doing so gives negligible performance improvement. It is possible to pin down those files and defragment them anyway. It is safe to assume that a file whose fragments are larger than 64 MB is effectively not fragmented as far as disk caching and Windows caching go.
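Where the ~64 MB figure comes from, assuming the 4 KB cluster size NTFS uses on volumes larger than 2 GB (see the table above): 16,000 clusters x 4 KB per cluster = 64,000 KB, which is roughly 64 MB.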

Before suggesting disk defragmentation, answer the following questions:

1) Is the process I/O bound? If so, how many files does it generally touch: is it in the 10s, 100s or 1000s? For better performance it is ideal that files are not fragmented, but to make a significant improvement in performance the count has to be in the 1000s.

2) Is the process capable of high-speed I/O? (Separate post to come.) In general, Microsoft Office products are capable of high-speed I/O. If the process uses high-speed I/O, defragmentation makes very little improvement.

3) Analyze the volume and check the following things:

No. | Fragments | File Size | Most Fragmented Files
a   | 2,586     | 614 MB    | \SysBkUp\SystemState.bkf
b   | 21        | 1600 MB   | \ProgramFiles\DebugDiag\Logs\PerfLogs\PerfLog_Date__05_06_2008__Time_09_05_16PM__161.blg

a. How frequently is the fragmented file used? E.g. in (a), the fragmented file is the system state backup and it is first in the Most Fragmented Files list. Are we troubleshooting a “long system state backup time” related problem?

b. Divide File Size by Fragments, e.g. in (b) 1600 / 21 = ~76 MB average fragment size. Would the defragmenter even defragment it by default, and if so, would it make any performance improvement in terms of disk I/O? This is exactly the case with SQL-related files: if you see that the fragment unit is greater than 64 MB, the performance hit is negligible and defragmenting, with or without stopping SQL, will not make a noticeable improvement.

4) Do not trust the volume fragmentation report. Make a reasonable guess from the Most Fragmented Files list.

In the following circumstances, please do proceed with defrag:

5) Is your MFT fragmented? If “Total MFT fragments” is greater than 5, proceed with defrag. Check the analysis report and see the MFT fragmentation section:

Master File Table (MFT) fragmentation

Total MFT size = 208 MB
MFT record count = 124,340
Percent MFT in use = 58 %
Total MFT fragments = 2
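For reference, an analysis report like the one above can also be produced from the command line; a sketch using the built-in defragmenter on Windows Server 2003, where -a analyzes only and -v prints the verbose report:

defrag C: -a -v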

6) The “Most Fragmented Files” list contains your website files and they are not cached files (e.g. .aspx and .asp are cached most of the time, along with some static files like .jpg and .gif). There are very few circumstances in which fragmentation is the root cause of an IIS performance problem, and most of them are related to new file creation via upload.

Learning Curve:

While troubleshooting a performance problem, consider this option last rather than first. Think twice about whether you really need to defrag the volume to solve the problem.

Suggestions and comments are welcome!!!

Uday Pandya

Monday, 16 June 2008

The Case of the Unexplained IIS FTP Problem….

Scenario: IIS FTP is configured in “Isolate User (without Active Directory)” mode as documented in following Microsoft Support Article:

http://support.microsoft.com/kb/555018

Problem: After reinstalling and reconfiguring all IIS-related services many times, IIS FTP was still not working in isolated mode. Every time a user tried to log in to the system via FTP, it failed with the following error:

Event Type: Warning

Event Source: MSFTPSVC

Event Category: None

Event ID: 13

Date: 01/06/2008

Time: 10:20:38

User: N/A

Computer: IWISHICOULDCHAN

Description:

User administrator failed to log on, could not access the home directory /.

I decided to dive into the problem and came to the conclusion that IIS or IIS FTP is not the source of the problem. Inetinfo.exe was working as expected but was not able to find the home directory for specific users. I verified many times that a directory named after the user's username existed and had the necessary permissions for the Inetinfo.exe process.

I decided to launch Process Monitor and created a filter for Inetinfo.exe to investigate what was going on, and I got the following activity.

Sequence: 12145

Date Time:08:59:25.0538032

Process Name: inetinfo.exe

PID: 1172

Operation: CreateFile

Path: C:\Inetpub\ftproot\LocalUser\IWISHICOULDCHAN\Administrator\

Result: PATH NOT FOUND

Detail: Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Open For Backup, Attributes: n/a, ShareMode: Read, Write, AllocationSize: n/a, Impersonating: IWISHICOULDCHAN\Administrator

I was stumped that inetinfo.exe tried to read the user's home directory at the above-mentioned location. According to Microsoft's support article, you need to create the user's home directory in the “LocalUser” folder, but here it seems that is not the case. I went ahead and created the user directory where it was being looked for, and everything started to work.

Investigation:

The Microsoft support article mentions the following about how to create the FTP root directory structure and user folders:

If users of the local computer log in with their individual account user names, create the subdirectory LocalUser under the FTP site root directory you specified when you created the FTP site.

For Example -

FTP root directory -> D:\MyFTP\

The LocalUser folder is located at D:\MyFTP\LocalUser

For each individual user, you need to create a folder in the format LocalUser\username

User Susan is located at D:\MyFTP\LocalUser\Susan

If users of different domains log on with their explicit domain\username credentials, create a subdirectory for each domain (using the name of the domain) under the FTP site root directory you specified when you created the FTP site.

For Example -

Domain name: Account

FTP root directory -> D:\MyFTP\

The domain folder is located at D:\MyFTP\Account

For each individual domain user, you need to create a folder in the format Domain\username

Domain user Nancy is located at D:\MyFTP\Account\Nancy[1]

It appeared that the FTP problem I had followed the pattern of folder creation for a domain environment, but the slight problem was that I was running the server in a workgroup environment.

Inetinfo.exe was trying to read the folder at “FTPRootDirectory\LocalUser\ServerName\%username%”. A careful look revealed that the server name was not the actual computer name but the server's NetBIOS name. My full computer name was “IWISHICOULDCHANGE”, longer than 15 characters (I have changed it for clarity), and hence the NetBIOS name was “IWISHICOULDCHAN”.

Technical Details:

It seems that when you try to log in via FTP, the Inetinfo.exe process reads the computer name. Please see the following event in Process Monitor:

Sequence: 11436

Date Time: 09:34:38.3419286

Process Name: inetinfo.exe

PID: 1172

Operation: RegQueryValue

Path:HKLM\System\CurrentControlSet\Control\ComputerName\ActiveComputerName\ComputerName

Result: SUCCESS

Data: IWISHICOULDCHAN

As we can see, Inetinfo.exe reads the computer name from the registry, and that returns only the NetBIOS name of the server.

I believe the original problem is in how the authentication works and how it gets transitioned. IIS FTP uses Windows authentication by default and relies on NTLM. When you log in, IIS FTP impersonates the logging-on user to access the file or directory. Process Monitor shows the following event:

Sequence: 7265

Date Time: 09:58:57.8573348

Process Name: inetinfo.exe

PID: 1172

Operation: CreateFile

Path C:\Inetpub\ftproot\LocalUser\IWISHICOULDCHAN\Administrator

Result: SUCCESS

Data: Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Open For Backup, Attributes: n/a, ShareMode: Read, Write, AllocationSize: n/a, Impersonating: IWISHICOULDCHAN\Administrator, OpenResult: Opened

And you will notice the following event in the Security event log:

Event Type: Success Audit

Event Source: Security

Event Category: Logon/Logoff

Event ID: 552

Date: 01/06/2008

Time: 11:22:50

User: NT AUTHORITY\SYSTEM

Computer: IWISHICOULDCHAN

Description:

Logon attempt using explicit credentials:

Logged on user:

User Name: IWISHICOULDCHAN$

Domain: WORKGROUP

Logon ID: (0x0,0x3E7)

Logon GUID: -

User whose credentials were used:

Target User Name: Administrator

Target Domain: IWISHICOULDCHAN

Target Logon GUID: -

Target Server Name: localhost

Target Server Info: localhost

Caller Process ID: 1172

Source Network Address: -

Source Port: -

It seems that, while doing NTLM authentication, when the server name is longer than 15 characters the Inetinfo.exe process decides that the isolated FTP site is running in a domain environment and hence tries to access the user's home directory at “FTPRootDirectory\LocalUser\<NetBIOS name of the server>\%username%”.

Conclusion:

It seems that if you have a computer name longer than 15 characters and you are trying to create an isolated FTP site, you need to create the user's home directory under “FTPRootDirectory\LocalUser\<NetBIOS name of the server>\” and not under “FTPRootDirectory\LocalUser\” as documented in the support article. I haven't yet found any support article mentioning this, or any article pointing out that a server hosting IIS should not have a name longer than 15 characters.
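To make the workaround concrete, here is what the article's pattern would have you create for a local user versus what actually worked on this box (paths taken from the Process Monitor trace above; substitute your own FTP root, NetBIOS name and user):

rem What KB 555018 describes for a local user:
mkdir C:\Inetpub\ftproot\LocalUser\Administrator

rem What worked here, with the server's NetBIOS name inserted as an extra level:
mkdir C:\Inetpub\ftproot\LocalUser\IWISHICOULDCHAN\Administrator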

Side Problem:

When you set a computer name longer than 15 characters, Windows gives you an information dialog box like the one below, but it does not prepare you for the problems you might have in the future:

This saga continues with one more problem you hit while creating a website.

My server has the following IP configuration.

When the computer name is longer than 15 characters, IIS Manager shows me only the primary IP address as an available IP address for configuring a website. Please see the following output:

Technically, I should see “10.2.105.107” and “10.2.105.108” as available IP addresses. To verify whether the problem is with the IIS Manager GUI or IIS itself, I modified the server bindings in “Metabase.xml” as follows:

<IIsWebServer Location="/LM/W3SVC/1"
    AppPoolId="DefaultAppPool"
    DefaultDoc="Default.htm,Default.asp,index.htm,iisstart.htm"
    LogPluginClsid="{FF160663-DE82-11CF-BC0A-00AA006111E0}"
    ServerBindings="10.2.105.107:80:
        10.2.105.108:80:"
    ServerComment="Default Web Site"
    ServerSize="1"
>

So it seems that IIS Manager is now aware of the additional IPs available on the server (at least because we forced it!!). I went ahead and started creating a new website and was stumped again!!!

IIS Manager does not show me the additional IP; it only shows me the primary IP address of the server. I have verified that the problem is related to the GUI and not IIS itself, because when you create a new website manually (via the command line) and set the server bindings to use only the secondary IP address, IIS functions normally. I believe this happens because of how the drop-down list gets populated!!
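For completeness, this is the kind of command-line site creation I mean. A sketch using the stock iisweb.vbs admin script on Windows Server 2003 (not necessarily the exact command used here; the site path, name, IP and port are examples):

cscript //nologo %windir%\system32\iisweb.vbs /create C:\Inetpub\SecondSite "Second Web Site" /i 10.2.105.108 /b 80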



[1] http://support.microsoft.com/kb/555018