A couple of weeks ago, I received an emergency notification from the service desk saying that SharePoint was running really slowly. Since this could have multiple causes, I asked them whether there had been any major outages in the network infrastructure, datastore, or VM layers. They said no! So I started investigating the issue by looking at the servers. I found that on one of the web front-end (WFE) servers, the IIS worker process hosting SharePoint was using a lot of CPU. That can be caused by heavy load, a bug in custom code, and so on, so I decided to find the root cause using DebugDiag.
DebugDiag is a great debugging tool provided by Microsoft. First, I ran Process Explorer (Sysinternals) to find out which processes were consuming CPU, and found that w3wp.exe was taking 62%. I then opened DebugDiag and collected a full user-mode memory dump of the spiking w3wp.exe process.
Creating the dump produced the following two files:
1. The *.dmp file contains the memory dump of the process, including information about threads, stack traces, locks, and so on.
2. The *.txt file contains a list of the processes that were running when the dump was collected.
We need to collect at least 2-3 memory dumps about 4-5 seconds apart, so that we can confirm the same threads are busy in each dump and that we are troubleshooting along the right path. Once that is done, go to the Advanced Analysis tab, select the Crash/Hang Analyzers analysis script, add the dump file, and click "Start Analysis".
This opens a .mht report in the browser. In my case, the report showed that thread 22 had triggered a garbage collection (GC). The report provides the following details:
- All running threads along with CPU time.
- List of all running requests (under Client Connections).
- Stack trace of each thread.
- Process details like up time, process ID and so on.
- Server and OS details.
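Because the report lists every running thread with its CPU time, comparing two dumps taken a few seconds apart shows which threads are actually burning CPU: a thread whose CPU time keeps growing between dumps is the one to dig into. The sketch below assumes you have copied the per-thread CPU times (in seconds) from two reports into dictionaries keyed by thread ID; the function name `hottest_threads` and the sample numbers are hypothetical, not something DebugDiag produces directly.

```python
from typing import Dict, List, Tuple


def hottest_threads(
    first: Dict[int, float],
    second: Dict[int, float],
    top: int = 3,
) -> List[Tuple[int, float]]:
    """Rank threads by how much CPU time they gained between two dumps.

    `first` and `second` map thread ID -> CPU time in seconds, as read by
    hand from two consecutive dump reports (hypothetical input format).
    Threads that only appear in the second dump are treated as starting at 0.
    """
    deltas = []
    for thread_id, cpu_later in second.items():
        cpu_earlier = first.get(thread_id, 0.0)
        deltas.append((thread_id, cpu_later - cpu_earlier))
    # Largest CPU growth first; ties broken by thread ID for stable output.
    deltas.sort(key=lambda item: (-item[1], item[0]))
    return deltas[:top]


if __name__ == "__main__":
    # Hypothetical CPU times from two dumps taken about 5 seconds apart.
    dump1 = {9: 1.5, 17: 38.0, 30: 0.4}
    dump2 = {9: 1.6, 17: 43.0, 30: 0.4}
    for thread_id, delta in hottest_threads(dump1, dump2):
        print(f"thread {thread_id}: +{delta:.1f}s CPU")
```

A thread that gains several seconds of CPU in a five-second window is effectively pinning a core, which is why collecting more than one dump is worth the effort.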
I then analyzed the dump further in WinDbg to determine why the GC had been triggered. Based on that analysis, I concluded the following:
– High CPU utilization can be caused by an infinite loop in code, by heavy load, or by memory pressure that triggers garbage collections, which in turn spike the CPU.
– To troubleshoot the issue, I first restarted IIS with the IISRESET command and then collected 10 sets of memory dumps during the CPU spike.
– Analyzed each dump individually using DebugDiag's Advanced Analysis.
– Looked at the long-running thread and read its stack trace from bottom to top.
– Dug into the method that the stack trace pointed to and advised the developer to fine-tune or correct the code.
– When the advanced analysis does not help much, I use Debugging Tools for Windows, specifically WinDbg, to analyze the dump.
– WinDbg is best suited to troubleshooting .NET (managed) applications, because it has extensions (such as SOS) that can dump the CLR stack, heap objects, and so on.
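Reading a stack from bottom to top follows the calls in the order they were made, from the thread's entry point up to the frame that was executing when the dump was taken. A common shortcut is to find the topmost frame that belongs to custom code rather than to the .NET Framework or SharePoint, since that is usually the method worth handing to the developer. The sketch below assumes the frames have been copied out of the report as strings with the most recent call first, and that framework code can be recognized by its namespace prefix; the prefix list, the function name `first_custom_frame`, and the `Contoso` sample frames are all hypothetical.

```python
from typing import Iterable, Optional, Sequence

# Namespace prefixes treated as framework code (an assumption; adjust as needed).
FRAMEWORK_PREFIXES = ("System.", "Microsoft.")


def first_custom_frame(
    frames: Sequence[str],
    framework_prefixes: Iterable[str] = FRAMEWORK_PREFIXES,
) -> Optional[str]:
    """Return the most recent frame that is not framework code, or None.

    `frames` is ordered as a debugger prints it: index 0 is the frame that
    was executing, and the last item is the thread's entry point.
    """
    prefixes = tuple(framework_prefixes)
    culprit = None
    # Walk from the entry point (bottom) towards the executing frame (top),
    # remembering the last custom-code frame seen on the way up.
    for frame in reversed(frames):
        if not frame.startswith(prefixes):
            culprit = frame
    return culprit


if __name__ == "__main__":
    # Hypothetical managed stack, most recent frame first.
    stack = [
        "System.Collections.Generic.Dictionary`2.FindEntry",
        "Contoso.Intranet.WebParts.NewsRollup.BuildCache",
        "Contoso.Intranet.WebParts.NewsRollup.Render",
        "System.Web.UI.Control.RenderControl",
        "System.Web.HttpRuntime.ProcessRequestInternal",
    ]
    print(first_custom_frame(stack))
```

The prefix check is only a heuristic: a hot loop can sit entirely inside framework code that custom code called, so the frames around the match still need a human read.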