Thursday 31 March 2016

Troubleshooting High-CPU Utilization for SQL Server

The first and the most common step if you suspect high CPU utilization (or are alerted for it) is to login to the physical server and check the Windows Task Manager. The Performance tab will show the high utilization as shown below: 



Next, we need to determine which process is responsible for the high CPU consumption. The Processes tab of the Task Manager will show this information:


For this types of situation we have to remember that CPU consume time in two modes as

1) Kernal Mode
2) User Mode

These two mode can be seen by "Performance Monitor" by monitoring "%Privilege Time" and "%User Time" counter.


Remember that "%Privileged time" is not based on 100%.It is based on number of processors.If you see 200 for sqlserver.exe and the system has 8 CPU then CPU consumed by sqlserver.exe is 200 out of 800 (only 25%).

If "% Privileged Time" value is more than 30% then it's generally caused by faulty drivers or anti-virus software. In such situations make sure the BIOS and filter drives are up to date and then try disabling the anti-virus software temporarily to see the change.

If "% User Time" is high then there is something consuming of SQL Server.
There are several known patterns which can be caused high CPU for processes running in SQL Server including

 Some of the most common causes for High CPU in SQL Server are

1 . Query executing causing CPU spike ( In general caused by optimizer picking bad plan)

2. High Compiles and Re-compiles ( In general stats change , schema change , temp tables , recompiled all the user defined SP's etc)

3. Running many traces.

Note: If your server is under severe stress and you are unable to login to SSMS, you can use another machine’s SSMS to login to the server through DAC – Dedicated Administrator Connection.


SELECT  scheduler_id
        ,cpu_id
        ,status
        ,runnable_tasks_count
        ,active_workers_count
        ,load_factor
        ,yield_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255



See below for the BOL definitions for the above columns:

scheduler_id – ID of the scheduler. All schedulers that are used to run regular queries have ID numbers less than 1048576. Those schedulers that have IDs greater than or equal to 1048576 are used internally by SQL Server, such as the dedicated administrator connection scheduler.
cpu_id – ID of the CPU with which this scheduler is associated.
status – Indicates the status of the scheduler.
runnable_tasks_count – Number of workers, with tasks assigned to them that are waiting to be scheduled on the runnable queue.
active_workers_count – Number of workers that are active. An active worker is never preemptive, must have an associated task, and is either running, runnable, or suspended.
current_tasks_count - Number of current tasks that are associated with this scheduler.
load_factor – Internal value that indicates the perceived load on this scheduler.
yield_count – Internal value that is used to indicate progress on this scheduler.

The next step is to identify which queries are demanding a lot of CPU time. The below query is useful for this purpose (note, in its current form, it only shows the top 10 records).

SELECT TOP 10 st.text
               ,st.dbid
               ,st.objectid
               ,qs.total_worker_time
               ,qs.last_worker_time
               ,qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_worker_time DESC

This query has total_worker_time as the measure of CPU load and is in descending order of the total_worker_time to show the most expensive queries and their plans at the top:
Note the BOL definitions for the important columns:
total_worker_time - Total amount of CPU time, in microseconds, that was consumed by executions of this plan since it was compiled.
last_worker_time - CPU time, in microseconds, that was consumed the last time the plan was executed.


I re-ran the same query again after a few seconds and was returned the below output.


Now figure out whether it is singe query or stored procedure causing CPU spike.

1. If the stats are up to date then estimated rows and estimated execution will be approximately same in the execution plan. If there is huge difference then stats are outdated and required update.

2. Rebuild or re-organize the indexes and also create if the indexes are not available.

3. If update statistics or rebuilding the indexes doesn't help you bringing down the CPU then tune the query one by one.

3. If the procedure is causing the CPU spike then

a. Use SET NOCOUNT ON to disable no of effected rows message. It is required only to test or debug the code.

b. Use schema name with the object name if multiple schemas exist in the database. This will helpful in directly finding the compiled plan instead of searching for the object in other schema. This process of searching schema for an object leads to COMPILE lock on SP and decreases the SP's performance. So always its better to refer the objects with the qualified name in the SP.

c. Do not use the prefix "sp_" in the stored procedure name . If you use then it will search in the master database. Searching in the master database causes extra over head and also there are changes to get wrong resulyt if the same SP found in the master database.

d. Use IF EXISTS (SELECT 1) instead of (SELECT * ) to check the existence of a record in another table. Hence EXIST will use True or False.

e. If the query which is spiking linked server query try changing the security of linked server to ensure liked server user has ddl_admin or dba/sysadmin on the remote server.

f. Try to avoid using the SQL Server cursors when ever possible and use while loop to process the records one by one.

g. Keep the transaction as short as possible - The length of transaction affects blocking and deadlocking.Exclusive lock is not released until the end of transaction. For faster execution and less blocking the transaction should be kept as short as possible.


h. Use Try-Catch for error handling it will help full to easily debug and fix  the issues in case of big portion of code.

2. If the system thread is consuming most of the CPU

If none of the SQL queries are consuming majority of the cpu then we can identify if the back ground threads is consuming the majority of CPU by looking at sysprocesses output for background threads.

Select * from sys.sysprocesses where spid<51


Check if you are hitting any of the known issues such as resource monitor may consume high CPU (hot fixes available ) or ghost clean up task uses 100% of the CPU on the idle system in SQL Server 2008 or SQL Server 2005.



No comments:

Post a Comment