Deep Dive into SQL Server Statistics

In this article, we will go through some details about SQL Server statistics.

The first step into SQL Server statistics

The query plan is a set of instructions that describes how a query will be executed by the database engine. The query plans are created by the query optimizer in SQL Server and it requires some inputs to generate a query plan before executing a query. At this point, SQL Server statistics play a key role in the query plan generation process because it is one of the most important inputs as they store the distribution of the column data of the tables in a histogram. This statistical meta-data is used to estimate how many rows will be returned by the executed query and according to this estimation, the I/O, CPU, and memory resource desired from the database engine.

Auto Create Statistics and Auto Update Statistics

SQL Server allows us to create and update statistics operations automatically. When we create an index, SQL Server automatically creates statistics for the indexed columns. At the same time, the query optimizer can decide to create statistics for the single columns during the execution of the query. This option can be easily enabled or disabled using SQL Server Management Studio (SSMS) on the database level. We can find out this feature on the Options tab of the database properties.

On the other hand, statistics will be out of date (stale) after the data modification operations because data modifications will change the data value distributions of the columns. In this case, the statistics will need to be updated so that the query optimizer can generate more accurate execution plans. The statistics update operation can be completed either manually or automatically. Auto Update Statistics feature can be enabled or disabled under the Options tab of the database properties.

The query optimizer will update the SQL Server statistics automatically when the following circumstances occur:

The table row number has gone from 0 rows to more than 0 rows
The table had 500 rows or less when the statistics were last sampled and since had more than 500 modifications
The table had more than 500 rows when the statistics were last updated and the number of row modifications is more than MIN ( 500 + (0.20 * n), SQRT(1,000 * n) ) formula result after the statistics were last sampled

What’s going on behind the scenes

As we stated, SQL Server has the ability to automatically create or update statistics during the execution of the query. However, the SQL Server database engine performs some tasks when it automatically creates or updates SQL Server statistics. In order to understand this concept better, we will show an example. As a first step, we will create a sample table in the Adventureworks database and will populate it with some data.

CREATE TABLE TestNewProduction

(PID INT PRIMARY KEY IDENTITY(1,1),

Name VARCHAR(50),ProductNumber VARCHAR(50),SafetyStockLevel INT ,ReorderPoint INT)

INSERT INTO TestNewProduction ( Name,ProductNumber,SafetyStockLevel,ReorderPoint)

SELECT TOP 100 Name,ProductNumber,SafetyStockLevel,ReorderPoint FROM Production.Product

After that, we will create an extended event that will capture:

When an automatic statistics create operation is performed
When an automatic statistics update operation is performed

CREATE EVENT SESSION Capture_StatsUpdate ON SERVER

ADD EVENT sqlserver.auto_stats(

)

ADD TARGET package0.ring_buffer

ALTER EVENT SESSION Capture_StatsUpdate ON SERVER STATE = START;

The extended event which will be created includes the Auto Stats event and this event can capture index and column statistics updates automatically. At the same time, this event can capture when SQL Server statistics are being loaded for use by the optimizer. Now we will execute the following query and look at the captured events.

Tip: We will add 1=(SELECT 1) expression add the end of the query to avoid trivial query plans

SELECT Name from TestNewProduction

WHERE Name LIKE 'A%'

AND 1=(SELECT 1)

After executing the query, we will see two captured events. The first one describes that new statistics are created by the query optimizer and the second event specifies that the created SQL Server statistics are loaded and used by the query optimizer.

IF OBJECT_ID(N'tempdb..#capture_waits_data') IS NOT NULL

BEGIN

DROP TABLE #capture_waits_data

END

SELECT CAST(target_data as xml) AS targetdata

INTO #capture_waits_data

FROM sys.dm_xe_session_targets xet

JOIN sys.dm_xe_sessions xes

ON xes.address = xet.event_session_address

WHERE xes.name = 'Capture_StatsUpdate'

AND xet.target_name = 'ring_buffer';

SELECT xed.event_data.value('(@timestamp)[1]', 'datetime2') AS [timestamp],

xed.event_data.value('(data[@name="job_type"]/text)[1]', 'varchar(100)') AS job_type,

xed.event_data.value('(data[@name="async"]/value)[1]', 'varchar(100)') AS async,

xed.event_data.value('(data[@name="status"]/text)[1]', 'varchar(100)') AS [status],

xed.event_data.value('(data[@name="statistics_list"]/value)[1]', 'varchar(1000)') AS statistics_list,

xed.event_data.value('(data[@name="database_id"]/value)[1]', 'int') AS database_id,

xed.event_data.value('(data[@name="sample_percentage"]/value)[1]', 'int') AS sample_percentage

FROM #capture_waits_data

CROSS APPLY targetdata.nodes('//RingBufferTarget/event') AS xed (event_data)

ORDER BY 1 DESC

At the same time, the performed events can be tracked with SQL Server Profiler but don’t forget that profiler is a deprecated tool. The profiler clearly shows all events during the execution of the query.

As can be seen from the illustration above, during the execution of the query, the SQL Server query optimizer decides to create the statistics, and also it recompiles the query. The creation of the new statistic process is related to the Auto Create Statistics option. After creating or updating statics, the query optimizer recompiles the executed query, assuming there will be more up-to-date histogram data.

To simulate the Auto Update Statistics statistics option we will insert some new rows into the TestNewProduction table and the table rows number will reach over 500 so the query optimizer will update the statistic when we execute the select statement.

1 2	INSERT INTO TestNewProduction ( Name,ProductNumber,SafetyStockLevel,ReorderPoint) SELECT Name,ProductNumber,SafetyStockLevel,ReorderPoint FROM Production.Product

After inserting the new rows into the table, we will execute the same query and re-check the result of the captured event list.

SELECT Name from TestNewProduction

WHERE Name LIKE 'A%'

AND 1=(SELECT 1)

As we can see, the statistic has updated during the execution of the query, and in this context, this question may appear in your minds:

“Do creating or updating statistic operations affect the query performance?”

We will answer this question with a little test. For this test, I have created an extended version of the TestNewProduction and have populated 70 million rows into this table. The statistic creation process has taken about 500 (500.000 microseconds) milliseconds on my laptop.

However, the performance impact of this operation will also strictly depend on how many rows are used to calculate the histogram data. The rows_sampled column indicates how many rows are used in the last statistics create or update operation and it can be found out in the statistic header.

SELECT OBJECT_SCHEMA_NAME(obj.object_id) SchemaName, obj.name TableName,

stat.name, modification_counter,

[rows],

rows_sampled, (CAST(rows_sampled AS float) / CAST([rows] AS float))*100 AS [% Rows Sampled],

last_updated

FROM sys.objects AS obj

INNER JOIN sys.stats AS stat ON stat.object_id = obj.object_id

CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp

WHERE obj.is_ms_shipped = 0

AND stat.name = '_WA_Sys_00000002_47DBAE45'

For our example, the sampled number of rows 503.654, and it corresponds to %0,7. In order to create a more advanced histogram, the statistics can be computed by using all rows in the table. The FULL SCAN expression helps to create or update SQL Server statistical operations by reading all the data of the relevant column or columns. This time, creating the statistic operation will lock the table for a long time and process duration will also increase. Another point that we need to consider is that more I/O may occur updating statistics with FULL SCAN expression because SQL Server will read whole data of the column or columns.

Auto Update Statistics Asynchronously

Auto Update Statistics Asynchronously is another option about the SQL Server statistics that can be configured at the database level. When we enable the Auto Update Statistics Asynchronously option query optimizer does not wait for the update of statistics like the synchronous mode and uses the outdated statistics. On the other hand, another thread starts to update statistics operations. In this option, query processing does not wait for the time that will be spent to update statistic histogram but the generated query plan will use the stale statistic data. We can enable or disable under the Options tab of the database properties.

After enabling this feature, we will add some new rows into the TestNewProduction so that the data modification counter exceeds the update statistics threshold.

INSERT INTO TestNewProduction ( Name,ProductNumber,SafetyStockLevel,ReorderPoint)

SELECT Name,ProductNumber,SafetyStockLevel,ReorderPoint FROM Production.Product

GO 100

As a last, we will execute our sample select query and it leads to update SQL Server statistics asynchronously.

SELECT Name from TestNewProduction

WHERE Name LIKE 'A%'

AND 1=(SELECT 1)

All statistics events will be captured by the Capture_StatsUpdate extended event.

When we look at the captured events, we can see that the firstly updated statistics process assigned a queue and then it is processed in the queue. The sys.dm_exec_background_job_queue_stats dynamic management view (DMV) help to monitor this queue.

Conclusion

In this article, we have explored some secret tips about SQL Server statistics. Statistics are game-changers for query performance because query optimizers use statistics to estimate how many rows will be returned from the executed queries.

Author
Recent Posts

Esat Erkec

Esat Erkec is a SQL Server professional who began his career 8+ years ago as a Software Developer. He is a SQL Server Microsoft Certified Solutions Expert.

Most of his career has been focused on SQL Server Database Administration and Development. His current interests are in database administration and Business Intelligence. You can find him on LinkedIn.

View all posts by Esat Erkec

The first step into SQL Server statistics

Auto Create Statistics and Auto Update Statistics

What’s going on behind the scenes

Auto Update Statistics Asynchronously

Conclusion

Related posts: