In this article, we will learn usage details of the sp_updatestats built-in store procedure which helps to update all statistics in a SQL Server database. First of all, we will take a glance at the statistics concept in SQL Server.
Understanding the statistics in SQL Server
Statistics are database objects which involve the detailed statistical distribution of the column values for the tables or indexed views. The query optimizer uses this information to estimate how many rows will be returned from a query. If the statistics are up-to-date, the query optimizer makes more accurate estimates and as a result, it generates more optimized execution plans. The statistics information can go out-of-date when we perform modification operations in the tables. In this case, the query optimizer does not find out the best-optimized execution plan and may lead to performance issues. Therefore, keeping statistics up-to-date is the recommended performance practice.
Syntax
sp_updatestats runs UPDATE STATISTICS command for whole user-defined and internal tables in the current database so that the statistics will be up-to-date. The following shows the syntax of this procedure.
sp_updatestats [ [ @resample = ] ‘resample’]
With the help of the following query, we can update all tables statistics in the Adventureworks database.
1 2 3 |
USE AdventureWorks; GO EXEC sp_updatestats; |
The output message notices an “update is not necessary” for some statistics. This is because, even if solely one row was modified in the table since the last statistics update, the statistics will be updated by this procedure. Otherwise, the statistics would not be updated. The aforementioned tables in the above image haven’t shown any modification since the last statistics update, for this reason, these tables are excluded from the statistics update operation.
Now, we will prove this concept. sys.dm_db_stats_properties is a DMV which helps to monitor statistics details of the particular table or indexed view. Through the following query, we will find out the statistics details of the Production.Product table.
1 2 3 4 5 6 7 8 9 10 11 12 |
SELECT sp.stats_id, name, filter_definition, last_updated, rows, rows_sampled, steps, unfiltered_rows, modification_counter FROM sys.stats AS stat CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp WHERE stat.object_id = OBJECT_ID('Production.Product'); |
WHERE stat.object_id = OBJECT_ID(‘Production.Product’);
Now, we will update only one row in the Production.Product table with the help of the following query and execute the statistics monitoring query again.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
UPDATE Production.Product SET Name = 'Black Tire' WHERE ProductID = 1 GO SELECT sp.stats_id, name, filter_definition, last_updated, rows, rows_sampled, steps, unfiltered_rows, modification_counter FROM sys.stats AS stat CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp WHERE stat.object_id = OBJECT_ID('Production.Product') |
As you can see, a value has changed in the modification_counter column, this column is specifying how many modifications occurred since the last statistics update. Now, we will run the sp_updatestats and examine the output of it.
1 2 3 |
USE AdventureWorks; GO EXEC sp_updatestats; |
The above image illustrates that a statistic has been updated whose name is AK_Product_ProductNumber. Obviously, one-row modification is enough to update the statistics. Now, reconsider the statistic properties of the Product.Product table.
1 2 3 4 5 6 7 8 9 10 11 12 |
SELECT sp.stats_id, name, filter_definition, last_updated, rows, rows_sampled, steps, unfiltered_rows, modification_counter FROM sys.stats AS stat CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp WHERE stat.object_id = OBJECT_ID('Production.Product'); |
The last_updated column value has been updated to the last statistics performing date and modification_counter re-updated to 0 due to the update process after the last statistics there has not been any modification performed in the table. In this working mechanism, we should take one thing into account. Suppose that we have a huge table and only one row is modified in this table. Under this circumstance, updating the statistics for only one modification might be redundant and will waste the resources of the SQL Server. For this reason, if you decide to use sp_updatestats, consider this issue:
Arguments usage
1 |
sp_updatestats [ [ @resample = ] 'resample'] |
From the above syntax, we can clearly understand that we can use the procedure with the RESAMPLE option. If we use this option, the statistics are updated based on the latest sample rate. The following query shows the usage of the procedure with the RESAMPLE option.
1 |
EXEC sp_updatestats @resample = 'resample' |
If we don’t use the RESAMPLE option, the statistics will be updated with the default sampling rate. The default sampling rate is determined by the SQL Server automatically according to a total number of the rows in a table. The following query illustrates the default usage.
1 |
EXEC sp_updatestats |
As a last, if we decide to use the RESAMPLE, we should take into account one issue. If the last statistics have been updated by scanning the entire rows in the table, using the sp_updatestats with RESAMPLE option will update the statistics by scanning all rows in the table. Now, we will learn the difference between default and RESAMPLE usage options. We will populate 10,00,000 data to Sales.SalesReason table in the Adventureworks database and monitor this table properties via statistics monitoring query.
1 2 3 4 5 6 7 8 9 10 11 12 |
SELECT sp.stats_id, name, filter_definition, last_updated, rows, rows_sampled, steps, unfiltered_rows, modification_counter FROM sys.stats AS stat CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp WHERE stat.object_id = OBJECT_ID('sales.salesreason'); |
PK_SalesReason_SalesReasonID statistics updated in 27.10.2017. On this date, the total row number of the table was 10 and the total number of the sampled rows was 10. Also, 1,000,000 rows have been modified since the latest statistics (27.10.2017) were updated. Now we will update statistics through the sp_updatestats procedure without any option.
1 |
EXEC sp_updatestats |
The output message explains that PK_SalesReason_SalesReasonID named statistic has been updated. We will reanalyze the statistics details of the table.
As we can see, rows_sampled value has been changed and the sampling rate is determined by SQL Server. Now, we will make an example for the RESAMPLE option so that we can find out the difference between these two options usage. At first, we restore the AdventureWorks database and then repeat the synthetic data creation. In this way, we obtain the same state of the PK_SalesReason_SalesReasonID statistic. The rows and rows_sampled column values are the same. It means that the previous update statistics operation is scanning all rows. Therefore, the RESAMPLE option forces the sp_updatestats procedure to behave the same.
Run the following query in order to update statistics.
1 |
EXEC sp_updatestats @resample = 'resample' |
As we can see in the above illustration, the statistics update operation performed the scanning all rows in the table. This result proves our theorem. The lesson we need to learn in this example, if the previous statistics update was scan all table rows, the RESAMPLE will also be scan all table rows.
Conclusion
In this article, at first we mentioned about the statistics and performance impact, essentially up-to-date statistics provide performance improvements for queries. In the continuing sections of the article, we learned how to update statistics with sp_updatestats and detailed usage.
- SQL Performance Tuning tips for newbies - April 15, 2024
- SQL Unit Testing reference guide for beginners - August 11, 2023
- SQL Cheat Sheet for Newbies - February 21, 2023