This week I was called out twice to an incident at a customer’s site where a massive blocking problem occurred in the web shop. The root cause was the AUTO_UPDATE STATISTICS option. One blog article is not enough to explain the complexity of this problem. That’s why I will be publishing more articles over the next few days that will make the problem easier to understand. The current article deals with the behavior of the automatic update.
Inhaltsverzeichnis
What does AUTO_UPDATE_STATISTICS mean?
The AUTO_UPDATE_STATISTICS option in Microsoft SQL Server ensures that query optimization remains efficient by automatically updating statistical information about data distribution in tables and indexed views. This feature triggers a statistics update whenever the query optimizer detects that the data has changed significantly enough to potentially affect query plans. By default, this option is enabled for databases, enhancing query performance without requiring manual intervention.
When are statistics automaticaly updated?
- We start the execution of a query with a WHERE predicate, JOIN, …
- After the binding phase SQL Server switches to the optimization process
- During the optimization process, the statistics are loaded and checked.
While statistics are loaded Microsoft SQL Server holds an SCH_S lock to protect the meta data! - If the statistics are outdated the automatic statistics update will begin.
- If statistics are not configured to be updated asynchronously, the query must wait until the statistics have been updated.
- After the statistics are updated, an execution plan is created and the query is executed.
When Microsoft SQL Server updates the statistics object it requires for a short time a SCH_M lock.
What is the threshold for statistics to be updated?
The threshold for updating statistics depends on the version of Microsoft SQL Server used. I have described the formula and a helper function in the article Helper function for determining the update threshold for statistics.
/* Create a new table dbo.process_data */
CREATE TABLE dbo.process_table
(
id INT NOT NULL IDENTITY (1, 1),
c1 VARCHAR(25) NOT NULL,
d1 DATETIME NOT NULL,
CONSTRAINT pk_process_table
PRIMARY KEY CLUSTERED (id)
);
GO
/* and create statistics for the columns c1 and d1 */
CREATE STATISTICS process_table_c1 ON dbo.process_table (c1);
CREATE STATISTICS process_table_d1 ON dbo.process_table (d1);
GO
/* we fill it with 1.000.000 rows from the dbo.customers table */
INSERT INTO dbo.process_table
(c1, d1)
SELECT TOP (1000000)
c_name, DATEADD(DAY, ABS(CHECKSUM(c_name)) / -1000000, GETDATE())
FROM dbo.customers;
GO
/*
and check the statistics information about the update threshold
NOTE: The function is part of my demo database for workshops and sessions!
*/
SELECT su.statistics_name,
su.stats_id,
su.rows,
su.rows_sampled,
su.modification_counter,
su.required_update_rows,
su.update_counter_percentage
FROM sys.stats AS st
CROSS APPLY info.statistics_update(st.object_id, st.stats_id, DEFAULT) AS su
GO
With 1.000.000 rows in a table there are 31.622 updates required to start the process of updating the statistics.
/*
Let's update 31.622 rows (d1) in the table.
Let's update 10.000 rows (c1) in the table.
*/
UPDATE dbo.process_table
SET d1 = GETDATE()
WHERE id <= 31622;
GO
UPDATE dbo.process_table
SET c1 = 'blablabla'
WHERE id <= 10000;
GO
/*
Nothing will happen if we run a query against c1
Update of statistics will jump in for d1
*/
SELECT * FROM dbo.process_table
WHERE c1 = 'Uwe Ricken';
GO
SELECT * FROM dbo.process_table
WHERE d1 >= '2024-12-21';
GO
As you can clearly see, only one statistics object was updated. An automatic update is triggered by the optimizer. For the first query for a predicate in c1, the updates were not yet sufficient, while the updates on d2 reached the threshold. An update only takes place on d2!
Thank you very much for reading!