This week I was called out twice to an incident at a customer’s site where a massive blocking problem occurred in the web shop. The root cause was the AUTO_UPDATE STATISTICS option. One blog article is not enough to explain the complexity of this problem. That’s why I will be publishing more articles over the next few days that will make the problem easier to understand. The current article deals with the behavior of the automatic update.

What does AUTO_UPDATE_STATISTICS mean?

The AUTO_UPDATE_STATISTICS option in Microsoft SQL Server ensures that query optimization remains efficient by automatically updating statistical information about data distribution in tables and indexed views. This feature triggers a statistics update whenever the query optimizer detects that the data has changed significantly enough to potentially affect query plans. By default, this option is enabled for databases, enhancing query performance without requiring manual intervention.

When are statistics automaticaly updated?

Lifecyle of statistics update
  • We start the execution of a query with a WHERE predicate, JOIN, …
  • After the binding phase SQL Server switches to the optimization process
  • During the optimization process, the statistics are loaded and checked.
    While statistics are loaded Microsoft SQL Server holds an SCH_S lock to protect the meta data!
  • If the statistics are outdated the automatic statistics update will begin.
  • If statistics are not configured to be updated asynchronously, the query must wait until the statistics have been updated.
  • After the statistics are updated, an execution plan is created and the query is executed.
    When Microsoft SQL Server updates the statistics object it requires for a short time a SCH_M lock.

What is the threshold for statistics to be updated?

The threshold for updating statistics depends on the version of Microsoft SQL Server used. I have described the formula and a helper function in the article Helper function for determining the update threshold for statistics.

/* Create a new table dbo.process_data */
CREATE TABLE dbo.process_table
(
	id	INT			NOT NULL	IDENTITY (1, 1),
	c1	VARCHAR(25)	NOT NULL,
	d1	DATETIME	NOT NULL,

	CONSTRAINT pk_process_table
	PRIMARY KEY CLUSTERED (id)
);
GO

/* and create statistics for the columns c1 and d1 */
CREATE STATISTICS process_table_c1 ON dbo.process_table (c1);
CREATE STATISTICS process_table_d1 ON dbo.process_table (d1);
GO

/* we fill it with 1.000.000 rows from the dbo.customers table */
INSERT INTO dbo.process_table
(c1, d1)
SELECT	TOP (1000000)
		c_name, DATEADD(DAY, ABS(CHECKSUM(c_name)) / -1000000, GETDATE())
FROM	dbo.customers;
GO

/*
   and check the statistics information about the update threshold
   NOTE: The function is part of my demo database for workshops and sessions!
*/
SELECT	su.statistics_name,
        su.stats_id,
        su.rows,
        su.rows_sampled,
        su.modification_counter,
        su.required_update_rows,
        su.update_counter_percentage
FROM	sys.stats AS st
		CROSS APPLY info.statistics_update(st.object_id, st.stats_id, DEFAULT) AS su
GO
The picture shows one line for each statistics object and its actual modifcation number. Furthermore it shows the number of required changes to the statistics before the update of the statistics will start.
Information about the status of statistics

With 1.000.000 rows in a table there are 31.622 updates required to start the process of updating the statistics.

/*
	Let's update 31.622 rows (d1) in the table.
	Let's update 10.000 rows (c1) in the table.
*/
UPDATE	dbo.process_table
SET		d1 = GETDATE()
WHERE	id <= 31622;
GO

UPDATE	dbo.process_table
SET		c1 = 'blablabla'
WHERE	id <= 10000;
GO
number of modifications in statistics object
/*
	Nothing will happen if we run a query against c1
	Update of statistics will jump in for d1
*/
SELECT * FROM dbo.process_table
WHERE	c1 = 'Uwe Ricken';
GO

SELECT * FROM dbo.process_table
WHERE	d1 >= '2024-12-21';
GO
Updated statistics object when threshold has reached

As you can clearly see, only one statistics object was updated. An automatic update is triggered by the optimizer. For the first query for a predicate in c1, the updates were not yet sufficient, while the updates on d2 reached the threshold. An update only takes place on d2!

Thank you very much for reading!