JOIN Elimination in Microsoft SQL Server is an optimization technique with foreign key constraints where the query engine removes redundant JOIN operations during query execution. This occurs when the data from the joined table is not used in the SELECT clause or any other part of the query, making the JOIN unnecessary. By eliminating such JOINs, the optimizer reduces query complexity and improves performance without altering the query’s results.
Inhaltsverzeichnis
JOIN Elimination – a practical example
The Marketing Department of a company needs an interface that allows employees to carry out analyses of the number of orders for customers, nations and/or regions. To do this, the development department created a view with all relevant tables. The view is based on the following table structure / foreign key relationship.
CREATE OR ALTER VIEW dbo.marketing_analysis
AS
SELECT r.r_name AS region_name,
n.n_name AS nation_name,
c.c_name AS customer_name,
o.o_orderkey AS order_number,
o.o_orderdate AS order_date
FROM dbo.regions AS r
INNER JOIN dbo.nations AS n
ON (r.r_regionkey = n.n_regionkey)
INNER JOIN dbo.customers AS c
ON (n.n_nationkey = c.c_nationkey)
INNER JOIN dbo.orders AS o
ON (c.c_custkey = o.o_custkey);
GO
The view contains all relevant relations and presents attributes of the different relations.
JOIN Elimination in Practice
/* Count all orders for all regions */
SELECT region_name,
COUNT_BIG(*)
FROM dbo.marketing_analysis
GROUP BY
region_name;
SQL Server must address all tables because a query over the dbo.region for the orders is only possible via the tables dbo.nations and dbo.customers.
/* Count all orders by nations */
SELECT nation_name,
COUNT_BIG(*)
FROM dbo.marketing_analysis
GROUP BY
nation_name;
GO
JOIN elimination can now take effect because SQL Server has recognized that the dbo.regions table is no longer required to determine the data.
/* Count all orders by customer_name */
SELECT customer_name,
COUNT_BIG(*)
FROM dbo.marketing_analysis
WHERE customer_name = 'Customer#000121993'
GROUP BY
customer_name;
GO
The goal is to exclude tables when executing the query because their data is not relevant for executing the query.
When JOIN Elimination fails
For JOIN elimination to be successful, TRUSTED foreign key relationships must exist between the relations. Tables can only be ignored if SQL Server can be sure that there are no data anomalies!
Untrusted Foreign Key and JOIN Elimination
As soon as a foreign key is no longer trustworthy, SQL Server can no longer avoid checking the data of a table in a JOIN. Especially during loading processes, a foreign key is often deactivated during the loading process, only to be activated again afterwards.
/* Disable the foreign key between dbo.regions and dbo.nations */
ALTER TABLE dbo.nations
NOCHECK CONSTRAINT [fk_dbo.nations_n_regionkey_dbo.regions_r_regionkey]
GO
/* Count all orders by customer_name */
SELECT customer_name,
COUNT_BIG(*)
FROM dbo.marketing_analysis
WHERE customer_name = 'Customer#000121993'
GROUP BY
customer_name;
GO
/* Enable the foreign key between dbo.regions and dbo.nations */
ALTER TABLE dbo.nations
WITH CHECK CHECK CONSTRAINT [fk_dbo.nations_n_regionkey_dbo.regions_r_regionkey]
GO
NULLable foreign key and JOIN Elimination
It seems counterintuitive that an attribute for storing a foreign key can be NULLable. But just because it seems counterintuitive doesn’t mean it won’t work. For the example above, the attribute for storing the foreign key n_regionkey in the table dbo.nations will be NULLable
/* Make the foreign key attribute NULLable */
ALTER TABLE dbo.nations
ALTER COLUMN n_regionkey INT NULL;
GO
Thank you very much for reading!