Use Foreign Key Constraints in your Databases - Part I

When revising one of my popular sessions on scenarios in which indexes no longer help, one of the topics covered was foreign key constraints for data integrity and JOIN eliminiation. During my research, I came across various arguments from developers who reject foreign key constraints for data integrity in databases. This led me to take a closer look at these statements and the disadvantages listed. This is the first article on foreign keys in a series of articles I intend to write to test, confirm, or refute the claims.

Inhaltsverzeichnis

Edgar F. Codd – Rule #10
Arguments against Foreign Key constraints for Data Integrity
Debunking Myth about slow DML operations with Foreign Key constraints
evaluation of the results
Summary

Edgar F. Codd – Rule #10

Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.

It has been almost 40 years since Edgar F. Codd postulated his 12 rules describing guidelines for a relational database system that still exist today. Why do database developers resist standards and implement integrity rules in applications; it should be the job of RDBMS systems!

Integrity rules (data integrity) are enforced with constraints. Other constraints are e.g.

NOT NULL
UNIQUE
PRIMARY KEY

Arguments against Foreign Key constraints for Data Integrity

I have researched a few opinions against the use of foreign keys on the Internet. I will leave it up to you to evaluate the reasons, but I will make my opinion known.

NOTE: All statements are from people who – in general – are not against foreign key relationships but give reasons why they should not be used!

Statement about blocking statements and time constraints for the checks. — Bouncers prevent anyone from coming in!

These arguments seem far-fetched to me personally. Yes, the checks take time because the check accesses the master table. How big is the time difference if the check is carried out in the database and if the application checks the existence beforehand?

Statement about the flexibility when no Check Contraints are in the tables — Free Willie for my data

Classic arguments against foreign key constraints, aren’t they. I find the reasoning „strange“ because the authors says that they can delete „problematic“ rows or can do anything else with the data without concerns. The authors didn’t clearly fully grasped the concept of foreign key constraints. The fact is that in the relational model developed according to the rules of Edgar F. Codd, data integrity is important and there can be NO problematic rows!

there can be no orders that have been assigned to an unknown customer.
Customers cannot be deleted from a table if orders, invoices, or other dependend rows on other relations exist…

A foreign key relationship ensures that only customers that exist in the [customers] table can be selected for an order! So how can „problematic“ data rows arise when the database system is tasked with preventing them!

Debunking Myth about slow DML operations with Foreign Key constraints

With a few examples from my demo database ERP_DEMO, which I use in my workshops, I would like to examine the concerns about foreign keys and prove the opposite.

the picture shows four tables with their relations to each other.
regions has a 1 to many relation to nations.
nations has a 1 to many relation to customers
customers has a 1 to many relation to orders — Integrity Rules by Design

Time – an important criterion

It is stated that the time component prevents the implementation of foreign key constraints. When data is inserted into the detail table, data is changed or data is deleted, the RDBMS System checks the foreign keys to ensure data integrity. With three tests (each test ran 5 times!) I checked the time difference with and without foreign keys. The pictures showes the test results with the avg run time!

In the scenario, 10,000 rows are inserted simultaneously by 20 threads. Both the average time and the total time, separated by runtime and CPU time, are logged. The Stored Procedure is straight forward and pushes row by row into the table. The tests have been made with SQLQueryStress.

CREATE OR ALTER PROCEDURE dbo.InsertNewOrder
AS
BEGIN
	SET NOCOUNT ON;
	SET XACT_ABORT ON;

	INSERT INTO dbo.demo_orders
	SELECT	TOP (1) * FROM dbo.orders;
END
GO

No Foreign Key Constraints – no check

This picture shows the execution time of app. 15 seconds with an average execution time of 1 ms for each interation.

No Foreign Key constraints but check by application

If the data integrity is not to be ensured by the RDBMS system, the application must take on this task. That’s how fair it should be.
The stored procedure is extended to check whether the customer to be entered exists.

CREATE OR ALTER PROCEDURE dbo.InsertNewOrder
AS
BEGIN
	SET NOCOUNT ON;
	SET XACT_ABORT ON;

	INSERT INTO dbo.demo_orders
	SELECT TOP (1) o.*
	FROM	dbo.orders AS o INNER JOIN dbo.customers AS c
			ON (o.o_custkey = c.c_custkey);
END
GO

The integrity test is kept primitive because I am only concerned with the temporal impact. I enter a data record and cover its validity for the customer with a JOIN. This means that the customer data record must exist.

This picture shows the execution time of app. 18 seconds with an average execution time of 1,4 ms for each interation.

Foreign Key Constraints for Data Integrity – BuiltIn

ALTER TABLE dbo.demo_orders
ADD CONSTRAINT fk_demo_orders_customers
FOREIGN KEY (o_custkey) REFERENCES dbo.customers (c_custkey);
GO

This picture shows the execution time of app. 11 seconds with an average execution time of 0,9 ms for each interation.

evaluation of the results

The result surprised me. When I started this blog post, I was sure that inserting data without any further checks would win by a long way.
I was all the more surprised that, despite checking the value to be entered, the implemented foreign key clearly won the race.

test description	total runtime	avg runtime/iteration	avg CPU/iteration
Native inserts without checks	~15 sec.	0,0010 sec.	0,0010 sec.
Native inserts with checks by app	~18 sec.	0,0014 sec.	0,0007 sec.
Inserts with Foreign Key	~11 sec.	0,0009 sec.	0,0008 sec.

Summary

Foreign keys are designed to comply with the rules for relational databases postulated by Edgar F. Codd. Their purpose is not to annoy developers or make their lives difficult.
One argument against using foreign keys is the time delay when entering/updating/deleting data. A foreign key has hardly any measurable influence on the entry of data. In my example, they were faster than other techniques.

Thank you for reading!

Use Foreign Key Constraints in your Databases – Part I