Introduction
Foreign key constraints are a powerful mechanism for preserving referential integrity in a database. They can also represent a challenge when doing bulk table loads, since you need to find a “base” table to start with – that is, a table that has no foreign key constraints defined. Let’s label tables like this as level 0, or ground level if you like. Once that is loaded, you can begin to load other tables that have foreign key references to the base table. We can label those tables level 1, and so on. If you start with table data that already has referentially integrity and load tables by their level numbers — level 0, level 1, level 2 and so on – the load should proceed without problems. Let’s look at a simple example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
CREATE TABLE base( id int IDENTITY(1,1) PRIMARY KEY, b float) INSERT INTO base(b) VALUES (42), (3.14159), (2010401) CREATE TABLE facts( id int IDENTITY(11,1) PRIMARY KEY, base_id int FOREIGN KEY REFERENCES base(id), c varchar(50)) INSERT INTO facts(base_id, c) VALUES (1, 'The Answer'), (2, 'pi'), (3, 'April Fools Day 2018') CREATE TABLE morefacts( id int IDENTITY(21,1) PRIMARY KEY, facts_id int FOREIGN KEY REFERENCES facts(id), d varchar(50)) INSERT INTO morefacts(facts_id, d) VALUES (11, 'to the question'), (12, 'transcendental number'), (13, 'the jokes on you!') |
This set of three tables are at levels 0, 1, and 2, respectively, since “base” has no FK references, “facts” refers to “base” and “morefacts” has an FK referring to “facts”. Now, imagine that you have new data for all three tables, in the form of INSERT statements:
1 2 3 4 5 6 7 8 9 10 |
INSERT INTO morefacts(facts_id, d) VALUES (14, 'golden ratio'), (15, 'limit of (1 + 1/n)^n') INSERT INTO facts(base_id, c) VALUES (4, 'phi'), (5, 'Euler''s number') INSERT INTO base(b) VALUES (1.618), (2.718) |
Now, you know that you can’t insert them that way, or you’ll get an error message like this one:
The INSERT statement conflicted with the FOREIGN KEY constraint “FK__morefacts__facts”
You need to do these in reverse order to preserve referential integrity. This is easy with this little example since we are in total control. Now, imagine that you were asked to load up a database with lots of foreign key relationships, but you didn’t know the levels of any of the tables. How would you proceed? There are a few different ways to tackle the problem and in this article I’m going to leverage the power of recursive Common Table Expressions, or CTEs, to do it.
The System Catalog View sys.foreign_keys
SQL Server now provides almost 300 system catalog views that are useful for all sorts of metadata operations. If I include the dynamic management views the total is almost 500!. sys.foreign_keys, as the name implies, shows you the foreign key relationships for a database. I can combine this view with a recursive CTE to dig out the foreign key relationships in the database. A full discussion of recursive CTEs is outside the scope of this article. See the references section for more detail on how they work. For now, just keep in mind that a recursive CTE has two parts, just like a mathematical recurrence:
- A base case
- A recursive case that builds on the base case
For our example, a query to get the base case would look like this:
1 2 3 4 5 6 7 8 9 10 |
SELECT DISTINCT fk.object_id AS FK, fk.schema_id AS SchemaId, fk.parent_object_id AS TableId, t.schema_id AS ReferencedSchema, fk.referenced_object_id AS ReferencedTable FROM sys.foreign_keys AS fk JOIN sys.tables AS t ON fk.referenced_object_id = t.object_id WHERE fk.type = 'F' |
Run this on any database you have access to and observe the results. On my test database I see:
The recursive case builds on this by finding tables referenced by the base case:
1 2 3 4 5 6 7 8 9 10 11 |
SELECT fk.object_id, fk.schema_id, fk.parent_object_id, t.schema_id, fk.referenced_object_id FROM sys.foreign_keys fk JOIN sys.tables t ON fk.referenced_object_id = t.object_id JOIN base_case ON fk.parent_object_id = base_case.referenced_object_id WHERE fk.type = 'F' |
This query is almost the same as the one above except that it joins with the base case, matching the parent object id, which is the table containing the FK reference. To the base case referenced object id. In this way we can get the tables referring to the base case tables and continue until there are no more, since this is recursive!
Putting the two queries – the base case and the recursive case – together in a recursive CTE yields this query:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
WITH cte AS (SELECT DISTINCT fk.object_id, fk.schema_id, fk.parent_object_id, t.schema_id AS referenced_schema_id, fk.referenced_object_id, 0 AS Depth FROM sys.foreign_keys AS fk JOIN sys.tables AS t ON fk.referenced_object_id = t.object_id WHERE fk.type = 'F' --AND fk.parent_object_id = OBJECT_ID(N'morefacts', N'U') UNION ALL SELECT fk.object_id, fk.schema_id, fk.parent_object_id, t.schema_id, fk.referenced_object_id, cte.Depth - 1 FROM sys.foreign_keys AS fk JOIN sys.tables AS t ON fk.referenced_object_id = t.object_id JOIN CTE ON fk.parent_object_id = cte.referenced_object_id WHERE fk.type = 'F' --AND fk.parent_object_id <> cte.referenced_object_id ) SELECT OBJECT_NAME(cte.object_id) AS ReferringKey, SCHEMA_NAME(cte.schema_id) AS ReferringSchema, OBJECT_NAME(cte.parent_object_id) as ReferringTable, SCHEMA_NAME(cte.referenced_schema_id) AS ReferencedSchema, OBJECT_NAME(cte.referenced_object_id) as ReferencedTable, cte.referenced_object_id, cte.Depth FROM cte ORDER BY Depth DESC,ReferencedSchema, ReferencedTable, ReferringKey; |
There are two commented lines that I’ll come back to in a moment. Running this on my test database I get:
Here, the column “Depth” represents the level a table is with respect to one referring to it. In this case, the table “morefacts” refers to the table “facts”. So “morefacts” is at ground level (Depth=0) and “facts” is in the first basement (Depth = -1). With such a report I know I need to load the deepest levels first, then those above them and so on until I reach ground level.
Now, let’s look at the two commented lines. The first one, if uncommented, lets me just look at the references from a specific table:
1 |
AND fk.parent_object_id = OBJECT_ID(N'morefacts', N'U') |
Running that produces a smaller report:
No surprise there. The second commented line is trickier. On my SQL Server instance I also have the sample database WideWorldImporters installed. Let’s try the query with both lines commented on that database. I get an error:
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
The problem is that this database contains a table that is self-referential:
This comes from the fact that the People table contains a hierarchy. Hierarchies can be used to show people in a reporting structure, where an employee points to their manager in the same table. Here it actually looks like a mistake! The foreign key says, “Be sure that a person in the table really is also in the table”. Of course that is always true! For our discussion though, it means that we are chasing our own tail during the recursive part of the query. If I uncomment the second comment:
1 |
AND fk.parent_object_id <> cte.referenced_object_id |
The problem will disappear:
I just pasted part of the output. It is actually quite a bit longer, which you can verify for yourself.
Bottom up?
The query we’ve been using takes a top-down approach. The problem we had with the People table suggests another approach. Can we find tables that refer to it? We can! We’ll use a bottom up approach. Actually the query changes very little:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
WITH cte AS (SELECT DISTINCT fk.object_id, fk.schema_id, fk.parent_object_id, t.schema_id AS referenced_schema_id, fk.referenced_object_id, 1 AS Level FROM sys.foreign_keys AS fk JOIN sys.tables AS t ON fk.referenced_object_id = t.object_id WHERE fk.type = 'F' AND fk.referenced_object_id = OBJECT_ID(N'Application.People', N'U') UNION ALL SELECT fk.object_id, fk.schema_id, fk.parent_object_id, t.schema_id, fk.referenced_object_id, cte.Level + 1 FROM sys.foreign_keys AS fk JOIN sys.tables AS t ON fk.referenced_object_id = t.object_id JOIN CTE ON fk.referenced_object_id = cte.parent_object_id WHERE fk.type = 'F' AND fk.parent_object_id <> cte.referenced_object_id ) SELECT DISTINCT OBJECT_NAME(cte.object_id) AS ReferringKey, SCHEMA_NAME(cte.schema_id) AS ReferringSchema, OBJECT_NAME(cte.parent_object_id) as ReferringTable, SCHEMA_NAME(cte.referenced_schema_id) AS ReferencedSchema, OBJECT_NAME(cte.referenced_object_id) as ReferencedTable, Level FROM cte ORDER BY Level, ReferencedSchema, ReferencedTable, ReferringKey; RETURN; |
I’ve highlighted the changes. Basically, we start with tables that refer to Application. People and work up from there. This query yields the desired result for the WideWorldImporters database, though they are too big to post here (325 lines). The Level goes all the way up to 10, indicating a little of the complexity of the data model used here.
Summary
This brief excursion into recursive CTEs applied to system views shows how easy it can be to tease out the relationships between objects in a SQL Server database. If you’re new to common table expressions, especially the recursive variant, this gives you a simple example to understand how they work. Don’t use them for everything, however! Sometimes developers are tempted to use recursive CTEs in place of cursors or while loops, thinking that there will be some performance advantage. Usually, those hopes are dashed! Internally, recursive CTEs are processed “Row By Agonizing Row”, or RBAR, a term created by Jeff Moden, a veritable super-DBA in the Microsoft SQL Server space.
If you’re new to system catalog views, let this serve as the briefest of introductions to a large topic!
- Snapshot Isolation in SQL Server - August 5, 2019
- Shrinking your database using DBCC SHRINKFILE - August 16, 2018
- Partial stored procedures in SQL Server - June 8, 2018