Introduction
It is normal in databases to have hierarchical tables, that is, tables that are related with themselves, forming a parent-child relation. For example, consider this:
The parent_id column points to the parent record, which, in some cases, will not exist.
So, imagine we have a number of records, such as:
1: INSERT INTO dbo.list (id, parent_id) VALUES (1, NULL)
2: INSERT INTO dbo.list (id, parent_id) VALUES (2, 1)
3: INSERT INTO dbo.list (id, parent_id) VALUES (3, 2)
4: INSERT INTO dbo.list (id, parent_id) VALUES (4, 3)
How can we find the id of the topmost parent? In this case, it will always be 1, of course.
In SQL Server, we have two options:
- A Common Table Expression (CTE);
- A recursive function.
Let’s see how to implement each.
Common Table Expression Approach
We need to write a CTE that starts with some record and goes all the way up until it finds the parent. Let’s wrap it in a nice scalar function:
1:CREATEFUNCTION dbo.GetTopmostParentCTE
2: (3: @id INT
4: ) 5:RETURNSINT
6:AS
7:BEGIN
8:DECLARE @parentId INT
9: 10: ;WITH cte AS
11: (12:SELECT a.id, a.parent_id
13:FROM dbo.list AS a
14:WHERE a.id = @id
15:UNIONALL
16:SELECT b.id, b.parent_id
17:FROM dbo.list AS b
18:INNERJOIN cte AS c
19:ON c.parent_id = b.id
20: ) 21:22:SELECTTOP 1 @parentId = id
23:FROM cte
24:WHERE parent_id ISNULL
25: 26:RETURN @parentid
27:END
28: GOI won’t explain here how CTEs work, they have been around for quite some time, and there are several posts how there for that.
Recursive Function Approach
The other approach is using a recursive function. The gotcha here is that when we create a function, it is compiled, and if it has a reference to itself – which doesn’t exist first – it will fail. Therefore, we need to first create a dummy function and then change it to do what we want:
1:CREATEFUNCTION dbo.GetTopmostParent
2: (3: @id INT
4: )5:RETURNSINT
6:AS
7:BEGIN
8:RETURN
9: (10:SELECT 0
11: )12:END
13:GO
14: 15:ALTERFUNCTION dbo.GetTopmostParent
16: (17: @id INT
18: )19:RETURNSINT
20:AS
21:BEGIN
22:RETURN
23: (24:SELECT CASEWHEN parent_id ISNULLTHEN id ELSE dbo.GetTopmostParent(parent_id) END
25:FROM dbo.list
26:WHERE id = @id
27: )28:END
29: GOConclusion
You can get results from the two functions by running the following T-SQL queries:
1:SELECT dbo.GetTopmostParent(4)
2:SELECT dbo.GetTopmostParentCTE(4)
Interesting, both execution plans are exactly the same:
I can’t really recommend one over the other, since from my tests, both took the same amount of time (you will need far more records than the ones from my sample to tell that).
So, any thoughts from database gurus out there?