Recursive CTE's

Recursive CTE's

This BlogPost was written by Andre Speek. Click here for other BlogPosts.

One of the things you might encounter every once in a while is the need to diplay a hierarchy. Like Managers and Employees within a company, or a Project with Child Projects and even Grandchild Projects.

At first this might seem difficult, especially if you don't know how many levels deep to go. But luckily there is an easy way to do this, using Common Table Expressions, or CTE's for short.

Common Table Expressions

Imagine you're a detective trying to solve a complex case. You have a lot of evidence to sift through, and it's getting overwhelming. To make things easier, you decide to organize the evidence into smaller, more manageable pieces.

In SQL, a Common Table Expression (CTE) is like organizing your evidence into smaller pieces. It's a temporary named result set that you can use within a larger SQL query. CTE's can make complex queries easier to read, write, and maintain.

In general it will look something like this:

;with TemporaryName as (
  -- Some SQL Query here
)
select *
from TemporaryName

Now if you didn't want to use a CTE, you could also do it like this:

select *
from (
  -- Some SQL Query here
) as TemporaryResult

Common use of a CTE is when you have to do something complex within a Query to get for example the Revenue, something other complex to get the Costs and then do both again to subtract them and get the Margin. With a CTE you can do the Revenue and Costs in the upper part inaide the CTE, and then reuse those values in the final select statement. Should the determination for Revenue or Costs change, you only need to change it in one place of your Query.

To me using a CTE feels more natural and it is easier to read than the SubQuery version. Then again, in older versions of SQL Server (I believe prior to SQL 2008) they are not supported.

So while I prefer to use CTE's, it is sure not the only way to break down a complex query.

Recursive CTE's

CTE's have the unique ability to reference themselves, which make them easy to use with hierarchical data. Let's have a look at an example that will make it much easier to understand than a lenghty explanation would do. For this example I used an Exact Synergy database to get the hierarchical structure of the Employees.

With a Recursive CTE you always need a starting point. The anchor from which to start. In Exact SYnergy that would be the person who has a Manager (column repto_id) that is 0, zero. Since there might be some system users with a negative ID (res_id) I need to exclude those as well. That would result in something like this:

select res_id, fullname, repto_id, 0 as Level
from humres h
where res_id > 0
and repto_id = 0

As you can see I also added a fixed number as Level. In a moment it will be clear why I did that.

First let's put this in a CTE, like this:

;with Employees as (
  select res_id, fullname, repto_id, 0 as Level
  from humres h
  where res_id > 0
  and repto_id = 0
)
select *
from Employees
--option (maxrecursion 1000)
GO

Now that in itself is not too difficult, huh? But let's add in a Union All inside the CTE that references itself:

;with Employees as (
  select res_id, fullname, repto_id, 0 as Level
  from humres h
  where res_id > 0
  and repto_id = 0
  
  union all
  
  select h.res_id, h.fullname, h.repto_id, e.Level + 1
  from humres h
  inner join Employees e on h.repto_id = e.res_id
  where h.res_id > 0
)
select *
from Employees
--option (maxrecursion 1000)
GO

This might look weird at first glance, but doing this will trigger the CTE to become recursive. First the upper Query will get the top level manager, then the Union All part will return all of his/her subordinates. This will add a number of rows to the Employees result set, and the Union All will be fired again to get the subordinates of these Employees!

This will continue until the Union All Query will find no more Employees that have subordinates. That is to say, not indefinitely.

As you can see I added a counter to display the level we are at. By default the levels of recursion is set to 100 which to me has always been enough while querying data. But in the rare case you need to go deeper (some expamples below) just uncomment the Option Maxrecursion line to go to whatever level you need.

You could even set Maxrecursion to 0 in which case the recursion will go on indefinitely. But be very, very careful with that one. If there is bad data in your database, the Query might hang and your DBA might no longer like you.

Long story short

Use a Query to find your anchor point from which you want to start. Then use a Union All (not a Union, that won't work) and use the CTE itself in the next Query to trigger the CTE to become recursive. Be aware that it goes "only" 100 levels deep by default and use Option Maxrecursion to go deeper if needed.

Another example

Besides Querying data, you could also use a Recursive CTE to generate a View with a range of data. For example, you might need a View with numbers, like this:

;with numbers as (
  select 1 as Number
  union all
  select Number + 1
  from numbers
  where Number <= 1000
)
select *
from numbers
option (maxrecursion 1000)
GO

Or you might need a View with all the dates in a certain range:

declare @Start datetime = '2023-01-01'
declare @End datetime = '2023-12-31'

;with AllDates as (
  select @Start as Date
  union all
  select dateadd(Day, 1, Date) as Date
  from AllDates
  where Date < @End
)
select *
from AllDates
option (maxrecursion 500)
GO

As you can see we needed the Option Maxrecursion here since we go over the default of 100 levels.

Recursive CTE or Subquery

While subqueries can sometimes substitute for non-recursive CTE's, they cannot directly replace recursive CTE's. Recursive CTE's possess a unique ability to traverse hierarchical data structures within a single query, which traditional subqueries cannot easily replicate.
Here's a breakdown of why:

Recursive Nature

Recursive CTE's can reference themselves, enabling them to iterate through hierarchical data without knowing the depth beforehand. Subqueries, on the other hand, lack this self-referencing capability, making them unsuitable for tasks involving unknown levels of hierarchy.

Built-in Recursion:

Recursive CTE's have explicit syntax for recursion, simply by referencing themselves. Subqueries lack this explicit support for recursion, making hierarchical traversal more cumbersome.

Maintainability:

Recursive CTE's often enhance code readability and maintainability when working with hierarchical data. Their self-contained nature clarifies the intention, while subqueries can become nested and intricate, hindering comprehension.

Performance Considerations:

In certain cases, recursive CTE's may outperform subqueries for hierarchical queries, as they're often optimized by database engines for recursive operations.

That's it for today!

Going through hierarchical data might seem very difficult at first glance. And it certainly is when you don't know it can be done using a Recursive CTE.

But hey, now you know! Go out and make good use of them!



Do you want to comment on this post? Leave a response here and I will get in touch with you!

You can leave a comment or send a question to andre@andrespeek.com. Let me know if you got inspired and stay safe and healthy always!




An error has occurred. This application may no longer respond until reloaded. Reload 🗙