Learn to Use a Recursive CTE in SQL Query

What Are Common Table Expressions (CTEs)?

Before we begin looking into recursions, let us get the basics covered. If you’re not already familiar with Common Table Expressions, here is the general idea of it. CTEs are synonymous with subqueries but they have their own temporary table names and contain rows and columns of data created during execution. The temporary tables are cleared after execution. While active, these CTEs aid in query reusability and improved performance.

CTEs can help reduce the complexity of the queries and promote better readability. But there is a level of complexity that simple CTEs cannot solve. In cases where you are dealing with hierarchical data, for instance, bigger projects broken down into smaller modules, a dynamic webpage with multiple links, and the organizational structure of a company; such tables, and graphs require recursion to extract information.

This is where recursive CTEs come into play.

Hierarchical and Graph Data Structures

Hierarchical and graph data structures are widely employed in SQL to facilitate recursion, enabling the representation and traversal of data relationships. Let's delve into each structure and its distinctive traits.

The hierarchical data structure pertains to a relationship between a parent and a child, where each child has a single parent. It is typically represented in SQL using a self-referencing table, where a column refers to its own primary key. This type of structure is ideal for depicting data with a rigid hierarchy, like organizational charts or file systems. For instance, in an employee table, each row can contain a reference to its manager through a foreign key. By employing recursive queries in SQL, it becomes possible to navigate this hierarchical structure, enabling the retrieval of all descendants or ancestors associated with a specific node.

A hierarchical data structure typically appears as a tree-like structure, where each element has a specific parent-child relationship. It consists of levels or layers, starting from a single root node at the top, branching out into multiple child nodes, and further branching out into subsequent child nodes, forming a hierarchy.

Visually, a hierarchical data structure can be represented as follows:

Hierarchical data structure in SQL to facilitate recursion

Here, the "Root" node represents the highest level in the hierarchy, with the "Child" nodes branching out from it, and further extending into "Grandchild" nodes. Each child node has a single parent, and each grandchild node has a single parent as well.

An example of it would be an organizational structure as shown below:

Organizational structure in SQL to facilitate recursion

This example represents the reporting relationships within an organization. This structure facilitates seamless traversal from high-ranking executives to their corresponding departments, teams, and individual employees, enabling smooth navigation within the organization.

Recursive SQL queries can be applied to navigate this hierarchical arrangement and execute tasks such as retrieving all employees subordinate to a particular manager or determining the hierarchy of command from a specific employee to the CEO.

In contrast, the graph data structure illustrates intricate connections among entities through nodes and edges. Unlike the hierarchical structure, nodes in a graph can establish multiple links with other nodes. This structure proves valuable for modeling interconnected data, including social networks or network topologies. In SQL, graphs can be represented using adjacency lists or adjacency matrices. By employing recursive queries on graph structures, it becomes possible to traverse paths, identify connected components, or determine the shortest paths between nodes.

Let’s look into an example of a graph data structure.

Graph data structure in SQL to facilitate recursion

This graph exemplifies a network topology in which the nodes symbolize devices or network components, while the edges depict the connections between them. For instance, node A is linked to nodes B, D, and G, signifying the existence of network links or connections among these devices.

It illustrates a complex network topology with interconnected nodes. It can represent various scenarios such as computer networks, communication networks, or infrastructure networks.

The graph structure enables the examination of network connectivity, identification of possible areas of congestion or vulnerability, and enhancement of network design to ensure smooth communication and efficient data transfer.

Tasks such as finding the shortest path between two nodes, determining the most critical nodes in terms of network traffic, or identifying the clusters or subnetworks within the larger topology can be accomplished with the assistance of graph algorithms.

The selection of the suitable structure relies on the characteristics of the data and the intended operations to be executed using recursion in SQL.

What is a Recursive CTE in SQL?

Recursion is like a Russian nesting doll of queries. You are referencing the CTE from within the CTE, again and again, until you retrieve all the desired data you want.

A Recursive CTE references its own subset of results, repeatedly, until all the results are extracted.

You might remember the syntax of a non-recursive SQL CTE from our previous article.

WITH cte_name   
AS (subquery)     

SELECT * 
FROM cte_name;

Syntax of Recursive Common Table Expression (CTE) in SQL

A non-recursive CTE makes use of the ‘WITH’ keyword to name the temporary result set along with the ‘AS’ keyword to write the subquery which defines the CTE.

The syntax of the recursive CTE is more elaborate than that. Let’s take a look at its syntax.

WITH RECURSIVE cte_name AS (
    cte_definition  --anchor member
 
    UNION ALL
 
    cte_definition --recursive member
)
SELECT *
FROM   cte_name;

Firstly, the WITH keyword is appended by the ‘RECURSIVE’ keyword. (This is unless you work in MS SQL Server, where the keyword ‘RECURSIVE’ must be omitted.) Once the CTE is named, the ‘AS’ keyword is used to define the CTE. It is within this definition of CTE where the recursion happens.

The CTE definition consists of two parts:

The anchor member is the base case of the recursion. This defines the starting point of the recursive operation.
The recursive member is the part that refers back to the temporary result set created by the anchor member and mentions how the recursion will keep executing until a specific criterion is met.

We need to connect the anchor and recursive members using UNION or UNION ALL commands.

How Does Recursive CTE Work in SQL?

The syntax may not be descriptive enough to understand the mechanism of a recursive CTE. So, let us break it down.

Achor Member: This is the query that will return the base result of the CTE.
Recursive Member: This is the query that will reference the base CTE. It is merged with the anchor member with the UNION or UNION ALL operations.
Termination Condition: In order to terminate the recursion, a condition must be specified. Otherwise, your recursive CTE might be stuck in an infinite loop.

The first part to execute will be the anchor member. It gives us the base result set which is mentioned here as R0. Using R0 as the base table, the recursive member will execute and return a sub-result set, denoted as Ri. Now, using R1 as the base table, the recursion occurs again repeatedly until the condition for termination is met.

Finally, the result sets of R0, R1, … Rn, where n = iteration count, are combined together using the UNION ALL operator. The UNION ALL operation in turn returns the final result set.

Examples of Recursive CTEs in SQL

Let’s look at some examples of recursive CTEs and how you can use them in different complex scenarios.

Example 1: Calculating the Factorial of a Given Number Using Recursive CTE

The factorial, for those who need a little reminder, is the product of all positive integers that are less or equal to n.

Denoted as n!, the factorial of 5 is:

5! = 5*(5-1)*(5-2)*(5-3)*(5-4) = 5*4*3*2*1 = 120

Now, let’s see how to translate this into SQL code.

WITH RECURSIVE factorial_cte(n, factorial) AS (
  SELECT 1, 1   -- Anchor member: initialize the initial values
  UNION ALL
  SELECT n + 1, (n + 1) * factorial
  FROM factorial_cte
  WHERE n < 5   -- Recursive member: perform the recursive calculation until the condition is met
)
SELECT factorial
FROM factorial_cte
WHERE n = 5;

Here, the anchor member initializes the value of both n and factorial as 1. The recursive member goes on to perform the calculation using recursion by incrementing n and multiplying it with the current factorial value until n reaches the given value of 5.

Finally, the factorial value is selected where n is 5. The output for ‘factorial of 5’ is 120 as you can see below:

factorial
----------
120

Example 2: Hierarchical Data Structure

Take for instance a table called employees that represents the hierarchical structure of employees of an organization. It contains the columns employee_id and manager_id. So, in such a scenario, it makes sense to use recursive CTE to extract all employees under a particular manager.

Say we have the following employees table:

Let’s begin the query.

WITH RECURSIVE employee_hierarchy AS (
  SELECT employee_id, name, manager_id, 1 as level
  FROM employees
  WHERE employee_id = 1   -- Anchor member: select the starting employee
  UNION ALL
  SELECT e.employee_id, e.name, e.manager_id, eh.level + 1
  FROM employees e
  JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id   -- Recursive member: join with previous level
)
SELECT employee_id, name, level
FROM employee_hierarchy;

In this example, we begin with the anchor member by selecting the employee with employee_id as 1. The recursive member combines the employees table with the CTE employee_hierarchy (referenced in FROM as any other table), by matching the manager_id with the previous level’s employee_id and then increments the level by 1. This process continues until no more matches are found. Ultimately, we select the employee_id, name, and level from the resulting employee_hierarchy table.

The output of the query looks like this:

Example 3: File Directory Structure

Let’s look at another example involving the files table. This table represents a file directory structure with columns file_id, name, and parent_id. We can use recursive CTEs to retrieve all files from the directory and their respective paths.

The files table looks like this.

The query to retrieve files is as follows:

WITH RECURSIVE file_paths AS (
  SELECT file_id, name, parent_id, name as path
  FROM files
  WHERE parent_id IS NULL   -- Anchor member: select the root files
  UNION ALL
  SELECT f.file_id, f.name, f.parent_id, CONCAT(fp.path, '/', f.name)
  FROM files f
  JOIN file_paths fp ON f.parent_id = fp.file_id   -- Recursive member: join with previous level
)
SELECT file_id, path
FROM file_paths;

The output of this query is:

The output displays the file_id and the complete path for each file, starting from the root directory and including all parent directories.

Example 4: Graph Structure

Suppose we have a table called ‘network_topology’ that depicts the relationships between various nodes within the network. Each row in the table corresponds to an edge connecting two nodes. The table comprises two columns: ‘source_node’ and ‘target_node’, which signify the origin and destination nodes of each connection.

Say the network_topology table contains the following data:

Here’s the SQL query to traverse through network_topology.

WITH RECURSIVE network_traversal AS (
  SELECT source_node, target_node, 1 as hop_count, source_node as path
  FROM network_topology
  WHERE source_node = 'A'   -- Anchor member: select the starting node
  UNION ALL
  SELECT nt.source_node, nt.target_node, nt.hop_count + 1, nt.path || ' -> ' || nt.target_node
  FROM network_topology nt
  JOIN network_traversal nt_prev ON nt.source_node = nt_prev.target_node   -- Recursive member: join with previous level
)
SELECT target_node, hop_count, path
FROM network_traversal;

Let's analyze the code to gain a better understanding of its functionality:

The anchor member selects the starting node ('A') and assigns a hop count of 1. It also initializes the path with the starting node.
The recursive member joins the `network_topology` table with the previous level of the CTE (`network_traversal`) based on the source node and target node relationship. It increments the hop count by 1 and appends the target node to the existing path.
The final `SELECT` statement retrieves the target node, hop count, and path from the `network_traversal` CTE.

By utilizing the recursive CTE, the code traverses the network topology beginning from node 'A' and recursively explores its interconnected nodes. At each step, the hop count is incremented, and the path is updated accordingly. This process continues until there are no more connections left to explore.

Here’s the output table:

The result is an output table that displays the traversal path, hop count, and target node for each step taken during the traversal.

Recursive CTE vs Non-Recursive CTE

Now you have a clear understanding of how the recursive CTEs work. You must also be aware of the critical differences between recursive and non-recursive CTEs. We are going to inspect five facets of the CTEs and we will see how they differ. Let’s dive in.

Purpose

While the non-recursive CTEs are used to simplify complex queries by creating a temporary named result sets, their use is more to abstract the query logic for better readability. Recursive CTEs on the other hand are specifically designed to handle hierarchical and recursive data structures. Their purpose is to enable the traversing and processing of data that is part of a parent-child relationship.

Iteration and Self-references

Straight off the bat, we know that non-recursive CTEs do not involve iteration or self-referencing. The CTE is executed only once and the temporary result set is used to reference in the CTE as many times as needed. There is no recursion occurring there.

However, recursive CTEs, as the name suggests, do involve iterative self-referencing. The recursion begins with the anchor member that acts as a base table and then they iterate over a recursive member which references back to the base table. The iteration continues until the termination condition is met.

Structure and Syntax

Non-recursive CTEs consist of the ‘WITH’ keyword followed by the cte_name and the ‘AS’ keyword that defines the CTE. Recursive CTEs also begin with the ‘WITH’ keyword which is followed by the ‘RECURSIVE’ keyword. The cte_name is mentioned after the ‘WITH RECURSIVE’ keywords followed by the ‘AS’ keyword and the definition of the CTE.

The definition itself includes an anchor member, a recursive member, and a termination condition. The anchor member is required to initialize the query parameters and the recursive member references the CTE itself.

Data Manipulation

Both variations of CTEs are used extensively to manipulate data in complex queries. The difference, however, is in the level of complexity that they can contribute to detangling.

Non-recursive CTEs are generally used for querying data, selecting subsets, performing joins, and applying aggregations within the scope of a single query. Recursive CTEs are preferred for processing hierarchical data including calculating hierarchical paths, traversing trees, and aggregating values in a hierarchical structure.

Use Cases

While we’re discussing how these CTEs can be utilized in queries, let us now look into some use cases.

Most commonly, non-recursive CTEs are used for filtering and transforming data, joining tables, also for performing calculations without the need for convoluted subqueries.

Recursive CTEs can be used in cases where managing organizational structures, computing recursive aggregates, implementing graph algorithms, or performing recursive calculations like factorials or the Fibonacci series are required.

Summary

To summarize, Recursive CTEs allow for self-referencing queries and are specifically designed for handling hierarchical and recursive data structures. This article highlights the differences between recursive and non-recursive CTEs. Recursive CTEs are a powerful tool for processing hierarchical data and performing complex calculations in SQL.

It is definitely something you must have in your arsenal if you wish to be advanced in SQL. To learn more about advanced SQL concepts, check out “SQL interview questions” to explore more questions and practice these concepts hands-on.

Learn to Use a Recursive CTE in SQL Query

Categories

What Are Common Table Expressions (CTEs)?

Hierarchical and Graph Data Structures

What is a Recursive CTE in SQL?

Syntax of Recursive Common Table Expression (CTE) in SQL

How Does Recursive CTE Work in SQL?

Examples of Recursive CTEs in SQL

Recursive CTE vs Non-Recursive CTE

Summary

Latest Posts:

From Arrays to Lists: Converting Numpy Arrays with Python

Tips for Crafting an Impressive Data Analyst Portfolio

PySpark GroupBy Guide: Super Simple Way to Group Data

Share

Follow

Categories