SQL CTEs: Usage, Advantages, and Drawbacks
All about SQL CTEs or Common Table Expressions, their types, and usage. We will also discuss the advantages and disadvantages of using them in your SQL queries.
When working with immense data in SQL, the complexity of the queries also increases proportionally. To be able to make head or tail of it, it helps to create temporary tables within the query to which you can refer, and that’s exactly what Common Table Expressions do. In this article, we will explore what SQL Common Table Expressions (CTE) are, how they function and we will enlist their advantages and disadvantages.
What is a Common Table Expression (CTE) in SQL?
SQL CTEs or Common Table Expressions are nothing but temporary tables that you can use within a query. A CTE is defined within the context of a single query. These temporary tables have their own records and columns that get created during the execution and are cleared after execution. They can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. Most importantly, using CTEs help create neater queries with improved performance.
Although CTEs and subqueries are alike, the former is way more useful and flexible. Before CTEs were introduced in SQL:2003, subqueries were the only option. But subqueries make the queries pretty convoluted and hard to understand or visualize. Thanks to CTEs, we can now name and reference the subqueries as and when needed in the main query.
Syntax of Common Table Expressions (CTE) in SQL
Without much ado, let us get right into the syntax of SQL Common Table Expression (CTE).
WITH cte_name (column_1, column_2, column_3,...) AS (subquery) SELECT * FROM cte_name;
Firstly, the ‘WITH’ keyword is used to define the CTE, and then the name of the CTE is declared. Column names can be mentioned along with the cte_name, but they are optional. Secondly, the ‘AS’ keyword is used to particularly mention the subquery that will go on to generate the temporary result set. Finally, the SQL CTE is ready to be referred to in the main query or any of the other subqueries.
Recursive CTE in SQL
Here’s a type of CTE that references itself, repeatedly. Here’s the syntax for a recursive CTE:
WITH expression_name (column_list) AS ( -- Anchor member initial_query UNION ALL -- Recursive member that references expression_name. recursive_query ) -- references expression name SELECT * FROM expression_name
When working with recursive CTEs in SQL, the query is divided into two components: the base case and the recursive case. The base case represents the first query that returns the initial set of results. The recursive case, on the other hand, is the query that is run repeatedly until the desired outcome is achieved.
For the recursive CTE to function, it must have an anchor member representing the initial query, and a recursive member representing the query that is executed repeatedly. To perform the recursive operation, the recursive member needs to refer back to the CTE itself.
Recursive CTEs come in pretty handy in hierarchical situations where each of the records has a parent-child relationship with another record. To ensure that the recursion stops once the desired result set is retrieved, a termination condition can be included in the WHERE clause of the recursive member in order to filter out the records that fail the termination condition.
They can be used to perform various operations, such as generating reports, calculating totals, and traversing complex data structures. However, it is important to use them judiciously, as they can be resource-intensive and can slow down performance if used improperly.
How SQL CTEs can help you simplify your work
One thing that you can be sure of when using a SQL CTE in your query is reduced complexity. The queries gain structure and make it easy for you to visualize the data in these virtual tables as they flow through the query in its entirety. They help you organize the code by splitting the query into bite-sized logical chunks that make it easier for you or anyone else working on or maintaining your query to understand.
Examples of Common Table Expressions (CTEs) in SQL
Here are a few examples of Common Table Expressions in SQL. Let’s examine a simple example of a CTE in a non-recursive use-case:
Interview Question Date: January 2023
Bank of Ireland has requested that you detect invalid transactions in December 2022. An invalid transaction is one that occurs outside of the bank's normal business hours. The following are the hours of operation for all branches:
Monday - Friday 09:00 - 16:00 Saturday & Sunday Closed Irish Public Holidays 25th and 26th December
Determine the transaction ids of all invalid transactions.
Link to the question: https://platform.stratascratch.com/coding/2143-invalid-bank-transactions
Let’s examine this query.
WITH cte AS( SELECT *, TO_CHAR(time_stamp, 'day') AS day_of_week, TO_CHAR(time_stamp, 'DD') AS day, CAST(time_stamp AS time) AS time FROM boi_transactions WHERE TO_CHAR(time_stamp, 'YYYY-MM') = '2022-12') SELECT transaction_id FROM cte WHERE day_of_week LIKE any(array['saturday%', 'sunday%']) OR NOT time BETWEEN '09:00:00' AND '16:00:00' OR day BETWEEN '25' AND '26'
In this question, given the bank’s normal working hours and public holidays, we are asked to detect the invalid transactions from the month of December 2022. Here, an invalid transaction is defined as that which occurs outside normal working hours.
We are only provided with one dataset, namely, boi_transactions that contains transaction IDs and corresponding timestamps. To solve this problem, we are segmenting the timestamp into day_of_week, day, and time in order to filter only the timestamps that occur within December 2022. For this operation, we use a CTE to store the timestamp segments in these temporary columns.
From this CTE or temporary result set, we write another WHERE clause in the main query that will filter out the holidays and select transactions that occurred outside of the working hours using time, day, and day_of_week columns.
Can we use multiple CTEs in the same SQL query?
Absolutely! The more complex the query, the better the idea to use multiple SQL CTEs. Multiple CTEs can be defined by separating them using commas so that they can be referenced in the main query as required.
Let us take a look at how to define multiple CTEs in the same query and join them in the main query.
WITH CTE_1 AS ( SELECT ... FROM ... WHERE ...), CTE_2 AS (SELECT ... FROM ... WHERE ...) SELECT ... FROM CTE1 JOIN CTE2 ON ... WHERE ...
It is worth noting that multiple CTEs can be used in a single line of the main query as long as you define them before referencing them.
Check out this example with multiple queries:
Interview Question Date: April 2022
For each unique user in the dataset, find the latest date when their flags got reviewed. Then, find total number of distinct videos that were removed on that date (by any user). Output the the first and last name of the user (in two columns), the date and the number of removed videos. Only include these users who had at least one of their flags reviewed by Youtube. If no videos got removed on a certain date, output 0.
Link to the question: https://platform.stratascratch.com/coding/2105-videos-removed-on-latest-date
Let us look into the solution to this question.
WITH latest_dates AS (SELECT user_firstname, user_lastname, max(reviewed_date) latest_date FROM user_flags AS uf INNER JOIN flag_review AS fr ON uf.flag_id = fr.flag_id WHERE reviewed_by_yt GROUP BY 1, 2), n_removed_by_date AS (SELECT reviewed_date, count(DISTINCT video_id) n_removed FROM user_flags AS uf INNER JOIN flag_review AS fr ON uf.flag_id = fr.flag_id WHERE LOWER(reviewed_outcome) = 'removed' GROUP BY reviewed_date) SELECT ld.*, coalesce(nr.n_removed, 0) n_removed FROM latest_dates ld LEFT JOIN n_removed_by_date nr ON ld.latest_date = nr.reviewed_date
In this example, we have created two CTEs, named latest_dates and n_removed_by_date. Each of these SQL CTEs provides us with the required data that we will reference and use in the main query. In this case, it is the latest_date column from the latest_dates CTE and the n_removed from the n_removed_by_date CTE that is required for our final output.
Advantages and Drawbacks of SQL CTEs:
SQL Common Table Expressions (CTEs) have a number of advantages that make them an increasingly popular feature of modern databases.
One of the biggest benefits of SQL CTEs is their reusability. Unlike subqueries, CTEs can be used multiple times within the same query, simplifying complex queries and improving query performance. By breaking up queries into smaller, logical pieces, SQL CTEs can also enhance readability, making code easier to understand and maintain. This can reduce the risk of SQL coding errors and improve overall code quality.
Memory consumption is another benefit, as CTE results are not stored in memory – they are available only after running the CTE.
While SQL CTEs are now a standard feature in many modern databases, they are not universally supported by all database platforms. Novice developers may also find CTEs more difficult to write and understand than traditional SQL queries due to their syntax complexity.
Another drawback is that the CTE result can’t be used by another SQL statement.
Overall, CTEs can simplify complex queries, improve query performance, enhance code readability, and reduce memory consumption. However, the potential drawbacks of SQL CTEs include the result not being reusable in another statement, limited support, and syntax complexity. It is important to carefully consider the specific needs of a project before deciding whether to use CTEs in database queries.
We dove into the uses, pros, and cons of Common Table Expressions (CTEs) in SQL. While using CTEs in SQL may reduce the performance of the query, the pros far outweigh the cons. Make your queries neat, readable, and reusable with the effective use of CTEs, and watch the quality of your queries transform to the next level.
Find more technical concepts, like CTEs, here “SQL Cheat Sheet”.