Module 4: Subqueries & CTEs•35 min

Understanding Subqueries

Progress Tracking

When One Query Isn’t Enough

Here’s a question that comes up all the time: “Find everyone who earns more than average.” Sounds simple, but think about it. You need to know the average before you can filter by it. That’s two questions, not one.

You could run two queries. First, get the average, write it down, then use that number in your WHERE clause. But that’s tedious, error-prone, and falls apart when the data changes. Subqueries let you do both in one shot.

A subquery is just a query inside another query. The inner one runs first, and its result gets plugged into the outer one. This module teaches you both subqueries and CTEs (Common Table Expressions). We’ll start with subqueries because you’ll encounter them constantly in existing code, then move to CTEs, which are StrataScratch’s recommended approach for new queries.

Subqueries in WHERE Clause

Your First Subquery

SQL

SELECT *
FROM orders
WHERE total_order_cost > (
  SELECT AVG(total_order_cost) FROM orders
);

Table: techcorp_workforce

id	first_name	last_name	department	salary	phone_number	joining_date
1	Sarah	Mitchell	HR	95000	555-0101	2021-03-15
2	Michael	Chen	HR	88000	555-0102	2022-06-01
3	Emily	Rodriguez	HR	82500		2021-09-20
4	David	Park	HR	80000	555-0104	2023-01-10
5	Lisa	Thompson	HR	65000		2021-04-05

Tables: techcorp_workforce

The inner query calculates the average salary. The outer query uses it to filter. No hardcoded numbers, no two-step process.

The Classic “Find the Max” Pattern

This comes up constantly in interviews: find the row with the highest value for a given column.

SQL

SELECT
  first_name,
  last_name,
  salary
FROM employees
WHERE salary = (
  SELECT MAX(salary)
  FROM employees
);

The subquery finds the highest salary. The outer query finds everyone earning that amount. If there’s a tie, you get multiple rows.

Workers With The Highest Salaries

Combine subqueries with JOINs to find top performers.

Table: worker

worker_id	first_name	last_name	salary	joining_date	department
1	Monika	Arora	100000	2014-02-20	HR
2	Niharika	Verma	80000	2014-06-11	Admin
3	Vishal	Singhal	300000	2014-02-20	HR
4	Amitah	Singh	500000	2014-02-20	Admin
5	Vivek	Bhati	500000	2014-06-11	Admin

Table: title

worker_ref_id	worker_title	affected_from
1	Manager	2016-02-20
2	Executive	2016-06-11
8	Executive	2016-06-11
5	Manager	2016-06-11
4	Asst. Manager	2016-06-11

Tables: worker, title

IN: Matching a List

What if your subquery returns multiple rows? That’s when you use IN.

SQL

-- Find employees in departments that have high earners
SELECT
  first_name,
  last_name,
  department
FROM employees
WHERE department IN (
  SELECT DISTINCT department
  FROM employees
  WHERE salary > 100000
);

NOT IN: The Trap Everyone Falls Into

NOT IN seems like the obvious way to exclude values. And it works great, until it doesn’t.

Here’s the problem: if that subquery returns even one NULL value, NOT IN returns zero rows. Not some rows. Zero. It’s bitten every SQL developer at least once.

NOT IN with NULLs returns nothing

This is one of SQL’s most infamous gotchas. NOT IN with a NULL in the list returns nothing, always. The fix is to use NOT EXISTS instead.

EXISTS and NOT EXISTS

EXISTS asks a simple question: “Does at least one matching row exist?” It returns TRUE or FALSE, and stops searching the moment it finds a match.

SQL

-- Find customers who have placed at least one order
SELECT
  c.customer_id,
  c.customer_name
FROM customers c
WHERE EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.cust_id = c.customer_id
);

That SELECT 1 isn’t magic. EXISTS only cares whether rows exist, not what’s in them.

NOT EXISTS: The NULL-Safe Alternative

NOT EXISTS finds rows where no matching rows exist. It’s cleaner and safer than NOT IN.

SQL

-- Find customers who have never ordered
SELECT
  c.customer_id,
  c.customer_name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.cust_id = c.customer_id
);

EXISTS is efficient

EXISTS stops searching as soon as it finds a match. For large tables, this can be much faster than IN.

Customers with Large Orders

Use EXISTS with a correlated subquery that checks for orders over $100.

Table: online_store_customers

customer_id	customer_name
1	Alice Johnson
2	Bob Smith
3	Carol Williams
4	David Brown
5	Emma Davis

Table: online_store_orders

order_id	customer_id	amount	status
101	1	150	paid
102	1	200	paid
103	1	75	paid
104	2	250	paid
105	3	180	paid

Tables: online_store_customers, online_store_orders

Subqueries in SELECT

You can put a subquery in SELECT to add a calculated column to every row.

Table: customers

id	first_name	last_name	city	phone_number
8	John	Joseph	San Francisco	928-386-8164
7	Jill	Michael	Austin	813-297-0692
4	William	Daniel	Denver	813-368-1200
5	Henry	Jackson	Miami	808-601-7513
13	Emma	Isaac	Miami	808-690-5201

Table: orders

id	cust_id	order_date	order_details	total_order_cost
1	3	2019-03-04	Coat	100
2	3	2019-03-01	Shoes	80
3	3	2019-03-07	Skirt	30
4	7	2019-02-01	Coat	25
5	7	2019-03-10	Shoes	80

Tables: customers, orders

This counts orders for each customer. The subquery runs once per row. For small tables, that’s fine. For big tables, it can get slow.

Scalar subqueries must return one value

Scalar subqueries in SELECT must return exactly one value. If they return multiple rows, you get an error.

Derived Tables: Subqueries in FROM

Put a subquery in FROM to create a temporary result set, then query it like a regular table.

SQL

SELECT
  department,
  total_salary
FROM (
  SELECT
    department,
    SUM(salary) AS total_salary
  FROM employees
  GROUP BY department
) AS dept_totals
WHERE total_salary > 200000;

The inner query aggregates. The outer query filters those aggregates.

Derived tables need an alias

Derived tables must have an alias. That AS dept_totals isn’t optional.

Correlated Subqueries

The subqueries above can run independently. Correlated subqueries are different: they reference the outer query and run once per row.

Table: employee

id	first_name	last_name	age	sex	employee_title	department	salary	target	bonus	email	city	address	manager_id
5	Max	George	26	M	Sales	Sales	1300	200	150	Max@company.com	California	2638 Richards Avenue	1
13	Katty	Bond	56	F	Manager	Management	150000	0	300	Katty@company.com	Arizona		1
11	Richerd	Gear	57	M	Manager	Management	250000	0	300	Richerd@company.com	Alabama		1
10	Jennifer	Dion	34	F	Sales	Sales	1000	200	150	Jennifer@company.com	Alabama		13
19	George	Joe	50	M	Manager	Management	100000	0	300	George@company.com	Florida	1003 Wyatt Street	1

Tables: employee

That e1.department reference is what makes it correlated. The subquery needs to know which row we’re on.

The Nesting Problem

Subqueries can nest inside each other. And nest again. Technically, SQL allows arbitrary nesting depth. Practically, two levels deep and you’re already in trouble.

SQL

-- Don’t do this
SELECT *
FROM (
  SELECT *
  FROM (
    SELECT
      department,
      AVG(salary) AS avg
    FROM employees
    GROUP BY department
  ) AS level1
  WHERE avg > 50000
) AS level2
WHERE avg < 500000;

This is hard to read, hard to debug, and makes your colleagues hate you. If you ever find yourself nesting more than one level deep, that’s your signal to switch to CTEs.

When to switch to CTEs

One level of nesting is fine. Two levels, consider CTEs. Three levels, definitely use CTEs.

Why CTEs Are Preferred

CTEs (Common Table Expressions) solve the nesting problem. They let you name each step and lay them out top-to-bottom instead of inside-out.

Benefits of CTEs over nested subqueries:

Readable: you can follow the logic from top to bottom
Debuggable: test each step independently by running just that CTE
Reusable: reference the same result multiple times without repeating code
Maintainable: easier to modify six months later

The rest of this module focuses on CTEs because they’re usually the recommended approach. You now understand subqueries well enough to read legacy code and recognize when CTEs would be cleaner.

Table: customers

id	first_name	last_name	city	phone_number
8	John	Joseph	San Francisco	928-386-8164
7	Jill	Michael	Austin	813-297-0692
4	William	Daniel	Denver	813-368-1200
5	Henry	Jackson	Miami	808-601-7513
13	Emma	Isaac	Miami	808-690-5201

Table: orders

id	cust_id	order_date	order_details	total_order_cost
1	3	2019-03-04	Coat	100
2	3	2019-03-01	Shoes	80
3	3	2019-03-07	Skirt	30
4	7	2019-02-01	Coat	25
5	7	2019-03-10	Shoes	80

SQL

SELECT
  c.id,
  c.first_name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.cust_id = c.id
);

Key Takeaways

Subqueries let you use the result of one query inside another
Scalar subqueries return one value; use with =, >, <
IN handles subqueries that return multiple rows
NOT IN fails silently with NULLs; use NOT EXISTS for safety
EXISTS stops at the first match and is NULL-safe
Derived tables (FROM subqueries) must have an alias
Deep nesting is a code smell; CTEs are the cleaner solution

What’s Next

Now that you understand subqueries, you’re ready for CTEs. They use the WITH clause to name each step, making complex queries readable and maintainable. Once you start using them, you’ll wonder how you ever lived without them.

Next upContinue →

CTE Fundamentals

40 min