Module 4: Subqueries & CTEs35 min

Understanding Subqueries

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

When One Query Isn’t Enough

Here’s a question that comes up all the time: “Find everyone who earns more than average.” Sounds simple, but think about it. You need to know the average before you can filter by it. That’s two questions, not one.

You could run two queries. First, get the average, write it down, then use that number in your WHERE clause. But that’s tedious, error-prone, and falls apart when the data changes. Subqueries let you do both in one shot.

A subquery is just a query inside another query. The inner one runs first, and its result gets plugged into the outer one. This module teaches you both subqueries and CTEs (Common Table Expressions). We’ll start with subqueries because you’ll encounter them constantly in existing code, then move to CTEs, which are StrataScratch’s recommended approach for new queries.

Subqueries in WHERE Clause

Your First Subquery

SQL
SELECT *
FROM orders
WHERE total_order_cost > (
  SELECT AVG(total_order_cost) FROM orders
);
Table: techcorp_workforce
idfirst_namelast_namedepartmentsalaryphone_numberjoining_date
1SarahMitchellHR95000555-01012021-03-15
2MichaelChenHR88000555-01022022-06-01
3EmilyRodriguezHR825002021-09-20
4DavidParkHR80000555-01042023-01-10
5LisaThompsonHR650002021-04-05
1
Find Above-Average Earners

Find employees earning above the company average from `techcorp_workforce`.

Tables: techcorp_workforce

The inner query calculates the average salary. The outer query uses it to filter. No hardcoded numbers, no two-step process.

The Classic “Find the Max” Pattern

This comes up constantly in interviews: find the row with the highest value for a given column.

SQL
SELECT
  first_name,
  last_name,
  salary
FROM employees
WHERE salary = (
  SELECT MAX(salary)
  FROM employees
);

The subquery finds the highest salary. The outer query finds everyone earning that amount. If there’s a tie, you get multiple rows.

Workers With The Highest Salaries

Combine subqueries with JOINs to find top performers.

Table: worker
worker_idfirst_namelast_namesalaryjoining_datedepartment
1MonikaArora1000002014-02-20HR
2NiharikaVerma800002014-06-11Admin
3VishalSinghal3000002014-02-20HR
4AmitahSingh5000002014-02-20Admin
5VivekBhati5000002014-06-11Admin
Table: title
worker_ref_idworker_titleaffected_from
1Manager2016-02-20
2Executive2016-06-11
8Executive2016-06-11
5Manager2016-06-11
4Asst. Manager2016-06-11
2
Workers With The Highest Salaries
View solution

Management wants to analyze only employees with official job titles. Find the job titles of the employees with the highest salary. If multiple employees have the same highest salary, include all their job titles.

Tables: worker, title

IN: Matching a List

What if your subquery returns multiple rows? That’s when you use IN.

SQL
-- Find employees in departments that have high earners
SELECT
  first_name,
  last_name,
  department
FROM employees
WHERE department IN (
  SELECT DISTINCT department
  FROM employees
  WHERE salary > 100000
);

NOT IN: The Trap Everyone Falls Into

NOT IN seems like the obvious way to exclude values. And it works great, until it doesn’t.

Here’s the problem: if that subquery returns even one NULL value, NOT IN returns zero rows. Not some rows. Zero. It’s bitten every SQL developer at least once.

NOT IN with NULLs returns nothing

This is one of SQL’s most infamous gotchas. NOT IN with a NULL in the list returns nothing, always. The fix is to use NOT EXISTS instead.

EXISTS and NOT EXISTS

EXISTS asks a simple question: “Does at least one matching row exist?” It returns TRUE or FALSE, and stops searching the moment it finds a match.

SQL
-- Find customers who have placed at least one order
SELECT
  c.customer_id,
  c.customer_name
FROM customers c
WHERE EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.cust_id = c.customer_id
);

That SELECT 1 isn’t magic. EXISTS only cares whether rows exist, not what’s in them.

NOT EXISTS: The NULL-Safe Alternative

NOT EXISTS finds rows where no matching rows exist. It’s cleaner and safer than NOT IN.

SQL
-- Find customers who have never ordered
SELECT
  c.customer_id,
  c.customer_name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.cust_id = c.customer_id
);
EXISTS is efficient

EXISTS stops searching as soon as it finds a match. For large tables, this can be much faster than IN.

Customers with Large Orders

Use EXISTS with a correlated subquery that checks for orders over $100.

Table: online_store_customers
customer_idcustomer_name
1Alice Johnson
2Bob Smith
3Carol Williams
4David Brown
5Emma Davis
Table: online_store_orders
order_idcustomer_idamountstatus
1011150paid
1021200paid
103175paid
1042250paid
1053180paid
3
Customers with Large Orders
View solution

The marketing team wants to identify high-value customers for a premium loyalty program. Find all customers who have placed at least one order over $100. Return `customer ID` and `name`.

Tables: online_store_customers, online_store_orders

Subqueries in SELECT

You can put a subquery in SELECT to add a calculated column to every row.

Table: customers
idfirst_namelast_namecityaddressphone_number
8JohnJosephSan Francisco928-386-8164
7JillMichaelAustin813-297-0692
4WilliamDanielDenver813-368-1200
5HenryJacksonMiami808-601-7513
13EmmaIsaacMiami808-690-5201
Table: orders
idcust_idorder_dateorder_detailstotal_order_cost
132019-03-04Coat100
232019-03-01Shoes80
332019-03-07Skirt30
472019-02-01Coat25
572019-03-10Shoes80
4
Try a Subquery in SELECT

Add a correlated subquery that counts orders for each customer. Alias it as order_count.

Tables: customers, orders

This counts orders for each customer. The subquery runs once per row. For small tables, that’s fine. For big tables, it can get slow.

Scalar subqueries must return one value

Scalar subqueries in SELECT must return exactly one value. If they return multiple rows, you get an error.

Derived Tables: Subqueries in FROM

Put a subquery in FROM to create a temporary result set, then query it like a regular table.

SQL
SELECT
  department,
  total_salary
FROM (
  SELECT
    department,
    SUM(salary) AS total_salary
  FROM employees
  GROUP BY department
) AS dept_totals
WHERE total_salary > 200000;

The inner query aggregates. The outer query filters those aggregates.

Derived tables need an alias

Derived tables must have an alias. That AS dept_totals isn’t optional.

Correlated Subqueries

The subqueries above can run independently. Correlated subqueries are different: they reference the outer query and run once per row.

Table: employee
idfirst_namelast_nameagesexemployee_titledepartmentsalarytargetbonusemailcityaddressmanager_id
5MaxGeorge26MSalesSales1300200150Max@company.comCalifornia2638 Richards Avenue1
13KattyBond56FManagerManagement1500000300Katty@company.comArizona1
11RicherdGear57MManagerManagement2500000300Richerd@company.comAlabama1
10JenniferDion34FSalesSales1000200150Jennifer@company.comAlabama13
19GeorgeJoe50MManagerManagement1000000300George@company.comFlorida1003 Wyatt Street1
5
Try a Correlated Subquery

Find employees who earn above their own department average using a correlated subquery in WHERE.

Tables: employee

That e1.department reference is what makes it correlated. The subquery needs to know which row we’re on.

The Nesting Problem

Subqueries can nest inside each other. And nest again. Technically, SQL allows arbitrary nesting depth. Practically, two levels deep and you’re already in trouble.

SQL
-- Don’t do this
SELECT *
FROM (
  SELECT *
  FROM (
    SELECT
      department,
      AVG(salary) AS avg
    FROM employees
    GROUP BY department
  ) AS level1
  WHERE avg > 50000
) AS level2
WHERE avg < 500000;

This is hard to read, hard to debug, and makes your colleagues hate you. If you ever find yourself nesting more than one level deep, that’s your signal to switch to CTEs.

When to switch to CTEs

One level of nesting is fine. Two levels, consider CTEs. Three levels, definitely use CTEs.

Why CTEs Are Preferred

CTEs (Common Table Expressions) solve the nesting problem. They let you name each step and lay them out top-to-bottom instead of inside-out.

Benefits of CTEs over nested subqueries:

  • Readable: you can follow the logic from top to bottom
  • Debuggable: test each step independently by running just that CTE
  • Reusable: reference the same result multiple times without repeating code
  • Maintainable: easier to modify six months later

The rest of this module focuses on CTEs because they’re usually the recommended approach. You now understand subqueries well enough to read legacy code and recognize when CTEs would be cleaner.

Table: customers
idfirst_namelast_namecityaddressphone_number
8JohnJosephSan Francisco928-386-8164
7JillMichaelAustin813-297-0692
4WilliamDanielDenver813-368-1200
5HenryJacksonMiami808-601-7513
13EmmaIsaacMiami808-690-5201
Table: orders
idcust_idorder_dateorder_detailstotal_order_cost
132019-03-04Coat100
232019-03-01Shoes80
332019-03-07Skirt30
472019-02-01Coat25
572019-03-10Shoes80
SQL
SELECT
  c.id,
  c.first_name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.cust_id = c.id
);

Key Takeaways

  • Subqueries let you use the result of one query inside another
  • Scalar subqueries return one value; use with =, >, <
  • IN handles subqueries that return multiple rows
  • NOT IN fails silently with NULLs; use NOT EXISTS for safety
  • EXISTS stops at the first match and is NULL-safe
  • Derived tables (FROM subqueries) must have an alias
  • Deep nesting is a code smell; CTEs are the cleaner solution

What’s Next

Now that you understand subqueries, you’re ready for CTEs. They use the WITH clause to name each step, making complex queries readable and maintainable. Once you start using them, you’ll wonder how you ever lived without them.