Module 3: Working with Multiple Tables30 min

CROSS JOIN

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in
SQL
SELECT
  a.number AS num_a,
  b.number AS num_b
FROM transportation_numbers a
CROSS JOIN transportation_numbers b
WHERE a.index <= 3
  AND b.index <= 3
ORDER BY a.number, b.number;

With 3 rows on each side, you get 3 × 3 = 9 rows. Every number paired with every number.

Performance danger

CROSS JOIN can destroy your query performance. 1,000 rows × 1,000 rows = 1 million rows. 10,000 × 10,000 = 100 million.

Legitimate Use Cases

CROSS JOIN is rare, but when you need it, nothing else works:

1. Building a Date Scaffold

Say you need a report showing every product’s sales for every day, even days with zero sales. You can’t aggregate what isn’t there. So you build a scaffold:

SQL
-- Create a row for every date x product combination
SELECT
  d.date,
  p.product_name
FROM dates d
CROSS JOIN products p
ORDER BY
  d.date,
  p.product_name;
-- Then LEFT JOIN your actual sales data to this scaffold
-- Days with no sales show up as NULL (or 0 after COALESCE)

This is a common pattern in reporting when you need to show zeros explicitly rather than missing rows.

2. Generating All Attribute Combinations

E-commerce catalog: you have sizes (S, M, L, XL) and colors (Red, Blue, Black). You need a SKU for every combination:

SQL
SELECT
  s.size,
  c.color,
  CONCAT(s.size_code, '-', c.color_code) AS sku
FROM sizes s
CROSS JOIN colors c;

Accidental Cartesian Products

More commonly, you’ll create a Cartesian product by accident. Here’s how it happens:

The Old Comma Syntax

Before ANSI SQL standardized JOIN syntax, people wrote joins like this:

SQL
-- Old style: comma-separated tables
SELECT *
FROM orders, customers
WHERE orders.cust_id = customers.id;

This works fine. But if you forget the WHERE clause:

SQL
-- DISASTER: No WHERE clause = Cartesian product
SELECT *
FROM orders, customers;
-- Returns orders x customers rows
Use explicit JOIN syntax

This is why modern SQL style uses explicit JOIN... ON syntax. It’s harder to accidentally create a Cartesian product when the join condition is required.

Wrong Join Predicates

Even with proper JOIN syntax, a wrong predicate can explode your results:

SQL
-- Intended: match orders to customers by ID
SELECT *
FROM orders o
JOIN customers c
  ON o.order_date = c.signup_date; -- WRONG!
-- This joins on dates, not customer relationship
-- Multiple orders and customers share dates = row explosion

Always double-check that you’re joining on the actual relationship (foreign key to primary key), not just columns that happen to have compatible types.

Diagnosing Row Explosion

When your query returns way more rows than expected, here’s how to debug.

Table: customers
idfirst_namelast_namecityaddressphone_number
8JohnJosephSan Francisco928-386-8164
7JillMichaelAustin813-297-0692
4WilliamDanielDenver813-368-1200
5HenryJacksonMiami808-601-7513
13EmmaIsaacMiami808-690-5201
Table: orders
idcust_idorder_dateorder_detailstotal_order_cost
132019-03-04Coat100
232019-03-01Shoes80
332019-03-07Skirt30
472019-02-01Coat25
572019-03-10Shoes80
1
Try Debugging Row Counts

Write a query using UNION ALL to check the row count of each source table, then compare with the JOIN result.

Tables: customers, orders
Row count rule of thumb

An INNER JOIN should return roughly the same number of rows as your fact table (orders), or fewer. If it returns 10x more, something’s wrong with your join logic.

Maximum of Two Numbers

Table: deloitte_numbers
number
-2
-1
0
1
2
2
Maximum of Two Numbers
View solution

Given a single column of numbers, consider all possible permutations of two numbers with replacement, assuming that pairs of numbers (x,y) and (y,x) are two different permutations. Then, for each permutation, find the maximum of the two numbers. Output three columns: the first number, the second number and the maximum of the two.

Tables: deloitte_numbers

Key Takeaways

  • CROSS JOIN intentionally produces all combinations (no ON clause)
  • Result size = rows in table1 × rows in table2 (can be huge)
  • Use it for: date scaffolds, attribute combinations, comparison matrices
  • Accidental cross joins come from: comma syntax without WHERE, wrong join predicates
  • Always sanity-check row counts when results seem off

What’s Next

Sometimes you need to join a table to itself. Finding employee-manager pairs, comparing rows within the same dataset. That’s the self-join, and it’s more useful than it might sound.