Module 3: Combining DataFrames (10 min)

Cross Merge

A cross merge pairs every row of the left DataFrame with every row of the right DataFrame, with no join key involved. If left has 5 rows and right has 10, you get 50 rows. This sounds useless until you need it: generating all possible product-store combinations for inventory planning, creating a grid of dates and categories for a report template, or building a comparison matrix. It’s rare but powerful.

Python
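# Keep the rows whose index label is below 3 (the first three rows with a default integer index)
small = transportation_numbers[transportation_numbers.index < 3]
# Pair every row of small with every row of small
pd.merge(small, small, how="cross")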

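With 3 rows on each side, you get 3 × 3 = 9 rows. Because both sides have the same column names, pandas appends the suffixes _x and _y to tell the left and right copies apart.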
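
The snippet above depends on the lesson's transportation_numbers table. For readers without that table loaded, here is a minimal self-contained sketch of the same 3 × 3 merge. The DataFrame contents and the column names id and number are made up for illustration and are not the real table's schema.

```python
import pandas as pd

# Hypothetical stand-in for the first three rows of transportation_numbers
small = pd.DataFrame({"id": [0, 1, 2], "number": [10, 20, 30]})

# Every row of the left copy is paired with every row of the right copy
pairs = pd.merge(small, small, how="cross")

print(pairs.shape)          # 9 rows, 4 columns
print(list(pairs.columns))  # left columns get _x, right columns get _y
```

Passing suffixes=("_left", "_right") to pd.merge is one way to get more readable column names than the defaults.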

Performance Danger

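how="cross" can destroy your performance because the output grows with the product of the two row counts: 1,000 rows × 1,000 rows = 1 million rows, and 100,000 × 100,000 would be 10 billion. Only use it on small DataFrames or with a post-merge filter.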
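
One way to act on this warning is to compute the output size before merging and refuse to continue past a budget. The sketch below assumes a hypothetical helper, guarded_cross_merge, and an arbitrary limit of 5 million rows; neither is part of pandas. It then applies a post-merge filter that keeps each unordered pair once.

```python
import pandas as pd

def guarded_cross_merge(left, right, max_rows=5_000_000):
    """Cross merge that refuses to run when the output would exceed max_rows.

    Hypothetical helper for illustration; max_rows is an arbitrary budget.
    """
    expected_rows = len(left) * len(right)
    if expected_rows > max_rows:
        raise ValueError(
            f"cross merge would produce {expected_rows:,} rows (limit {max_rows:,})"
        )
    return left.merge(right, how="cross")

left = pd.DataFrame({"a": range(1_000)})
right = pd.DataFrame({"b": range(1_000)})

# 1,000 x 1,000 = 1,000,000 rows, which is within the budget
pairs = guarded_cross_merge(left, right)

# Post-merge filter: keep only pairs where a < b
unordered = pairs[pairs["a"] < pairs["b"]]
```

Note that the filter runs after the full product has been built, so it reduces what you keep but not the peak memory used by the merge itself. Shrinking the inputs before merging is what lowers that peak.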

Legitimate Use Cases

Cross merge is rare but useful for:

  • Date scaffolds: every date × every product (then left merge actual sales)
  • Attribute combinations: every size × every color for SKU generation
  • Comparison matrices: every pair of items for similarity scoring
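
The date scaffold pattern from the first bullet can be sketched as follows. The dates, products, and sales figures are invented for illustration, and filling missing combinations with 0 is an assumption about what the report should show.

```python
import pandas as pd

# Hypothetical inputs: three dates and two products
dates = pd.DataFrame(
    {"date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"])}
)
products = pd.DataFrame({"product": ["A", "B"]})

# Scaffold: every date x every product
scaffold = dates.merge(products, how="cross")

# Actual sales only cover some of the combinations
sales = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-03"]),
        "product": ["A", "B"],
        "units": [5, 2],
    }
)

# Left merge keeps every scaffold row; combinations without sales get NaN
report = scaffold.merge(sales, on=["date", "product"], how="left")

# Assumption: a missing combination means zero units sold
report["units"] = report["units"].fillna(0).astype(int)

print(report)
```

The scaffold guarantees that days with no sales still appear in the report, which a plain groupby on the sales table would silently omit.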

Maximum of Two Numbers

Table: deloitte_numbers
number
-2
-1
0
1
2

Given a single column of numbers, consider all possible permutations of two numbers with replacement, assuming that pairs of numbers (x,y) and (y,x) are two different permutations. Then, for each permutation, find the maximum of the two numbers. Output three columns: the first number, the second number and the maximum of the two.
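
One possible approach, offered as a sketch rather than the official solution: cross merge the table with itself to get every ordered pair, then take the row-wise maximum. The DataFrame below is typed in by hand from the table preview and the full table may contain other values. The output column names number_1, number_2, and max_number are assumptions, since the exercise does not name them.

```python
import pandas as pd

# Assumption: values copied from the table preview
deloitte_numbers = pd.DataFrame({"number": [-2, -1, 0, 1, 2]})

# Every ordered pair (x, y), including (x, x), since (x, y) and (y, x) count separately
pairs = deloitte_numbers.merge(
    deloitte_numbers, how="cross", suffixes=("_1", "_2")
)

# Row-wise maximum of the two numbers
pairs["max_number"] = pairs[["number_1", "number_2"]].max(axis=1)

print(pairs)
```

With 5 input rows this yields 5 × 5 = 25 rows, one per ordered pair.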

Key Takeaways

  • pd.merge(df_a, df_b, how="cross") produces all combinations.
  • Result size = rows in A × rows in B.
  • Use for scaffolds and attribute combinations, never on large DataFrames without filtering.

What’s Next

Sometimes you need to merge a DataFrame with itself. Finding employee-manager pairs or comparing rows within the same dataset are both jobs for the self-merge.