Module 3: Combining DataFrames•10 min

Cross Merge


A cross merge pairs every row from the left with every row from the right. If left has 5 rows and right has 10, you get 50 rows. This sounds useless until you need it: generating all possible product-store combinations for inventory planning, creating a grid of dates and categories for a report template, or building a comparison matrix. It’s rare but powerful.

Python

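# Keep the first three rows (assumes the default 0, 1, 2, ... integer index)
small = transportation_numbers[transportation_numbers.index < 3]
# Pair every row of small with every row of small; shared column names get _x / _y suffixes
pd.merge(small, small, how="cross")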

With 3 rows on each side, you get 3 × 3 = 9 rows.
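To try this without the course dataset, here is a minimal self-contained sketch. The three-row table and its single `number` column are hypothetical stand-ins, since the real schema of transportation_numbers is not shown here.

```python
import pandas as pd

# Hypothetical stand-in for the first three rows of transportation_numbers
small = pd.DataFrame({"number": [10, 20, 30]})

# Every row on the left is paired with every row on the right
pairs = pd.merge(small, small, how="cross")

# Both sides have a column called "number", so pandas adds _x and _y
print(pairs.shape)            # (9, 2)
print(list(pairs.columns))    # ['number_x', 'number_y']
```

No `on=` argument is passed: a cross merge has no key, and pandas raises an error if you supply one.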

Performance Danger

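how="cross" can destroy your performance because the result grows multiplicatively: 1,000 rows × 1,000 rows = 1 million rows. Only use it on small DataFrames, and trim each side before merging where you can. A filter applied after the merge shrinks the result, but the full product is still built in memory first.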
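One way to guard against an accidental blow-up is to compute the result size before merging. The sketch below is illustrative only: `checked_cross_merge` and its `max_rows` limit are hypothetical names, not part of pandas.

```python
import pandas as pd

def checked_cross_merge(left, right, max_rows=1_000_000):
    """Cross merge that refuses to build more than max_rows rows (hypothetical helper)."""
    expected_rows = len(left) * len(right)  # result size is always rows in left x rows in right
    if expected_rows > max_rows:
        raise ValueError(
            f"cross merge would create {expected_rows:,} rows (limit {max_rows:,})"
        )
    return pd.merge(left, right, how="cross")

a = pd.DataFrame({"x": range(1_000)})
b = pd.DataFrame({"y": range(1_000)})

# Trimming each side first keeps the product small: 3 x 4 = 12 rows
small_grid = checked_cross_merge(a.head(3), b.head(4))

# The full 1,000 x 1,000 product is rejected under a lower limit
try:
    checked_cross_merge(a, b, max_rows=100_000)
except ValueError as err:
    print(err)
```

The check costs nothing because it only multiplies two lengths, whereas the merge itself has to allocate every output row.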

Legitimate Use Cases

Cross merge is rare but useful for:

Date scaffolds: every date × every product (then left merge actual sales)
Attribute combinations: every size × every color for SKU generation
Comparison matrices: every pair of items for similarity scoring
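The date scaffold pattern from the list above can be sketched as follows. All table names, column names, and values here are made up for illustration.

```python
import pandas as pd

# Hypothetical inputs: three dates, two products, and sparse sales records
dates = pd.DataFrame({"date": pd.date_range("2024-01-01", periods=3, freq="D")})
products = pd.DataFrame({"product": ["apple", "banana"]})
sales = pd.DataFrame({
    "date": [dates["date"][0], dates["date"][2]],
    "product": ["apple", "banana"],
    "units": [5, 2],
})

# Step 1: build the scaffold, one row per date-product combination (3 x 2 = 6 rows)
scaffold = pd.merge(dates, products, how="cross")

# Step 2: left merge the actual sales so combinations with no sale are kept
report = scaffold.merge(sales, on=["date", "product"], how="left")

# Step 3: combinations with no matching sale come back as NaN; show them as 0
report["units"] = report["units"].fillna(0).astype(int)

print(report)
```

Without the scaffold, days with no sales would simply be missing from the report instead of showing up as zeros.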

Maximum of Two Numbers

Table: deloitte_numbers

number
-2
-1
0
1
2
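The exercise statement is not reproduced here. Assuming the task is to pair every number with every number and report the larger value of each pair, one possible approach is sketched below. The `_a` / `_b` suffixes and the `max_number` column name are arbitrary choices, not taken from the exercise.

```python
import pandas as pd

# The deloitte_numbers table as shown above
deloitte_numbers = pd.DataFrame({"number": [-2, -1, 0, 1, 2]})

# Pair every number with every number: 5 x 5 = 25 rows
pairs = pd.merge(
    deloitte_numbers, deloitte_numbers, how="cross", suffixes=("_a", "_b")
)

# Row-wise maximum of the two columns
pairs["max_number"] = pairs[["number_a", "number_b"]].max(axis=1)

print(pairs.head())
```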


Key Takeaways

pd.merge(df_a, df_b, how="cross") produces all combinations.
Result size = rows in A × rows in B.
Use for scaffolds and attribute combinations, never on large DataFrames without filtering.

What’s Next

Sometimes you need to merge a DataFrame with itself. Finding employee-manager pairs, comparing rows within the same dataset — that’s the self-merge.
