Left and Right Merge
Progress Tracking
Log in to save this lesson and continue from where you left off.
Inner merge drops unmatched rows — but sometimes the unmatched rows are exactly what you need. “Which customers have never ordered?” requires keeping all customers, even those with no orders. That’s a left merge: keep everything from the left table, and fill in NaN where the right table has no match. This is one of the most common interview questions across both SQL and pandas.
The Merge You’ll Use Most
Inner merge only keeps matches. But most analytics questions need all rows from one side:
- “Show me all customers, even if they haven’t ordered.”
- “List every product, including ones never sold.”
- “Find users who haven’t logged in.”
That’s how="left" territory.
How Left Merge Works
result = pd.merge(
employees, departments,
left_on="dept_id", right_on="id",
how="left"
)| id | first_name | last_name | city | address | phone_number |
|---|---|---|---|---|---|
| 8 | John | Joseph | San Francisco | 928-386-8164 | |
| 7 | Jill | Michael | Austin | 813-297-0692 | |
| 4 | William | Daniel | Denver | 813-368-1200 | |
| 5 | Henry | Jackson | Miami | 808-601-7513 | |
| 13 | Emma | Isaac | Miami | 808-690-5201 |
| id | cust_id | order_date | order_details | total_order_cost |
|---|---|---|---|---|
| 1 | 3 | 2019-03-04 | Coat | 100 |
| 2 | 3 | 2019-03-01 | Shoes | 80 |
| 3 | 3 | 2019-03-07 | Skirt | 30 |
| 4 | 7 | 2019-02-01 | Coat | 25 |
| 5 | 7 | 2019-03-10 | Shoes | 80 |
Left merge customers with orders so every customer appears, even those with no orders.
The Anti-Join Pattern: Finding What’s Missing
This is probably the single most common pattern in analytics interviews:
- “Find customers who have never placed an order.”
- “Which products have never been sold?”
Left merge, then filter for NaN on the right side:
Left merge customers with orders, then filter to find customers with no matching orders.
After a left merge, unmatched rows have NaN in all right-side columns. Check the right-side key column (like cust_id) — it’s guaranteed to be non-null for real matches.
indicator=True for Debugging
Add indicator=True to get a _merge column showing where each row came from:
result = pd.merge(
customers, orders,
left_on="id", right_on="cust_id",
how="left", indicator=True
)
result["_merge"].value_counts()
# both → matched in both
# left_only → customer with no orders
# right_only → (not possible in left merge)Right Merge: Just Swap the Order
how="right" is the mirror image of left merge. In practice, nobody uses it — just swap the DataFrame order and use how="left" instead:
# These are equivalent:
pd.merge(customers, orders, ..., how="right")
pd.merge(orders, customers, ..., how="left")Posts Reacted to With a Heart
| post_id | poster | post_text | post_keywords | post_date |
|---|---|---|---|---|
| 0 | 2 | The Lakers game from last night was great. | [basketball,lakers,nba] | 2019-01-01 |
| 1 | 1 | Lebron James is top class. | [basketball,lebron_james,nba] | 2019-01-02 |
| 2 | 2 | Asparagus tastes OK. | [asparagus,food] | 2019-01-01 |
| 3 | 1 | Spaghetti is an Italian food. | [spaghetti,food] | 2019-01-02 |
| 4 | 3 | User 3 is not sharing interests | [#spam#] | 2019-01-01 |
| poster | friend | reaction | date_day | post_id |
|---|---|---|---|---|
| 2 | 1 | like | 1 | 0 |
| 2 | 6 | like | 1 | 0 |
| 1 | 2 | like | 1 | 1 |
| 1 | 3 | heart | 1 | 1 |
| 1 | 4 | like | 1 | 1 |
Find all posts which were reacted to with a heart. For such posts output all columns from facebook_posts table.
Number Of Units Per Nationality
| host_id | nationality | gender | age |
|---|---|---|---|
| 0 | USA | M | 28 |
| 1 | USA | F | 29 |
| 2 | China | F | 31 |
| 3 | China | M | 24 |
| 4 | Mali | M | 30 |
| host_id | unit_id | unit_type | n_beds | n_bedrooms | country | city |
|---|---|---|---|---|---|---|
| 0 | A1 | Room | 1 | 1 | USA | New York |
| 0 | A2 | Room | 1 | 1 | USA | New Jersey |
| 0 | A3 | Room | 1 | 1 | USA | New Jersey |
| 1 | A4 | Apartment | 2 | 1 | USA | Houston |
| 1 | A5 | Apartment | 2 | 1 | USA | Las Vegas |
Write a query that returns how many different apartment-type units (counted by distinct `unit_id`) are owned by people under 30, grouped by their nationality. Sort the results by the number of apartments in descending order.
Matching Similar Hosts and Guests
| host_id | nationality | gender | age |
|---|---|---|---|
| 0 | USA | M | 28 |
| 1 | USA | F | 29 |
| 2 | China | F | 31 |
| 3 | China | M | 24 |
| 4 | Mali | M | 30 |
| guest_id | nationality | gender | age |
|---|---|---|---|
| 0 | Mali | M | 21 |
| 1 | China | F | 23 |
| 2 | Mali | F | 27 |
| 3 | Australia | F | 24 |
| 4 | Luxembourg | M | 19 |
Find matching hosts and guests pairs in a way that they are both of the same gender and nationality. Output the host id and the guest id of matched pair.
Key Takeaways
how="left"keeps all rows from the left DataFrame, fillingNaNfor unmatched.- Anti-join pattern: left merge + filter for
NaNon right-side key. indicator=Trueadds a_mergecolumn for debugging.- Skip
how="right"— swap DataFrame order and use left instead.
What’s Next
What if you need unmatched rows from both sides? That’s outer merge — useful for data reconciliation.