Module 3: Combining DataFrames35 min

Left and Right Merge

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Inner merge drops unmatched rows — but sometimes the unmatched rows are exactly what you need. “Which customers have never ordered?” requires keeping all customers, even those with no orders. That’s a left merge: keep everything from the left table, and fill in NaN where the right table has no match. This is one of the most common interview questions across both SQL and pandas.

The Merge You’ll Use Most

Inner merge only keeps matches. But most analytics questions need all rows from one side:

  • “Show me all customers, even if they haven’t ordered.”
  • “List every product, including ones never sold.”
  • “Find users who haven’t logged in.”

That’s how="left" territory.

How Left Merge Works

Python
result = pd.merge(
    employees, departments,
    left_on="dept_id", right_on="id",
    how="left"
)
Table: customers
idfirst_namelast_namecityaddressphone_number
8JohnJosephSan Francisco928-386-8164
7JillMichaelAustin813-297-0692
4WilliamDanielDenver813-368-1200
5HenryJacksonMiami808-601-7513
13EmmaIsaacMiami808-690-5201
Table: orders
idcust_idorder_dateorder_detailstotal_order_cost
132019-03-04Coat100
232019-03-01Shoes80
332019-03-07Skirt30
472019-02-01Coat25
572019-03-10Shoes80
1
Left Merge Customers with Orders

Left merge customers with orders so every customer appears, even those with no orders.

Tables: customers, orders

The Anti-Join Pattern: Finding What’s Missing

This is probably the single most common pattern in analytics interviews:

  • “Find customers who have never placed an order.”
  • “Which products have never been sold?”

Left merge, then filter for NaN on the right side:

2
Find Customers with No Orders

Left merge customers with orders, then filter to find customers with no matching orders.

Tables: customers, orders
Check the Right-Side Key for NaN

After a left merge, unmatched rows have NaN in all right-side columns. Check the right-side key column (like cust_id) — it’s guaranteed to be non-null for real matches.

indicator=True for Debugging

Add indicator=True to get a _merge column showing where each row came from:

Python
result = pd.merge(
    customers, orders,
    left_on="id", right_on="cust_id",
    how="left", indicator=True
)
result["_merge"].value_counts()

# both        → matched in both
# left_only   → customer with no orders
# right_only  → (not possible in left merge)

Right Merge: Just Swap the Order

how="right" is the mirror image of left merge. In practice, nobody uses it — just swap the DataFrame order and use how="left" instead:

Python
# These are equivalent:
pd.merge(customers, orders, ..., how="right")
pd.merge(orders, customers, ..., how="left")

Posts Reacted to With a Heart

Table: facebook_posts
post_idposterpost_textpost_keywordspost_date
02The Lakers game from last night was great.[basketball,lakers,nba]2019-01-01
11Lebron James is top class.[basketball,lebron_james,nba]2019-01-02
22Asparagus tastes OK.[asparagus,food]2019-01-01
31Spaghetti is an Italian food.[spaghetti,food]2019-01-02
43User 3 is not sharing interests[#spam#]2019-01-01
Table: facebook_reactions
posterfriendreactiondate_daypost_id
21like10
26like10
12like11
13heart11
14like11
3
Find all posts which were reacted to with a heart
View solution

Find all posts which were reacted to with a heart. For such posts output all columns from facebook_posts table.

Tables: facebook_reactions, facebook_posts

Number Of Units Per Nationality

Table: airbnb_hosts
host_idnationalitygenderage
0USAM28
1USAF29
2ChinaF31
3ChinaM24
4MaliM30
Table: airbnb_units
host_idunit_idunit_typen_bedsn_bedroomscountrycity
0A1Room11USANew York
0A2Room11USANew Jersey
0A3Room11USANew Jersey
1A4Apartment21USAHouston
1A5Apartment21USALas Vegas
4
Number Of Units Per Nationality
View solution

Write a query that returns how many different apartment-type units (counted by distinct `unit_id`) are owned by people under 30, grouped by their nationality. Sort the results by the number of apartments in descending order.

Tables: airbnb_hosts, airbnb_units

Matching Similar Hosts and Guests

Table: airbnb_hosts
host_idnationalitygenderage
0USAM28
1USAF29
2ChinaF31
3ChinaM24
4MaliM30
Table: airbnb_guests
guest_idnationalitygenderage
0MaliM21
1ChinaF23
2MaliF27
3AustraliaF24
4LuxembourgM19
5
Matching Similar Hosts and Guests
View solution

Find matching hosts and guests pairs in a way that they are both of the same gender and nationality. Output the host id and the guest id of matched pair.

Tables: airbnb_hosts, airbnb_guests

Key Takeaways

  • how="left" keeps all rows from the left DataFrame, filling NaN for unmatched.
  • Anti-join pattern: left merge + filter for NaN on right-side key.
  • indicator=True adds a _merge column for debugging.
  • Skip how="right" — swap DataFrame order and use left instead.

What’s Next

What if you need unmatched rows from both sides? That’s outer merge — useful for data reconciliation.