Inner Merge
Progress Tracking
Log in to save this lesson and continue from where you left off.
An inner merge is the workhorse of data analysis. You have orders in one table and customer details in another — to see “who bought what,” you merge them. Inner merge keeps only the rows that match in both tables. If a customer has no orders, they disappear. If an order has no matching customer, it disappears too. This “both sides must match” behavior is exactly what you want most of the time.
Where the Real Work Begins
Almost every meaningful analysis requires combining data from multiple DataFrames:
- “Show me orders with customer names.”
- “Get products with their category labels.”
- “Find employees with their department info.”
All of these need pd.merge().
The Basic Pattern
result = pd.merge(left_df, right_df, left_on="fk", right_on="pk")left_on and right_on specify which columns to match. When both DataFrames use the same column name, use on= instead:
result = pd.merge(df_a, df_b, on="shared_column")Your First Merge
result = pd.merge(
employees, departments,
left_on="dept_id", right_on="id"
)
result[["first_name", "dept_name", "salary"]]| id | cust_id | order_date | order_details | total_order_cost |
|---|---|---|---|---|
| 1 | 3 | 2019-03-04 | Coat | 100 |
| 2 | 3 | 2019-03-01 | Shoes | 80 |
| 3 | 3 | 2019-03-07 | Skirt | 30 |
| 4 | 7 | 2019-02-01 | Coat | 25 |
| 5 | 7 | 2019-03-10 | Shoes | 80 |
| id | first_name | last_name | city | address | phone_number |
|---|---|---|---|---|---|
| 8 | John | Joseph | San Francisco | 928-386-8164 | |
| 7 | Jill | Michael | Austin | 813-297-0692 | |
| 4 | William | Daniel | Denver | 813-368-1200 | |
| 5 | Henry | Jackson | Miami | 808-601-7513 | |
| 13 | Emma | Isaac | Miami | 808-690-5201 |
Merge orders with customers to see who placed each order. Match on the customer ID columns.
When both DataFrames have a column with the same name (like id), pandas adds _x and _y suffixes. Control this with suffixes=("_order", "_customer").
Filtering After Merge
Merge first, then filter the combined result:
Merge orders with customers, then filter to orders over 100.
The Aggregation Trap
When merging and then aggregating, group by the unique key (id), never just by name. Different people can share the same name.
# Wrong: two customers named 'John' get merged
merged.groupby('first_name')['total_order_cost'].sum()
# Correct: group by the unique ID
merged.groupby(['id_y', 'first_name'])['total_order_cost'].sum()After merging, each order row gets customer info attached. If you .count() the customer ID column, you’re counting orders, not customers. Always think about what the rows represent after the merge.
Total Cost Of Orders
| id | first_name | last_name | city | address | phone_number |
|---|---|---|---|---|---|
| 8 | John | Joseph | San Francisco | 928-386-8164 | |
| 7 | Jill | Michael | Austin | 813-297-0692 | |
| 4 | William | Daniel | Denver | 813-368-1200 | |
| 5 | Henry | Jackson | Miami | 808-601-7513 | |
| 13 | Emma | Isaac | Miami | 808-690-5201 |
| id | cust_id | order_date | order_details | total_order_cost |
|---|---|---|---|---|
| 1 | 3 | 2019-03-04 | Coat | 100 |
| 2 | 3 | 2019-03-01 | Shoes | 80 |
| 3 | 3 | 2019-03-07 | Skirt | 30 |
| 4 | 7 | 2019-02-01 | Coat | 25 |
| 5 | 7 | 2019-03-10 | Shoes | 80 |
Find the total cost of each customer's orders. Output customer's id, first name, and the total order cost. Order records by customer's first name alphabetically.
Order Details
| id | first_name | last_name | city | address | phone_number |
|---|---|---|---|---|---|
| 8 | John | Joseph | San Francisco | 928-386-8164 | |
| 7 | Jill | Michael | Austin | 813-297-0692 | |
| 4 | William | Daniel | Denver | 813-368-1200 | |
| 5 | Henry | Jackson | Miami | 808-601-7513 | |
| 13 | Emma | Isaac | Miami | 808-690-5201 |
| id | cust_id | order_date | order_details | total_order_cost |
|---|---|---|---|---|
| 1 | 3 | 2019-03-04 | Coat | 100 |
| 2 | 3 | 2019-03-01 | Shoes | 80 |
| 3 | 3 | 2019-03-07 | Skirt | 30 |
| 4 | 7 | 2019-02-01 | Coat | 25 |
| 5 | 7 | 2019-03-10 | Shoes | 80 |
Find order details made by Jill and Eva. Consider the Jill and Eva as first names of customers. Output the order date, details and cost along with the first name. Order records based on the customer id in ascending order.
Key Takeaways
pd.merge(left, right, left_on=, right_on=)joins on different column names.pd.merge(left, right, on=)when both use the same column name.- Inner merge (default) keeps only matching rows.
- Use
suffixes=to control overlapping column names. - When aggregating merged data, group by unique keys, not names.
What’s Next
Inner merge only keeps matches. But what about “Show me all customers, including those who haven’t ordered”? That’s a left merge, and it’s probably the most important merge type for analytics.