Self-Merges
Progress Tracking
Log in to save this lesson and continue from where you left off.
Sometimes both sides of a merge are the same table. “Find employees who earn more than someone else in their department” requires comparing each employee against every other employee in their department. That’s a self-merge: merge the table with itself on the grouping column, then filter the pairs. It feels strange the first time, but the pattern is always the same: merge, pair, filter.
When a DataFrame Needs to Talk to Itself
The classic example: the employee-manager relationship. Managers are employees too. To answer “Who manages whom?” you merge the employees DataFrame with itself.
result = pd.merge(
employees, employees,
left_on="manager_id", right_on="id",
how="left",
suffixes=("", "_manager")
)
result[["name", "name_manager"]]We merge the same DataFrame twice, using suffixes to tell the columns apart. how="left" because the CEO has no manager.
Comparing Rows Within a DataFrame
Self-merges let you compare rows to other rows:
| id | first_name | last_name | salary | joining_date | department |
|---|---|---|---|---|---|
| 1001 | Monika | Arora | 100000 | 2023-01-12 | HR |
| 1002 | Niharika | Verma | 80000 | 2023-01-05 | Admin |
| 1003 | Vishal | Singhal | 300000 | 2023-01-02 | HR |
| 1004 | Amitah | Singh | 500000 | 2022-12-15 | Admin |
| 1005 | Vivek | Bhati | 500000 | 2022-11-16 | Admin |
Find employees who earn more than a colleague in the same department.
Without a scope constraint, you’re comparing n×n pairs. On 10,000 employees, that’s 100 million comparisons. Always merge on a scoping column like department first.
Finding Unique Pairs
To avoid duplicate pairs (Alice-Bob and Bob-Alice), filter so the left ID is always less than the right:
| id | first_name | last_name | age | sex | employee_title | department | salary | target | bonus | city | address | manager_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Max | George | 26 | M | Sales | Sales | 1300 | 200 | 150 | Max@company.com | California | 2638 Richards Avenue | 1 |
| 13 | Katty | Bond | 56 | F | Manager | Management | 150000 | 0 | 300 | Katty@company.com | Arizona | 1 | |
| 11 | Richerd | Gear | 57 | M | Manager | Management | 250000 | 0 | 300 | Richerd@company.com | Alabama | 1 | |
| 10 | Jennifer | Dion | 34 | F | Sales | Sales | 1000 | 200 | 150 | Jennifer@company.com | Alabama | 13 | |
| 19 | George | Joe | 50 | M | Manager | Management | 100000 | 0 | 300 | George@company.com | Florida | 1003 Wyatt Street | 1 |
Find all pairs of employees in the same department. Avoid duplicates and self-pairs.
Filter with id_a < id_b (not !=) to get each pair exactly once.
Employees With the Same Salary
| worker_id | first_name | last_name | salary | joining_date | department |
|---|---|---|---|---|---|
| 1 | Monika | Arora | 100000 | 2014-02-20 | HR |
| 2 | Niharika | Verma | 80000 | 2014-06-11 | Admin |
| 3 | Vishal | Singhal | 300000 | 2014-02-20 | HR |
| 4 | Amitah | Singh | 500000 | 2014-02-20 | Admin |
| 5 | Vivek | Bhati | 500000 | 2014-06-11 | Admin |
Find employees who earn the same salary. Output the worker id along with the first name and the salary in descending order.
Matching Users Pairs
| id | location | age | gender | is_senior |
|---|---|---|---|---|
| 0 | USA | 24 | M | FALSE |
| 1 | USA | 31 | F | TRUE |
| 2 | USA | 29 | F | FALSE |
| 3 | USA | 33 | M | FALSE |
| 4 | USA | 36 | F | TRUE |
Find matching pairs of Meta/Facebook employees such that they are both of the same nation, different age, same gender, and at different seniority levels. Output ids of paired employees.
Key Takeaways
- Self-merge:
pd.merge(df, df, ...)withsuffixesto distinguish columns. - Essential for hierarchies, row comparisons, and finding pairs.
- Use
id_a < id_bto avoid duplicate and self pairs. - Always scope with a shared column (department, category) to avoid n×n explosions.
What’s Next
You’ve mastered all merge types. Time to chain them: merging three, four, or more DataFrames in a single analysis.