Module 6: Window Operations25 min

Ranking

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

The .rank() Method

Ranking within groups is a frequent window function pattern in interviews: "top 3 products per category", "highest earner per department", "most recent order per customer". The .rank() method’s method parameter gives you three different ranking behaviors — each maps to a different SQL window function. Knowing when to use which is crucial.

.rank() assigns ranks. The method parameter controls how ties are handled:

Python
# method="first" = ROW_NUMBER (no ties, by position)
df["row_num"] = df.groupby("dept")["salary"].rank(
    method="first", ascending=False
)

# method="min" = RANK (ties get same rank, gaps after)
df["rank"] = df.groupby("dept")["salary"].rank(
    method="min", ascending=False
)

# method="dense" = DENSE_RANK (ties same, no gaps)
df["dense_rank"] = df.groupby("dept")["salary"].rank(
    method="dense", ascending=False
)

Comparing All Three Methods

Python
df = ms_employee_salary.copy()
df["row_num"] = df.groupby("department_id")["salary"].rank(
    method="first", ascending=False
)
df["rank"] = df.groupby("department_id")["salary"].rank(
    method="min", ascending=False
)
df["dense"] = df.groupby("department_id")["salary"].rank(
    method="dense", ascending=False
)
df.sort_values(["department_id", "salary"], ascending=[True, False])

The Top-N Per Group Pattern

1
Top 2 Per Department

Rank employees by salary within each department, then filter to the top 2.

Tables: ms_employee_salary

Building It Yourself

2
Rank and Filter from Scratch

Find the highest-paid employee in each department. Only one per department.

Tables: ms_employee_salary
nlargest() for Simple Cases

For the top 1 per group, .groupby().apply(lambda g: g.nlargest(1, "salary")) also works. But .rank() + filter is more flexible and closer to SQL’s ROW_NUMBER() pattern.

Ranking Most Active Guests

Table: airbnb_guests
guest_idnationalitygenderage
0MaliM21
1ChinaF23
2MaliF27
3AustraliaF24
4LuxembourgM19
3
Ranking Most Active Guests
View solution

Identify the most engaged guests by ranking them according to their overall messaging activity. The most active guest, meaning the one who has exchanged the most messages with hosts, should have the highest rank. If two or more guests have the same number of messages, they should have the same rank. Importantly, the ranking shouldn't skip any numbers, even if many guests share the same rank. Present your results in a clear format, showing the rank, guest identifier, and total number of messages for each guest, ordered from the most to least active.

Tables: airbnb_contacts

Top Businesses With Most Reviews

Table: yelp_business
business_idnameneighborhoodaddresscitystatepostal_codelatitudelongitudestarsreview_countis_opencategories
G5ERFWvPfHy7IDAUYlWL2AAll Colors Mobile Bumper Repair7137 N 28th AvePhoenixAZ8505133.45-112.07141Auto Detailing;Automotive
0jDvRJS-z9zdMgOUXgr6rASunfare811 W Deer Valley RdPhoenixAZ8502733.68-112.085271Personal Chefs;Food;Gluten-Free;Food Delivery Services;Event Planning & Services;Restaurants
6HmDqeNNZtHMK0t2glF_ggDry Clean VegasSoutheast2550 Windmill Ln, Ste 100Las VegasNV8912336.04-115.12141Dry Cleaning & Laundry;Laundry Services;Local Services;Dry Cleaning
pbt3SBcEmxCfZPdnmU9tNAThe Cuyahoga Room740 Munroe Falls AveCuyahoga FallsOH4422141.14-81.47130Wedding Planning;Caterers;Event Planning & Services;Venues & Event Spaces
CX8pfLn7Bk9o2-8yDMp_2wThe UPS Store4815 E Carefree Hwy, Ste 108Cave CreekAZ8533133.8-111.981.551Notaries;Printing Services;Local Services;Shipping Centers
4
Top Businesses With Most Reviews
View solution

Find the top 5 businesses with most reviews. Assume that each row has a unique business_id such that the total reviews for each business is listed on each row. Output the business name along with the total number of reviews and order your results by the total reviews in descending order. If there are ties in review counts, businesses with the same number of reviews receive the same rank, and subsequent ranks are skipped accordingly (e.g., if two businesses tie for rank 4, the next business receives rank 6, skipping rank 5).

Tables: yelp_business

Second Highest Salary

Table: employee
idfirst_namelast_nameagesexemployee_titledepartmentsalarytargetbonusemailcityaddressmanager_id
5MaxGeorge26MSalesSales1300200150Max@company.comCalifornia2638 Richards Avenue1
13KattyBond56FManagerManagement1500000300Katty@company.comArizona1
11RicherdGear57MManagerManagement2500000300Richerd@company.comAlabama1
10JenniferDion34FSalesSales1000200150Jennifer@company.comAlabama13
19GeorgeJoe50MManagerManagement1000000300George@company.comFlorida1003 Wyatt Street1
5
Second Highest Salary
View solution

Find the second highest salary of employees.

Tables: employee

Key Takeaways

  • method="first" = ROW_NUMBER (unique ranks).
  • method="min" = RANK (ties share rank, gaps after).
  • method="dense" = DENSE_RANK (ties share rank, no gaps).
  • Top-N per group: rank → filter. A key interview pattern.
  • ascending=False for highest-first ranking.

What’s Next

Ranking tells you position. But what about comparing to the previous or next row? That’s .shift(), the pandas equivalent of LAG/LEAD.