Module 6: Window Operations (25 min)

Advanced Window Patterns

Deduplication with Ranking

Data deduplication is a frequent real-world use of ranking. Your database has multiple versions of the same record (address changed, salary updated, status modified), and you need only the latest. The pattern: rank by date within each entity, keep rank 1. This works on any "keep the most recent" problem, and it’s cleaner than the groupby-max-then-merge alternative.

Python
# Rank by date within each entity, keep rank 1
df["rnk"] = df.groupby("entity_id")["updated_at"].rank(
    method="first", ascending=False
)
latest = df[df["rnk"] == 1].drop(columns="rnk")
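
For comparison, here is a minimal sketch of the same deduplication on toy data, next to the groupby-max-then-merge alternative mentioned above. The column names `entity_id` and `updated_at` come from the snippet; the `salary` column and all sample values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "entity_id": [1, 1, 2, 2, 2],
    "updated_at": pd.to_datetime(
        ["2024-01-01", "2024-03-01", "2024-02-01", "2024-02-15", "2024-01-10"]
    ),
    "salary": [50, 55, 70, 72, 68],  # hypothetical payload column
})

# Ranking approach: the newest row per entity gets rank 1
ranked = df.copy()
ranked["rnk"] = ranked.groupby("entity_id")["updated_at"].rank(
    method="first", ascending=False
)
latest_rank = ranked[ranked["rnk"] == 1].drop(columns="rnk")

# Alternative: compute the max date per entity, then merge it back
max_dates = df.groupby("entity_id", as_index=False)["updated_at"].max()
latest_merge = df.merge(max_dates, on=["entity_id", "updated_at"])

print(latest_rank.sort_values("entity_id").reset_index(drop=True))
```

The two approaches differ when an entity has two rows with the same latest date: `method="first"` keeps exactly one of them, while the merge keeps every tied row.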

Smoothing with .rolling()

1. Add a Rolling Average

The starter builds the pipeline. Add a 2-purchase rolling average of days gap per user. Output `user_id`, `created_at`, `days_gap`, and `avg_gap`.

Tables: amazon_transactions
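
One way the rolling step might look, sketched on invented data. The starter pipeline is not shown in this lesson, so the `days_gap` computation here (a per-user `diff()` on `created_at`) is an assumption, as are the sample rows.

```python
import pandas as pd

# Toy stand-in for amazon_transactions (values are invented)
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "created_at": pd.to_datetime([
        "2024-01-01", "2024-01-04", "2024-01-10",
        "2024-01-02", "2024-01-03", "2024-01-09",
    ]),
})

df = df.sort_values(["user_id", "created_at"]).reset_index(drop=True)

# Days since the user's previous purchase (NaN for the first one)
df["days_gap"] = df.groupby("user_id")["created_at"].diff().dt.days

# groupby().rolling() returns a (user_id, original index) MultiIndex;
# dropping the user_id level realigns the result with df's index
df["avg_gap"] = (
    df.groupby("user_id")["days_gap"]
    .rolling(2)
    .mean()
    .reset_index(level=0, drop=True)
)

result = df[["user_id", "created_at", "days_gap", "avg_gap"]]
print(result)
```

With the default `min_periods`, a window needs two non-missing gaps, so each user's first two rows have no `avg_gap`. Passing `min_periods=1` to `.rolling()` would fill in a value as soon as one gap exists.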

Combining Everything

2. Full Analysis Pipeline

For each user: rank purchases by date, calculate days since previous purchase, and flag purchases with gaps over 5 days.

Tables: amazon_transactions
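
The three steps in this exercise could be chained roughly as follows. This is a sketch on invented rows, not the reference solution; the output column names `purchase_rank` and `long_gap` are hypothetical.

```python
import pandas as pd

# Toy stand-in for amazon_transactions, deliberately out of order
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "created_at": pd.to_datetime([
        "2024-01-10", "2024-01-01", "2024-01-04",
        "2024-01-02", "2024-01-03",
    ]),
})

# 1. Sort so positional operations see purchases in date order
df = df.sort_values(["user_id", "created_at"]).reset_index(drop=True)

# 2. Rank purchases by date within each user (1 = earliest)
df["purchase_rank"] = (
    df.groupby("user_id")["created_at"].rank(method="first").astype(int)
)

# 3. Days since the previous purchase (NaN for a user's first purchase)
prev_purchase = df.groupby("user_id")["created_at"].shift(1)
df["days_gap"] = (df["created_at"] - prev_purchase).dt.days

# 4. Flag gaps over 5 days; a missing gap compares as False
df["long_gap"] = df["days_gap"] > 5

print(df)
```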

Rank Variance Per Country

Table: hotel_reviews
| hotel_address | additional_number_of_scoring | review_date | average_score | hotel_name | reviewer_nationality | negative_review | review_total_negative_word_counts | total_number_of_reviews | positive_review | review_total_positive_word_counts | total_number_of_reviews_reviewer_has_given | reviewer_score | tags | days_since_review | lat | lng |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 Western Gateway Royal Victoria Dock Newham London E16 1AA United Kingdom | 359 | 2017-07-05 | 8.5 | Novotel London Excel | United Kingdom | coffee and tea at breakfast were not particularly hot Otherwise everything else was fine | 16 | 1158 | we were allocated the newly refurbished rooms and so everything was fresh and the bed was very comfortable the hotel is ideally situated near City Airport although eventually we travelled by train | 34 | 2 | 10 | [' Leisure trip ', ' Family with young children ', ' Standard Double Room with Two Single Beds ', ' Stayed 2 nights ', ' Submitted from a mobile device '] | 29 days | 51.51 | 0.02 |
| 35 Charles Street Mayfair Westminster Borough London W1J 5EB United Kingdom | 252 | 2015-08-29 | 9.1 | The Chesterfield Mayfair | Israel | No Negative | 0 | 1166 | We liked everything The hotel is simply a boutique the staff were all polite and helpfull The room was clean and been serviced daily Wifi was completely free Breakfast was simply great I so much want to get back | 41 | 8 | 10 | [' Leisure trip ', ' Couple ', ' Classic Double Room ', ' Stayed 4 nights ', ' Submitted from a mobile device '] | 705 day | 51.51 | -0.15 |
| 14 Rue Stanislas 6th arr 75006 Paris France | 40 | 2017-05-23 | 9.1 | Hotel Le Six | United States of America | There is currently utility construction taking place on the street in front of the hotel so a little noisy at times and barriers in place | 27 | 177 | Neat boutique hotel Some of the most comfortable hotel beds I have ever come across Staff was wonderful Loved the location Not too touristy Luxembourg gardens close by and a great place for a morning run walk | 39 | 3 | 9.2 | [' Leisure trip ', ' Family with young children ', ' Deluxe Double Room ', ' Stayed 4 nights ', ' Submitted from a mobile device '] | 72 days | 48.84 | 2.33 |
| Gran V a De Les Corts Catalanes 570 Eixample 08011 Barcelona Spain | 325 | 2016-08-25 | 8.2 | Sunotel Central | United Kingdom | Coffee at breakfast could be better When you spend this amount in a hotel I expect better coffee in the morning | 22 | 3870 | Great bed nice to have a coffee machine in the room love the air conditioning and basically loved the attitude of the staff Really great | 26 | 2 | 9.2 | [' Leisure trip ', ' Group ', ' Comfort Double or Twin Room ', ' Stayed 1 night ', ' Submitted from a mobile device '] | 343 day | 41.38 | 2.16 |
| Rathausstra e 17 01 Innere Stadt 1010 Vienna Austria | 195 | 2015-09-17 | 8.5 | Austria Trend Hotel Rathauspark Wien | United Kingdom | A bit out of the way location wise | 9 | 1884 | Clean modern rooms and bathroom well equipped | 9 | 2 | 7.5 | [' Leisure trip ', ' Couple ', ' Comfort Room ', ' Stayed 2 nights ', ' Submitted from a mobile device '] | 686 day | 48.21 | 16.36 |
3. Rank Variance Per Country

Compare the total number of comments made by users in each country during December 2019 and January 2020. For each month, rank countries by their total number of comments in descending order. Countries with the same total should share the same rank, and the next rank should increase by one (without skipping numbers). Return the names of the countries whose rank improved from December to January (that is, their rank number became smaller).

Tables: fb_comments_count, fb_active_users
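
The ranking rule described here (ties share a rank, no gaps afterwards) corresponds to `rank(method="dense")`. The sketch below applies it to an invented, already-joined frame. The column names `country`, `created_at`, and `number_of_comments` are assumptions about what the two tables provide after a merge, and `month_ranks` is a hypothetical helper.

```python
import pandas as pd

# Toy data standing in for the joined tables (all values invented)
comments = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "created_at": pd.to_datetime([
        "2019-12-05", "2019-12-10", "2019-12-20",
        "2020-01-03", "2020-01-15", "2020-01-25",
    ]),
    "number_of_comments": [10, 10, 5, 4, 9, 9],
})

def month_ranks(df, year, month):
    """Dense-rank countries by total comments in one calendar month."""
    in_month = (df["created_at"].dt.year == year) & (
        df["created_at"].dt.month == month
    )
    totals = df[in_month].groupby("country")["number_of_comments"].sum()
    # method="dense": ties share a rank and the next rank is one higher
    return totals.rank(method="dense", ascending=False)

dec = month_ranks(comments, 2019, 12).rename("dec_rank")
jan = month_ranks(comments, 2020, 1).rename("jan_rank")

# Inner join keeps only countries that appear in both months
both = pd.concat([dec, jan], axis=1, join="inner")
improved = both[both["jan_rank"] < both["dec_rank"]].index.tolist()
print(improved)
```

The inner join is a design choice: the prompt does not say how to treat a country with comments in only one of the two months, so check that against the expected output.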

Best Selling Item

Table: online_orders
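| product_id | promotion_id | cost_in_dollars | customer_id | date_sold | units_sold |
|---|---|---|---|---|---|
| 1 | 1 | 2 | 1 | 2022-04-01 | 4 |
| 3 | 3 | 6 | 3 | 2022-05-24 | 6 |
| 1 | 2 | 2 | 10 | 2022-05-01 | 3 |
| 1 | 2 | 3 | 2 | 2022-05-01 | 9 |
| 2 | 2 | 10 | 2 | 2022-05-01 | 1 |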
4. Best Selling Item

Find the best-selling item for each month (no need to separate months by year). The best-selling item is determined by the highest total sales amount, calculated as: `total_paid = unitprice * quantity`. A negative `quantity` indicates a return or cancellation (the invoice number begins with `'C'`). To calculate sales, ignore returns and cancellations. Output the month, description of the item, and the total amount paid.

Tables: online_retail
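
A possible shape for this one, sketched on invented rows: filter out returns, total by month and item, then rank within each month. Only `unitprice` and `quantity` are named in the prompt; the column names `invoiceno`, `invoicedate`, and `description` are assumptions about the table.

```python
import pandas as pd

# Toy stand-in for online_retail (values invented; column names other
# than unitprice and quantity are assumptions)
df = pd.DataFrame({
    "invoiceno": ["1001", "1002", "C1003", "1004", "1005"],
    "invoicedate": pd.to_datetime([
        "2011-01-05", "2011-01-20", "2011-01-21", "2011-02-02", "2010-02-10",
    ]),
    "description": ["MUG", "LAMP", "LAMP", "MUG", "CLOCK"],
    "unitprice": [2.0, 10.0, 10.0, 2.0, 15.0],
    "quantity": [10, 3, -3, 4, 1],
})

# Ignore returns and cancellations (negative quantity)
sales = df[df["quantity"] > 0].copy()
sales["total_paid"] = sales["unitprice"] * sales["quantity"]
sales["month"] = sales["invoicedate"].dt.month  # year deliberately dropped

totals = sales.groupby(["month", "description"], as_index=False)["total_paid"].sum()

# Rank items within each month; method="min" keeps tied items at rank 1
totals["rnk"] = totals.groupby("month")["total_paid"].rank(
    method="min", ascending=False
)
best = totals[totals["rnk"] == 1].drop(columns="rnk").reset_index(drop=True)
print(best)
```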

Consecutive Days

Table: sf_events
| record_date | account_id | user_id |
|---|---|---|
| 2021-01-01 | A1 | U1 |
| 2021-01-01 | A1 | U2 |
| 2021-01-06 | A1 | U3 |
| 2021-01-02 | A1 | U1 |
| 2020-12-24 | A1 | U2 |
5. Consecutive Days

Find all the users who were active for 3 consecutive days or more.

Tables: sf_events
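
One common way to find streaks is to mark where a run of consecutive dates breaks and use a cumulative sum as a streak label. The sketch below uses the `sf_events` column names shown above, with invented rows; `streak_id` is a hypothetical helper column, and this is not necessarily the reference solution.

```python
import pandas as pd

# Toy rows in the shape of sf_events (values invented)
df = pd.DataFrame({
    "record_date": pd.to_datetime([
        "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-03",
        "2021-01-01", "2021-01-03", "2021-01-04",
    ]),
    "account_id": ["A1"] * 7,
    "user_id": ["U1", "U1", "U1", "U1", "U2", "U2", "U2"],
})

# One row per user per day, in date order
days = (
    df[["user_id", "record_date"]]
    .drop_duplicates()
    .sort_values(["user_id", "record_date"])
    .reset_index(drop=True)
)

# A new streak starts whenever the gap to the previous day is not exactly 1
# (a user's first row has a missing gap, so it always starts a streak)
gap = days.groupby("user_id")["record_date"].diff().dt.days
days["streak_id"] = (gap != 1).cumsum()

# Length of each streak, then users with any streak of 3 or more days
streak_len = days.groupby(["user_id", "streak_id"]).size()
active_users = sorted(
    streak_len[streak_len >= 3].index.get_level_values("user_id").unique()
)
print(active_users)
```

Dropping duplicate user-day pairs first matters: two events on the same day would otherwise produce a gap of 0 and split the streak.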

Key Takeaways

  • Deduplication: rank by date within groups, keep rank 1.
  • Chain techniques: sort → rank → shift → cumsum → flag.
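  • Use .rolling(n) for moving averages within groups. A grouped rolling result carries a MultiIndex (group key plus original index), so realign it before assigning it back.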
  • Always sort before any positional operation.
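
To illustrate the last point, here is a small sketch (invented data) of what `shift()` does on unsorted versus sorted rows. The variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 1],
    "created_at": pd.to_datetime(["2024-01-10", "2024-01-01", "2024-01-04"]),
})

# Unsorted: shift() pairs each row with whatever row happens to precede it
unsorted_prev = df.groupby("user_id")["created_at"].shift(1)
unsorted_gaps = (df["created_at"] - unsorted_prev).dt.days.tolist()

# Sorted: shift() now returns the chronologically previous purchase
ordered = df.sort_values(["user_id", "created_at"])
sorted_prev = ordered.groupby("user_id")["created_at"].shift(1)
sorted_gaps = (ordered["created_at"] - sorted_prev).dt.days.tolist()

print(unsorted_gaps)  # contains a negative gap
print(sorted_gaps)    # gaps are all non-negative
```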

What You Can Do Now

  • Filter, sort, and aggregate data across grouped categories
  • Merge multiple DataFrames to answer cross-table questions
  • Clean messy strings, extract date parts, and apply custom logic
  • Compare rows to their group averages and their neighbors
  • Build ranked leaderboards, running totals, and period-over-period reports
  • Chain multi-step analysis pipelines from filter to final output

Where to Go from Here

  • Practice is what turns knowledge into fluency. StrataScratch has hundreds of pandas questions from real company interviews; start with the ones tagged at your level and work up.
  • If you haven’t already, try the SQL learning path as well. Most data roles expect both, and the concepts map closely: groupby is GROUP BY, merge is JOIN, transform is a window function. Knowing both makes you faster in each.
  • The best next step is a real project. StrataScratch Data Projects give you guided, end-to-end analyses on real datasets — pick one that interests you and put your skills to work. That’s where learning becomes craft.