Module 6: Window Operations20 min

Comparing Adjacent Rows

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Accessing Previous Rows with .shift()

Python
df = orders.sort_values("order_date")
df["prev_date"] = df["order_date"].shift(1)
df[["order_date", "prev_date"]].head()
Table: amazon_transactions
iduser_iditemcreated_atrevenue
1109milk2020-03-03123
2139biscuit2020-03-18421
3120milk2020-03-18176
4108banana2020-03-18862
5130milk2020-03-28333
1
Calculate Days Between Purchases

Sort by user and date, shift to get the previous purchase date, then calculate the gap in days.

Tables: amazon_transactions
Sort First!

.shift() operates on row position. If your data isn’t sorted, you’ll compare to the wrong row. Always .sort_values() before .shift().

Calculating Differences

Now extend the pattern: calculate the gap, then use it to flag interesting rows.

2
Flag Rapid Repurchases

Calculate days between purchases per user, then flag rows where the gap is under 7 days.

Tables: amazon_transactions

Finding Purchases

Table: amazon_transactions
iduser_iditemcreated_atrevenue
1109milk2020-03-03123
2139biscuit2020-03-18421
3120milk2020-03-18176
4108banana2020-03-18862
5130milk2020-03-28333
3
Finding Purchases
View solution

Identify returning active users by finding users who made a repeat purchase within 7 days or less of their previous transaction, excluding same-day purchases. Output a list of these `user_id`.

Tables: amazon_transactions

Monthly Percentage Difference

Table: sf_transactions
idcreated_atvaluepurchase_id
12019-01-0117269243
22019-01-0517719436
32019-01-0910951330
42019-01-1316491130
52019-01-1719887239
4
Monthly Percentage Difference
View solution

Given a table of purchases by date, calculate the month-over-month percentage change in revenue. The output should include the year-month date (YYYY-MM) and percentage change, rounded to the 2nd decimal point, and sorted from the beginning of the year to the end of the year. The percentage change column will be populated from the 2nd month forward and can be calculated as ((this month's revenue - last month's revenue) / last month's revenue)*100.

Tables: sf_transactions

Key Takeaways

  • .shift(1) = LAG (previous row); .shift(-1) = LEAD (next row).
  • Always sort before shifting — position-based, not value-based.
  • Use within .groupby() to shift within groups: df.groupby("col")["val"].shift(1).
  • First row in each group gets NaN (no previous row exists).
  • Subtract shifted values for period-over-period change.

What’s Next

Next: running totals, cumulative counts, and percentage-of-total calculations using .cumsum(), .expanding(), and .rolling().