Comparing Adjacent Rows
Progress Tracking
Log in to save this lesson and continue from where you left off.
Accessing Previous Rows with .shift()
df = orders.sort_values("order_date")
df["prev_date"] = df["order_date"].shift(1)
df[["order_date", "prev_date"]].head()| id | user_id | item | created_at | revenue |
|---|---|---|---|---|
| 1 | 109 | milk | 2020-03-03 | 123 |
| 2 | 139 | biscuit | 2020-03-18 | 421 |
| 3 | 120 | milk | 2020-03-18 | 176 |
| 4 | 108 | banana | 2020-03-18 | 862 |
| 5 | 130 | milk | 2020-03-28 | 333 |
Sort by user and date, shift to get the previous purchase date, then calculate the gap in days.
.shift() operates on row position. If your data isn’t sorted, you’ll compare to the wrong row. Always .sort_values() before .shift().
Calculating Differences
Now extend the pattern: calculate the gap, then use it to flag interesting rows.
Calculate days between purchases per user, then flag rows where the gap is under 7 days.
Finding Purchases
| id | user_id | item | created_at | revenue |
|---|---|---|---|---|
| 1 | 109 | milk | 2020-03-03 | 123 |
| 2 | 139 | biscuit | 2020-03-18 | 421 |
| 3 | 120 | milk | 2020-03-18 | 176 |
| 4 | 108 | banana | 2020-03-18 | 862 |
| 5 | 130 | milk | 2020-03-28 | 333 |
Identify returning active users by finding users who made a repeat purchase within 7 days or less of their previous transaction, excluding same-day purchases. Output a list of these `user_id`.
Monthly Percentage Difference
| id | created_at | value | purchase_id |
|---|---|---|---|
| 1 | 2019-01-01 | 172692 | 43 |
| 2 | 2019-01-05 | 177194 | 36 |
| 3 | 2019-01-09 | 109513 | 30 |
| 4 | 2019-01-13 | 164911 | 30 |
| 5 | 2019-01-17 | 198872 | 39 |
Given a table of purchases by date, calculate the month-over-month percentage change in revenue. The output should include the year-month date (YYYY-MM) and percentage change, rounded to the 2nd decimal point, and sorted from the beginning of the year to the end of the year. The percentage change column will be populated from the 2nd month forward and can be calculated as ((this month's revenue - last month's revenue) / last month's revenue)*100.
Key Takeaways
.shift(1)= LAG (previous row);.shift(-1)= LEAD (next row).- Always sort before shifting — position-based, not value-based.
- Use within
.groupby()to shift within groups:df.groupby("col")["val"].shift(1). - First row in each group gets
NaN(no previous row exists). - Subtract shifted values for period-over-period change.
What’s Next
Next: running totals, cumulative counts, and percentage-of-total calculations using .cumsum(), .expanding(), and .rolling().