Date Arithmetic and Truncation
Progress Tracking
Log in to save this lesson and continue from where you left off.
Adding Time to Dates
Subtracting two dates is how you calculate durations — days since purchase, time between events, ticket age. Adding time is how you compute deadlines and future dates. pandas uses pd.Timedelta for exact durations and pd.DateOffset for calendar-aware shifts.
worker["joining_date"] = pd.to_datetime(worker["joining_date"])
worker["probation_end"] = worker["joining_date"] + pd.Timedelta(days=90)
worker["anniversary"] = worker["joining_date"] + pd.DateOffset(years=1)
worker[["first_name", "joining_date", "probation_end", "anniversary"]]pd.Timedelta(days=7) for fixed durations (days, hours, seconds). pd.DateOffset(months=1) for calendar-aware durations (months, years) that handle varying month lengths.
Calculating Date Differences
Subtracting two dates is how you calculate durations — days since purchase, time between events, ticket age.
Subtracting two datetime columns gives a Timedelta. Use .dt.days to get an integer:
Calculate how many days each employee has worked. The starter converts the column — add the subtraction.
Truncating to Periods
Time-series analysis almost always starts with truncation: group daily data into weekly or monthly buckets. Without truncation, you’d have one row per exact timestamp, which is useless for trend analysis. .dt.to_period() is how you go from "March 15, 2024" to just "March 2024" — turning individual events into meaningful time buckets.
For time-series grouping, truncate dates to the start of a period with .dt.to_period() or direct formatting:
# Truncate to month (returns Period)
orders["month"] = orders["order_date"].dt.to_period("M")
# Or truncate to first of month (returns Timestamp)
orders["month_start"] = (
orders["order_date"].dt.to_period("M").dt.to_timestamp()
)Grouping by Period
Group employees by the month they joined and count hires per month.
.dt.to_period("M") returns a Period object (like "2024-03"). Some operations need Timestamps instead — convert back with .dt.to_timestamp(). For groupby, Period works fine.
Number of Shipments Per Month
| shipment_id | sub_id | weight | shipment_date |
|---|---|---|---|
| 101 | 1 | 10 | 2021-08-30 |
| 101 | 2 | 20 | 2021-09-01 |
| 101 | 3 | 10 | 2021-09-05 |
| 102 | 1 | 50 | 2021-09-02 |
| 103 | 1 | 25 | 2021-09-01 |
Write a query that will calculate the number of shipments per month. The unique key for one shipment is a combination of `shipment_id` and `sub_id`. Output the `year_month` in format `YYYY-MM` and the number of shipments in that month.
Users By Average Session Time
| user_id | timestamp | action |
|---|---|---|
| 0 | 2019-04-25 13:30:15 | page_load |
| 0 | 2019-04-25 13:30:18 | page_load |
| 0 | 2019-04-25 13:30:40 | scroll_down |
| 0 | 2019-04-25 13:30:45 | scroll_up |
| 0 | 2019-04-25 13:31:10 | scroll_down |
Calculate each user's average session time, where a session is defined as the time difference between a `page_load` and a `page_exit`. Assume each user has only one session per day. If there are multiple `page_load` or `page_exit` events on the same day, use only the latest `page_load` and the earliest `page_exit`. Only consider sessions where the `page_load` occurs before the `page_exit` on the same day. Output the `user_id` and their average session time.
Key Takeaways
pd.Timedelta(days=n)for fixed durations;pd.DateOffset(months=n)for calendar-aware.- Subtract datetimes and use
.dt.daysfor integer differences. .dt.to_period("M")truncates to month — essential for time-series grouping.pd.Timestamp.now()is today’s datetime.
What’s Next
You’ve mastered date manipulation. Next: string methods — cleaning, extracting, and transforming text data.