Module 5: Dates, Strings & Logic25 min

Date Arithmetic and Truncation

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Adding Time to Dates

Subtracting two dates is how you calculate durations — days since purchase, time between events, ticket age. Adding time is how you compute deadlines and future dates. pandas uses pd.Timedelta for exact durations and pd.DateOffset for calendar-aware shifts.

Python
worker["joining_date"] = pd.to_datetime(worker["joining_date"])
worker["probation_end"] = worker["joining_date"] + pd.Timedelta(days=90)
worker["anniversary"] = worker["joining_date"] + pd.DateOffset(years=1)
worker[["first_name", "joining_date", "probation_end", "anniversary"]]
Timedelta vs DateOffset

pd.Timedelta(days=7) for fixed durations (days, hours, seconds). pd.DateOffset(months=1) for calendar-aware durations (months, years) that handle varying month lengths.

Calculating Date Differences

Subtracting two dates is how you calculate durations — days since purchase, time between events, ticket age.

Subtracting two datetime columns gives a Timedelta. Use .dt.days to get an integer:

1
Calculate Days Employed

Calculate how many days each employee has worked. The starter converts the column — add the subtraction.

Tables: worker

Truncating to Periods

Time-series analysis almost always starts with truncation: group daily data into weekly or monthly buckets. Without truncation, you’d have one row per exact timestamp, which is useless for trend analysis. .dt.to_period() is how you go from "March 15, 2024" to just "March 2024" — turning individual events into meaningful time buckets.

For time-series grouping, truncate dates to the start of a period with .dt.to_period() or direct formatting:

Python
# Truncate to month (returns Period)
orders["month"] = orders["order_date"].dt.to_period("M")

# Or truncate to first of month (returns Timestamp)
orders["month_start"] = (
    orders["order_date"].dt.to_period("M").dt.to_timestamp()
)

Grouping by Period

2
Monthly Hire Trends

Group employees by the month they joined and count hires per month.

Tables: worker
Period vs Timestamp

.dt.to_period("M") returns a Period object (like "2024-03"). Some operations need Timestamps instead — convert back with .dt.to_timestamp(). For groupby, Period works fine.

Number of Shipments Per Month

Table: amazon_shipment
shipment_idsub_idweightshipment_date
1011102021-08-30
1012202021-09-01
1013102021-09-05
1021502021-09-02
1031252021-09-01
3
Number of Shipments Per Month
View solution

Write a query that will calculate the number of shipments per month. The unique key for one shipment is a combination of `shipment_id` and `sub_id`. Output the `year_month` in format `YYYY-MM` and the number of shipments in that month.

Tables: amazon_shipment

Users By Average Session Time

Table: facebook_web_log
user_idtimestampaction
02019-04-25 13:30:15page_load
02019-04-25 13:30:18page_load
02019-04-25 13:30:40scroll_down
02019-04-25 13:30:45scroll_up
02019-04-25 13:31:10scroll_down
4
Users By Average Session Time
View solution

Calculate each user's average session time, where a session is defined as the time difference between a `page_load` and a `page_exit`. Assume each user has only one session per day. If there are multiple `page_load` or `page_exit` events on the same day, use only the latest `page_load` and the earliest `page_exit`. Only consider sessions where the `page_load` occurs before the `page_exit` on the same day. Output the `user_id` and their average session time.

Tables: facebook_web_log

Key Takeaways

  • pd.Timedelta(days=n) for fixed durations; pd.DateOffset(months=n) for calendar-aware.
  • Subtract datetimes and use .dt.days for integer differences.
  • .dt.to_period("M") truncates to month — essential for time-series grouping.
  • pd.Timestamp.now() is today’s datetime.

What’s Next

You’ve mastered date manipulation. Next: string methods — cleaning, extracting, and transforming text data.