Module 5: Dates, Strings & Logic40 min

DateTime Extraction

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Working with Time

Dates and timestamps are everywhere: signup times, order dates, event logs. If you’re doing retention, cohorts, or time-series analysis, you need to slice and dice temporal data.

Converting to Datetime

This is the most common gotcha with dates in pandas. Your DataFrame loads from a CSV, the date column looks fine, but .dt.year throws an error. That’s because the column is actually stored as strings. pd.to_datetime() is always your first step when working with dates — make it a reflex.

Before extracting parts, make sure your column is a proper datetime type. Many datasets store dates as strings:

Python
# Check the type
worker["joining_date"].dtype  # Returns: object

# Convert string to datetime
worker["joining_date"] = pd.to_datetime(worker["joining_date"])
Always Check dtype First

If dtypes shows object for a date column, it’s stored as strings. Convert with pd.to_datetime() before any date operations. Without this, comparisons and extraction give wrong results.

Extracting Date Components

Python
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders["year"] = orders["order_date"].dt.year
orders["month"] = orders["order_date"].dt.month
orders[["order_date", "year", "month"]].head()
Table: worker
worker_idfirst_namelast_namesalaryjoining_datedepartment
1MonikaArora1000002014-02-20HR
2NiharikaVerma800002014-06-11Admin
3VishalSinghal3000002014-02-20HR
4AmitahSingh5000002014-02-20Admin
5VivekBhati5000002014-06-11Admin
1
Extract Year, Month, and Day

Convert joining date to datetime, then extract year, month, and day into separate columns.

Tables: worker

Common .dt properties:

  • .dt.year — 4-digit year
  • .dt.month — month 1–12
  • .dt.day — day of month 1–31
  • .dt.hour, .dt.minute, .dt.second — time components
  • .dt.dayofweek — Monday=0, Sunday=6
  • .dt.day_name() — full name like “Monday”
  • .dt.quarter — quarter 1–4

Filtering by Date Parts

Once you can extract date parts, you can group by them — monthly trends, day-of-week patterns, quarterly summaries.

2
Count Hires by Month

Extract the month, then group and count how many employees joined each month.

Tables: worker

Day of Week Analysis

One important difference from SQL: pandas uses Monday=0, Sunday=6 (ISO standard), while SQL dialects vary. This makes pandas more predictable.

Finding patterns by day of week is a product analytics staple:

3
Hires by Day of Week

Count hires by day of week. Which day sees the most hires?

Tables: worker
dayofweek vs day_name()

.dt.dayofweek returns 0–6 (Monday=0). .dt.day_name() returns "Monday", "Tuesday", etc. Use numbers for sorting, names for display.

April Admin Employees

Table: worker
worker_idfirst_namelast_namesalaryjoining_datedepartment
1MonikaArora1000002014-02-20HR
2NiharikaVerma800002014-06-11Admin
3VishalSinghal3000002014-02-20HR
4AmitahSingh5000002014-02-20Admin
5VivekBhati5000002014-06-11Admin
4
April Admin Employees
View solution

Find the number of employees working in the Admin department that joined in April or later, in any year.

Tables: worker

Unique Users Per Client Per Month

Table: fact_events
idtime_iduser_idcustomer_idclient_idevent_typeevent_id
12020-02-283668-QPYBKSenditdesktopmessage sent3
22020-02-287892-POOKPConnectixmobilefile received2
32020-04-039763-GRSKDZoomitdesktopvideo call received7
42020-04-029763-GRSKDConnectixdesktopvideo call received7
52020-02-069237-HQITUSenditdesktopvideo call received7
5
Unique Users Per Client Per Month
View solution

Write a query that returns the number of unique users per client for each month. Assume all events occur within the same year, so only month needs to be be in the output as a number from 1 to 12.

Tables: fact_events

Number of Violations

Table: sf_restaurant_health_violations
business_idbusiness_namebusiness_addressbusiness_citybusiness_statebusiness_postal_codebusiness_latitudebusiness_longitudebusiness_locationbusiness_phone_numberinspection_idinspection_dateinspection_scoreinspection_typeviolation_idviolation_descriptionrisk_category
5800John Chin Elementary School350 Broadway StSan FranciscoCA9413337.8-122.4{'longitude': '-122.403154', 'needs_recoding': False, 'latitude': '37.798358', 'human_address': '{"address":"","city":"","state":"","zip":""}'}5800_201710172017-10-1798Routine - Unscheduled5800_20171017_103149Wiping cloths not clean or properly stored or inadequate sanitizerLow Risk
64236Sutter Pub and Restaurant700 Sutter StSan FranciscoCA9410237.79-122.41{'longitude': '-122.41188', 'needs_recoding': False, 'latitude': '37.78881', 'human_address': '{"address":"","city":"","state":"","zip":""}'}64236_201707252017-07-2588Routine - Unscheduled64236_20170725_103133Foods not protected from contaminationModerate Risk
1991SRI THAI CUISINE4621 LINCOLN WaySan FranciscoCA9412237.76-122.51{'longitude': '-122.507779', 'needs_recoding': False, 'latitude': '37.764073', 'human_address': '{"address":"","city":"","state":"","zip":""}'}1991_201711292017-11-2986Routine - Unscheduled1991_20171129_103139Improper food storageLow Risk
3816Washington Bakery & Restaurant733 Washington StSan FranciscoCA9410837.8-122.41{'longitude': '-122.405845', 'needs_recoding': False, 'latitude': '37.795174', 'human_address': '{"address":"","city":"","state":"","zip":""}'}3816_201607282016-07-2867Routine - Unscheduled3816_20160728_103108Contaminated or adulterated foodHigh Risk
39119Brothers Restaurant4128 GEARY BlvdSan FranciscoCA9411837.78-122.46{'longitude': '-122.463762', 'needs_recoding': False, 'latitude': '37.781148', 'human_address': '{"address":"","city":"","state":"","zip":""}'}39119_201607182016-07-1879Routine - Unscheduled39119_20160718_103133Foods not protected from contaminationModerate Risk
6
Number of violations
View solution

You are given a dataset of health inspections that includes details about violations. Each row represents an inspection, and if an inspection resulted in a violation, the `violation_id` column will contain a value. Count the total number of violations that occurred at **'Roxanne Cafe'** for each year, based on the inspection date. Output the year and the corresponding number of violations in ascending order of the year.

Tables: sf_restaurant_health_violations

Key Takeaways

  • pd.to_datetime() converts strings to datetime — always do this first.
  • .dt.year, .dt.month, .dt.day, .dt.dayofweek extract components.
  • Extract into a column, then filter or group by it.
  • .dt.dayofweek: Monday=0, Sunday=6.

What’s Next

Extracting parts is just the beginning. Next: date arithmetic — adding days, calculating differences, and truncating to periods for time-series analysis.