DateTime Extraction
Progress Tracking
Log in to save this lesson and continue from where you left off.
Working with Time
Dates and timestamps are everywhere: signup times, order dates, event logs. If you’re doing retention, cohorts, or time-series analysis, you need to slice and dice temporal data.
Converting to Datetime
This is the most common gotcha with dates in pandas. Your DataFrame loads from a CSV, the date column looks fine, but .dt.year throws an error. That’s because the column is actually stored as strings. pd.to_datetime() is always your first step when working with dates — make it a reflex.
Before extracting parts, make sure your column is a proper datetime type. Many datasets store dates as strings:
# Check the type
worker["joining_date"].dtype # Returns: object
# Convert string to datetime
worker["joining_date"] = pd.to_datetime(worker["joining_date"])If dtypes shows object for a date column, it’s stored as strings. Convert with pd.to_datetime() before any date operations. Without this, comparisons and extraction give wrong results.
Extracting Date Components
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders["year"] = orders["order_date"].dt.year
orders["month"] = orders["order_date"].dt.month
orders[["order_date", "year", "month"]].head()| worker_id | first_name | last_name | salary | joining_date | department |
|---|---|---|---|---|---|
| 1 | Monika | Arora | 100000 | 2014-02-20 | HR |
| 2 | Niharika | Verma | 80000 | 2014-06-11 | Admin |
| 3 | Vishal | Singhal | 300000 | 2014-02-20 | HR |
| 4 | Amitah | Singh | 500000 | 2014-02-20 | Admin |
| 5 | Vivek | Bhati | 500000 | 2014-06-11 | Admin |
Convert joining date to datetime, then extract year, month, and day into separate columns.
Common .dt properties:
.dt.year— 4-digit year.dt.month— month 1–12.dt.day— day of month 1–31.dt.hour,.dt.minute,.dt.second— time components.dt.dayofweek— Monday=0, Sunday=6.dt.day_name()— full name like “Monday”.dt.quarter— quarter 1–4
Filtering by Date Parts
Once you can extract date parts, you can group by them — monthly trends, day-of-week patterns, quarterly summaries.
Extract the month, then group and count how many employees joined each month.
Day of Week Analysis
One important difference from SQL: pandas uses Monday=0, Sunday=6 (ISO standard), while SQL dialects vary. This makes pandas more predictable.
Finding patterns by day of week is a product analytics staple:
Count hires by day of week. Which day sees the most hires?
.dt.dayofweek returns 0–6 (Monday=0). .dt.day_name() returns "Monday", "Tuesday", etc. Use numbers for sorting, names for display.
April Admin Employees
| worker_id | first_name | last_name | salary | joining_date | department |
|---|---|---|---|---|---|
| 1 | Monika | Arora | 100000 | 2014-02-20 | HR |
| 2 | Niharika | Verma | 80000 | 2014-06-11 | Admin |
| 3 | Vishal | Singhal | 300000 | 2014-02-20 | HR |
| 4 | Amitah | Singh | 500000 | 2014-02-20 | Admin |
| 5 | Vivek | Bhati | 500000 | 2014-06-11 | Admin |
Find the number of employees working in the Admin department that joined in April or later, in any year.
Unique Users Per Client Per Month
| id | time_id | user_id | customer_id | client_id | event_type | event_id |
|---|---|---|---|---|---|---|
| 1 | 2020-02-28 | 3668-QPYBK | Sendit | desktop | message sent | 3 |
| 2 | 2020-02-28 | 7892-POOKP | Connectix | mobile | file received | 2 |
| 3 | 2020-04-03 | 9763-GRSKD | Zoomit | desktop | video call received | 7 |
| 4 | 2020-04-02 | 9763-GRSKD | Connectix | desktop | video call received | 7 |
| 5 | 2020-02-06 | 9237-HQITU | Sendit | desktop | video call received | 7 |
Write a query that returns the number of unique users per client for each month. Assume all events occur within the same year, so only month needs to be be in the output as a number from 1 to 12.
Number of Violations
| business_id | business_name | business_address | business_city | business_state | business_postal_code | business_latitude | business_longitude | business_location | business_phone_number | inspection_id | inspection_date | inspection_score | inspection_type | violation_id | violation_description | risk_category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5800 | John Chin Elementary School | 350 Broadway St | San Francisco | CA | 94133 | 37.8 | -122.4 | {'longitude': '-122.403154', 'needs_recoding': False, 'latitude': '37.798358', 'human_address': '{"address":"","city":"","state":"","zip":""}'} | 5800_20171017 | 2017-10-17 | 98 | Routine - Unscheduled | 5800_20171017_103149 | Wiping cloths not clean or properly stored or inadequate sanitizer | Low Risk | |
| 64236 | Sutter Pub and Restaurant | 700 Sutter St | San Francisco | CA | 94102 | 37.79 | -122.41 | {'longitude': '-122.41188', 'needs_recoding': False, 'latitude': '37.78881', 'human_address': '{"address":"","city":"","state":"","zip":""}'} | 64236_20170725 | 2017-07-25 | 88 | Routine - Unscheduled | 64236_20170725_103133 | Foods not protected from contamination | Moderate Risk | |
| 1991 | SRI THAI CUISINE | 4621 LINCOLN Way | San Francisco | CA | 94122 | 37.76 | -122.51 | {'longitude': '-122.507779', 'needs_recoding': False, 'latitude': '37.764073', 'human_address': '{"address":"","city":"","state":"","zip":""}'} | 1991_20171129 | 2017-11-29 | 86 | Routine - Unscheduled | 1991_20171129_103139 | Improper food storage | Low Risk | |
| 3816 | Washington Bakery & Restaurant | 733 Washington St | San Francisco | CA | 94108 | 37.8 | -122.41 | {'longitude': '-122.405845', 'needs_recoding': False, 'latitude': '37.795174', 'human_address': '{"address":"","city":"","state":"","zip":""}'} | 3816_20160728 | 2016-07-28 | 67 | Routine - Unscheduled | 3816_20160728_103108 | Contaminated or adulterated food | High Risk | |
| 39119 | Brothers Restaurant | 4128 GEARY Blvd | San Francisco | CA | 94118 | 37.78 | -122.46 | {'longitude': '-122.463762', 'needs_recoding': False, 'latitude': '37.781148', 'human_address': '{"address":"","city":"","state":"","zip":""}'} | 39119_20160718 | 2016-07-18 | 79 | Routine - Unscheduled | 39119_20160718_103133 | Foods not protected from contamination | Moderate Risk |
You are given a dataset of health inspections that includes details about violations. Each row represents an inspection, and if an inspection resulted in a violation, the `violation_id` column will contain a value. Count the total number of violations that occurred at **'Roxanne Cafe'** for each year, based on the inspection date. Output the year and the corresponding number of violations in ascending order of the year.
Key Takeaways
pd.to_datetime()converts strings to datetime — always do this first..dt.year,.dt.month,.dt.day,.dt.dayofweekextract components.- Extract into a column, then filter or group by it.
.dt.dayofweek: Monday=0, Sunday=6.
What’s Next
Extracting parts is just the beginning. Next: date arithmetic — adding days, calculating differences, and truncating to periods for time-series analysis.