Introduction to Aggregate Methods
Progress Tracking
Log in to save this lesson and continue from where you left off.
Beyond Individual Rows
In Module 1, you learned to retrieve and filter individual rows. That’s useful, but it’s not how most business questions are phrased.
Nobody asks “Show me row 47 from the orders table.” They ask:
- How many customers do we have?
- What’s our total revenue this quarter?
- What’s the average order value?
These questions need you to crunch multiple rows down into a single answer. That’s what aggregate methods do.
The Methods You’ll Use Constantly
Pandas gives you aggregate methods directly on columns (Series) or entire DataFrames. Here are the five core ones.
.count() and len(): How Many?
len(df) gives you the total number of rows. .count() counts non-null values per column.
# Total rows
len(techcorp_workforce)
# Non-null values per column
techcorp_workforce.count()
# Non-null values in one column
techcorp_workforce["phone_number"].count()
# Unique values in a column
techcorp_workforce["department"].nunique()len(df) counts all rows including NaN. .count() counts non-null values. .nunique() counts unique non-null values. Getting these confused is one of the most common sources of incorrect numbers in reports.
.sum(): Add It Up
| id | first_name | last_name | department | salary | phone_number | joining_date |
|---|---|---|---|---|---|---|
| 1 | Sarah | Mitchell | HR | 95000 | 555-0101 | 2021-03-15 |
| 2 | Michael | Chen | HR | 88000 | 555-0102 | 2022-06-01 |
| 3 | Emily | Rodriguez | HR | 82500 | 2021-09-20 | |
| 4 | David | Park | HR | 80000 | 555-0104 | 2023-01-10 |
| 5 | Lisa | Thompson | HR | 65000 | 2021-04-05 |
Calculate the total payroll by summing all salaries.
.sum() ignores NaN values by default. If you have salaries of 50000, 60000, and NaN, you get 110000, not an error.
.mean(): The Average
techcorp_workforce["salary"].mean().mean() skips NaN values. If you have values 100, 200, and NaN, the mean is 150 (300 / 2), not 100 (300 / 3). This is usually what you want, but be aware of it.
They work on text too — .min() gives you the first alphabetically, .max() gives the last. And on dates: .min() is the earliest, .max() is the most recent.
Multiple Aggregates at Once with .agg()
Instead of calling each method separately, .agg() lets you run multiple aggregations in one call:
orders["total_order_cost"].agg(["sum", "mean", "min", "max"])Pass a list of method names as strings. The result is a Series with one value per aggregation.
Multiple Aggregations at Once
Use `.agg()` to calculate the sum, mean, min, and max salary in one expression.
Remember .describe() from Module 1? It’s essentially .agg() with a preset list of statistics: count, mean, std, min, 25%, 50%, 75%, max. Use .describe() for exploration, .agg() when you need specific aggregations.
Samantha's and Lisa's Total Sales Revenue
Use .sum() to add up values for specific conditions.
| salesperson | widget_sales | sales_revenue | id |
|---|---|---|---|
| Jim | 810 | 40500 | 1 |
| Bobby | 661 | 33050 | 2 |
| Samantha | 1006 | 50300 | 3 |
| Taylor | 984 | 49200 | 4 |
| Tom | 403 | 20150 | 5 |
What is the total sales revenue of Samantha and Lisa?
Olympics Events List By Age
Combine .min(), .mean(), and .max() to summarize a column.
| id | name | sex | age | height | weight | team | noc | games | year | season | city | sport | event | medal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3520 | Guillermo J. Amparan | M | Mexico | MEX | 1924 Summer | 1924 | Summer | Paris | Athletics | Athletics Men's 800 metres | ||||
| 35394 | Henry John Finchett | M | Great Britain | GBR | 1924 Summer | 1924 | Summer | Paris | Gymnastics | Gymnastics Men's Rings | ||||
| 21918 | Georg Frederik Ahrensborg Clausen | M | 28 | Denmark | DEN | 1924 Summer | 1924 | Summer | Paris | Cycling | Cycling Men's Road Race Individual | |||
| 110345 | Marinus Cornelis Dick Sigmond | M | 26 | Netherlands | NED | 1924 Summer | 1924 | Summer | Paris | Football | Football Men's Football | |||
| 54193 | Thodore Tho Jeitz | M | 26 | Luxembourg | LUX | 1924 Summer | 1924 | Summer | Paris | Gymnastics | Gymnastics Men's Individual All-Around |
Find the lowest, average, and the highest ages of athletes across all Olympics. HINT: If athlete participated in more than one discipline at one Olympic games, consider it as a separate athlete, no need to remove such edge cases.
Hour Of Highest Gas Expense
| index | weather | hour | travel_distance | gasoline_cost |
|---|---|---|---|---|
| 0 | cloudy | 7 | 24.47 | 1.13 |
| 1 | cloudy | 23 | 23.67 | 1.99 |
| 2 | sunny | 17 | 20.93 | 0.86 |
| 3 | rainy | 2 | 29.58 | 0.85 |
| 4 | rainy | 7 | 16.11 | 0.95 |
Find the hour with the highest gasoline cost. Assume there's only 1 hour with the highest gas cost.
Key Takeaways
.sum(),.mean(),.min(),.max(),.count()are your core aggregate methods.- All of them skip
NaNvalues by default. len(df)counts all rows;.count()counts non-null;.nunique()counts unique..agg([...])runs multiple aggregations in one call.- Ask yourself: am I counting rows or unique entities?
What’s Next
Right now, you’re getting one number for the entire DataFrame. But what if you need revenue by region? Headcount by department? That’s where .groupby() comes in.