Module 2: Aggregation & Grouping35 min

Conditional Aggregation

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Sometimes you need to count or sum only the rows that meet a condition — “what percentage of users are active?” or “how much revenue came from each region?” In SQL, you’d use CASE WHEN inside an aggregate. In pandas, you create a boolean mask and aggregate it directly. The pattern is elegant once you see it: True counts as 1, False as 0.

When Basic Aggregates Aren’t Enough

So far, your aggregates operate on all rows. But what if you need to count only certain values? Sum only specific items? Get multiple conditional counts in one result?

In SQL, you’d use CASE WHEN inside SUM(). In pandas, you use boolean masks — which are simpler and more Pythonic.

The Pattern: Boolean Mask + .sum()

A boolean comparison returns True/False. Since Python treats True as 1 and False as 0, you can .sum() a boolean mask to count matches:

Table: techcorp_workforce
idfirst_namelast_namedepartmentsalaryphone_numberjoining_date
1SarahMitchellHR95000555-01012021-03-15
2MichaelChenHR88000555-01022022-06-01
3EmilyRodriguezHR825002021-09-20
4DavidParkHR80000555-01042023-01-10
5LisaThompsonHR650002021-04-05
Python
# Count employees in each department
hr_count = (techcorp_workforce["department"] == "HR").sum()
admin_count = (techcorp_workforce["department"] == "Admin").sum()
total = len(techcorp_workforce)

pd.DataFrame({
    "total": [total],
    "hr_count": [hr_count],
    "admin_count": [admin_count]
})

One result with multiple conditional counts — no groupby needed.

Python
high_earners = (techcorp_workforce["salary"] > 100000).sum()
total = len(techcorp_workforce)
round(100.0 * high_earners / total, 2)
Use 100.0 for Percentages

In Python 3, integer division already returns a float (3 / 4 = 0.75). But multiplying by 100.0 makes the intent explicit and readable.

Computing a Ratio

1
Calculate a Percentage

Calculate what percentage of employees are in the HR department.

Tables: techcorp_workforce

Conditional Sums

You can sum values conditionally too, not just count:

2
Break Down Payroll by Department

Calculate total payroll, then HR payroll separately. Add Admin payroll.

Tables: techcorp_workforce

Pivot-Style Cross-Tabulations

For breakdowns by two dimensions, pd.crosstab() is the pandas equivalent of a pivot table:

Python
pd.crosstab(titanic["survived"], titanic["pclass"])

One line, and you’ve got survival counts broken down by passenger class.

Inspections by Risk Category

Table: sf_restaurant_health_violations
business_idbusiness_namebusiness_addressbusiness_citybusiness_statebusiness_postal_codebusiness_latitudebusiness_longitudebusiness_locationbusiness_phone_numberinspection_idinspection_dateinspection_scoreinspection_typeviolation_idviolation_descriptionrisk_category
5800John Chin Elementary School350 Broadway StSan FranciscoCA9413337.8-122.4{'longitude': '-122.403154', 'needs_recoding': False, 'latitude': '37.798358', 'human_address': '{"address":"","city":"","state":"","zip":""}'}5800_201710172017-10-1798Routine - Unscheduled5800_20171017_103149Wiping cloths not clean or properly stored or inadequate sanitizerLow Risk
64236Sutter Pub and Restaurant700 Sutter StSan FranciscoCA9410237.79-122.41{'longitude': '-122.41188', 'needs_recoding': False, 'latitude': '37.78881', 'human_address': '{"address":"","city":"","state":"","zip":""}'}64236_201707252017-07-2588Routine - Unscheduled64236_20170725_103133Foods not protected from contaminationModerate Risk
1991SRI THAI CUISINE4621 LINCOLN WaySan FranciscoCA9412237.76-122.51{'longitude': '-122.507779', 'needs_recoding': False, 'latitude': '37.764073', 'human_address': '{"address":"","city":"","state":"","zip":""}'}1991_201711292017-11-2986Routine - Unscheduled1991_20171129_103139Improper food storageLow Risk
3816Washington Bakery & Restaurant733 Washington StSan FranciscoCA9410837.8-122.41{'longitude': '-122.405845', 'needs_recoding': False, 'latitude': '37.795174', 'human_address': '{"address":"","city":"","state":"","zip":""}'}3816_201607282016-07-2867Routine - Unscheduled3816_20160728_103108Contaminated or adulterated foodHigh Risk
39119Brothers Restaurant4128 GEARY BlvdSan FranciscoCA9411837.78-122.46{'longitude': '-122.463762', 'needs_recoding': False, 'latitude': '37.781148', 'human_address': '{"address":"","city":"","state":"","zip":""}'}39119_201607182016-07-1879Routine - Unscheduled39119_20160718_103133Foods not protected from contaminationModerate Risk
3
Find the number of inspections for each risk category by inspection type
View solution

Find the number of inspections that resulted in each risk category per each inspection type. Consider the records with no risk category value belongs to a separate category. Output the result along with the corresponding inspection type and the corresponding total number of inspections per that type. The output should be pivoted, meaning that each risk category + total number should be a separate column. Order the result based on the number of inspections per inspection type in descending order.

Tables: sf_restaurant_health_violations

Share of Active Users

Table: fb_active_users
user_idnamestatuscountry
33Amanda LeonopenAustralia
27Jessica FarrellopenLuxembourg
18Wanda RamirezopenUSA
50Samuel MillerclosedBrazil
16Jacob YorkopenAustralia
4
Share of Active Users
View solution

Calculate the percentage of users who are both from the US and have an 'open' status, as indicated in the `fb_active_users` table.

Tables: fb_active_users

Processed Ticket Rate By Type

Table: facebook_complaints
complaint_idtypeprocessed
00TRUE
10TRUE
20FALSE
31TRUE
41TRUE
5
Processed Ticket Rate By Type
View solution

Find the processed rate of tickets for each `type`. The processed rate is defined as the number of processed tickets divided by the total number of tickets for that type. Round this result to two decimal places.

Tables: facebook_complaints

Key Takeaways

  • Boolean masks + .sum() count matching rows: (df["col"] > x).sum().
  • Use [condition]["col"].sum() for conditional sums.
  • pd.crosstab() creates pivot-style cross-tabulations in one line.
  • For percentages: conditional count / total count * 100.
  • These techniques replace SQL’s SUM(CASE WHEN ...) pattern.

What’s Next

You’ve now got all the building blocks. The final lesson pulls everything together with complex analyses that combine filtering, grouping, and conditional aggregation.