Reshaping Data
Progress Tracking
Log in to save this lesson and continue from where you left off.
Long vs Wide Format
Data comes in two shapes:
# LONG format (each observation is a row)
name subject score
Alice Math 90
Alice Science 85
Bob Math 78
Bob Science 92
# WIDE format (each subject is a column)
name Math Science
Alice 90 85
Bob 78 92Long format is better for analysis (filtering, grouping). Wide format is better for reports and display. You’ll constantly convert between them.
Creating Pivot Tables
pd.pivot_table(
orders,
index="cust_id",
values="total_order_cost",
aggfunc=["sum", "count"]
)Melting Wide to Long
pd.melt() is the reverse of pivoting. It takes columns and unpivots them into rows:
# Wide format: one column per metric
# employee has salary, bonus as separate columns
pd.melt(
employee,
id_vars=["first_name", "department"],
value_vars=["salary", "bonus"],
var_name="metric",
value_name="amount"
)The key parameters:
id_vars— columns to keep as-is (identifier columns)value_vars— columns to unpivot into rowsvar_name— name for the new column holding the old column namesvalue_name— name for the new column holding the values
When to Pivot vs Melt
- Data has one row per observation, you want a summary table →
pivot_table() - Data has metrics spread across columns, you want one row per metric →
melt() - Need to aggregate while reshaping →
pivot_table()(hasaggfunc) - Just rearranging, no aggregation →
melt()orpivot()(without_table)
| id | first_name | last_name | age | sex | employee_title | department | salary | target | bonus | city | address | manager_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Max | George | 26 | M | Sales | Sales | 1300 | 200 | 150 | Max@company.com | California | 2638 Richards Avenue | 1 |
| 13 | Katty | Bond | 56 | F | Manager | Management | 150000 | 0 | 300 | Katty@company.com | Arizona | 1 | |
| 11 | Richerd | Gear | 57 | M | Manager | Management | 250000 | 0 | 300 | Richerd@company.com | Alabama | 1 | |
| 10 | Jennifer | Dion | 34 | F | Sales | Sales | 1000 | 200 | 150 | Jennifer@company.com | Alabama | 13 | |
| 19 | George | Joe | 50 | M | Manager | Management | 100000 | 0 | 300 | George@company.com | Florida | 1003 Wyatt Street | 1 |
Create a pivot table showing average and max salary per department. Use pivot table() with aggfunc.
Titanic Survivors and Non-Survivors
| passengerid | survived | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.25 | S | |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38 | 1 | 0 | PC 17599 | 71.28 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.92 | S | |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.05 | S |
Make a report showing the number of survivors and non-survivors by passenger class. Classes are categorized based on the `pclass` value as: • First class: `pclass = 1` • Second class: `pclass = 2` • Third class: `pclass = 3` Output the number of survivors and non-survivors by each class.
Key Takeaways
pd.pivot_table()converts long → wide with aggregation.pd.melt()converts wide → long (unpivoting).pd.crosstab()is a shortcut for frequency pivot tables.- Long format for analysis; wide format for reports.
index,columns,values,aggfuncare the four pivot_table parameters to know.
What’s Next
You’ve completed Module 4. You can now break complex analyses into clean steps, apply custom logic, and reshape data for any format. Module 5 covers dates, strings, and conditional logic — the functions that handle messy real-world data.