Module 4: Multi-Step Analysis•30 min

Reshaping Data

Progress Tracking

Long vs Wide Format

Data comes in two shapes:

Python

# LONG format (each observation is a row)
  name     subject  score
  Alice    Math     90
  Alice    Science  85
  Bob      Math     78
  Bob      Science  92

# WIDE format (each subject is a column)
  name     Math  Science
  Alice    90    85
  Bob      78    92

Long format is better for analysis (filtering, grouping). Wide format is better for reports and display. You’ll constantly convert between them.

Creating Pivot Tables

Python

pd.pivot_table(
    orders,
    index="cust_id",
    values="total_order_cost",
    aggfunc=["sum", "count"]
)

Melting Wide to Long

pd.melt() is the reverse of pivoting. It takes columns and unpivots them into rows:

Python

# Wide format: one column per metric
# employee has salary, bonus as separate columns
pd.melt(
    employee,
    id_vars=["first_name", "department"],
    value_vars=["salary", "bonus"],
    var_name="metric",
    value_name="amount"
)

The key parameters:

id_vars — columns to keep as-is (identifier columns)
value_vars — columns to unpivot into rows
var_name — name for the new column holding the old column names
value_name — name for the new column holding the values

When to Pivot vs Melt

Data has one row per observation, you want a summary table → pivot_table()
Data has metrics spread across columns, you want one row per metric → melt()
Need to aggregate while reshaping → pivot_table() (has aggfunc)
Just rearranging, no aggregation → melt() or pivot() (without _table)

Table: employee

id	first_name	last_name	age	sex	employee_title	department	salary	target	bonus	email	city	address	manager_id
5	Max	George	26	M	Sales	Sales	1300	200	150	Max@company.com	California	2638 Richards Avenue	1
13	Katty	Bond	56	F	Manager	Management	150000	0	300	Katty@company.com	Arizona		1
11	Richerd	Gear	57	M	Manager	Management	250000	0	300	Richerd@company.com	Alabama		1
10	Jennifer	Dion	34	F	Sales	Sales	1000	200	150	Jennifer@company.com	Alabama		13
19	George	Joe	50	M	Manager	Management	100000	0	300	George@company.com	Florida	1003 Wyatt Street	1

Tables: employee

Titanic Survivors and Non-Survivors

Table: titanic

passengerid	survived	pclass	name	sex	age	sibsp	ticket	fare	cabin	embarked
1	0	3	Braund, Mr. Owen Harris	male	22	1	A/5 21171	7.25		S
2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Thayer)	female	38	1	PC 17599	71.28	C85	C
3	1	3	Heikkinen, Miss. Laina	female	26	0	STON/O2. 3101282	7.92		S
4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35	1	113803	53.1	C123	S
5	0	3	Allen, Mr. William Henry	male	35	0	373450	8.05		S

Tables: titanic

Key Takeaways

pd.pivot_table() converts long → wide with aggregation.
pd.melt() converts wide → long (unpivoting).
pd.crosstab() is a shortcut for frequency pivot tables.
Long format for analysis; wide format for reports.
index, columns, values, aggfunc are the four pivot_table parameters to know.

What’s Next

You’ve completed Module 4. You can now break complex analyses into clean steps, apply custom logic, and reshape data for any format. Module 5 covers dates, strings, and conditional logic — the functions that handle messy real-world data.

Next upContinue →

DateTime Extraction

Module 5: Dates, Strings & Logic

40 min