Module 1: DataFrame Fundamentals • 30 min

Sorting and Limiting Results

"Show me the top 5" and "who was hired most recently" — these are among the most common requests in analytics. Sorting and limiting are the tools behind every leaderboard, every "top N" report, and every "most recent" query. They’re simple individually, but the real power is in chaining them: filter first, then sort, then take the top N.

Sorting with .sort_values()

Python

# Ascending (default) — lowest first
orders.sort_values("total_order_cost")

# Descending — highest first
orders.sort_values("total_order_cost", ascending=False)

Table: techcorp_workforce

id	first_name	last_name	department	salary	phone_number	joining_date
1	Sarah	Mitchell	HR	95000	555-0101	2021-03-15
2	Michael	Chen	HR	88000	555-0102	2022-06-01
3	Emily	Rodriguez	HR	82500		2021-09-20
4	David	Park	HR	80000	555-0104	2023-01-10
5	Lisa	Thompson	HR	65000		2021-04-05
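
To try this on the sample data, here is a minimal sketch that rebuilds the five techcorp_workforce rows by hand (only the columns needed) and sorts them by salary in both directions. The variable names lowest_first and highest_first are illustrative.

```python
import pandas as pd

# Sample rows copied from the techcorp_workforce table above
techcorp_workforce = pd.DataFrame({
    "first_name": ["Sarah", "Michael", "Emily", "David", "Lisa"],
    "department": ["HR"] * 5,
    "salary": [95000, 88000, 82500, 80000, 65000],
})

# Ascending (default): lowest salary first
lowest_first = techcorp_workforce.sort_values("salary")

# Descending: highest salary first
highest_first = techcorp_workforce.sort_values("salary", ascending=False)

print(highest_first)
```

Note that .sort_values() returns a new DataFrame and leaves the original untouched, so assign the result if you want to keep it.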

Sorting and the StrataScratch Platform

The StrataScratch platform checks solutions by comparing rows, not their order. Even when a question says 'sort by salary descending,' your answer passes with any row order. Still, sort your output and scan it — it's the easiest way to spot mistakes in your logic.

Sorting Dates

Sorting by date works as long as the column is a proper datetime type — if dates sort oddly, convert with pd.to_datetime() first.
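
Here is a small sketch of what "sorting oddly" can look like. It uses the names and joining dates from the sample table, but stores the dates as M/D/YYYY text; that text format is an assumption made purely to show the problem.

```python
import pandas as pd

# Names and dates from the sample table; the M/D/YYYY text format is an
# assumption chosen to demonstrate the issue
staff = pd.DataFrame({
    "first_name": ["Sarah", "Michael", "Emily", "David", "Lisa"],
    "joining_date": ["3/15/2021", "6/1/2022", "9/20/2021", "1/10/2023", "4/5/2021"],
})

# Sorted as text: compared character by character, so "1/10/2023" lands first
as_text = staff.sort_values("joining_date")

# Convert to a real datetime column, then sort chronologically
staff["joining_date"] = pd.to_datetime(staff["joining_date"], format="%m/%d/%Y")
as_dates = staff.sort_values("joining_date")

print(as_text["first_name"].tolist())
print(as_dates["first_name"].tolist())
```

After the conversion, ascending=False gives you "most recently hired first".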

Sorting by Multiple Columns

Python

orders.sort_values(
    ["cust_id", "total_order_cost"],
    ascending=[True, False]
)

Each Column Gets Its Own Sort Direction

Pass ascending a list with one entry per sort column, in the same order. ascending=[True, False] sorts the first column lowest-first (A–Z for text) and the second column highest-first.
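
As a runnable sketch, the snippet below applies that mixed-direction sort to the five sample rows of the orders table used in this lesson (rebuilt by hand, with only the columns needed). The name by_customer is illustrative.

```python
import pandas as pd

# Sample rows from this lesson's orders table
orders = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "cust_id": [3, 3, 3, 7, 7],
    "total_order_cost": [100, 80, 30, 25, 80],
})

# Customers in ascending order; within each customer, biggest order first
by_customer = orders.sort_values(
    ["cust_id", "total_order_cost"],
    ascending=[True, False],
)

print(by_customer)
```

Customer 3's orders come first (100, 80, 30), followed by customer 7's (80, 25).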

Filtering Then Sorting

Table: orders

id	cust_id	order_date	order_details	total_order_cost
1	3	2019-03-04	Coat	100
2	3	2019-03-01	Shoes	80
3	3	2019-03-07	Skirt	30
4	7	2019-02-01	Coat	25
5	7	2019-03-10	Shoes	80
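
A minimal sketch of the filter-then-sort pattern on these rows: keep only the larger orders, then list them oldest first. The cutoff of 50 and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Sample rows copied from the orders table above
orders = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "cust_id": [3, 3, 3, 7, 7],
    "order_date": pd.to_datetime(
        ["2019-03-04", "2019-03-01", "2019-03-07", "2019-02-01", "2019-03-10"]
    ),
    "order_details": ["Coat", "Shoes", "Skirt", "Coat", "Shoes"],
    "total_order_cost": [100, 80, 30, 25, 80],
})

# Step 1: filter to orders costing at least 50 (hypothetical cutoff)
big_orders = orders[orders["total_order_cost"] >= 50]

# Step 2: sort what is left by date, oldest first
big_orders_by_date = big_orders.sort_values("order_date")

print(big_orders_by_date)
```

Filtering first means the sort only has to handle the rows you actually care about.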

Sorting by Computed Values
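
Sometimes the value you want to sort by isn't stored in any column and has to be calculated first. As one possible illustration (the choice of computed value here is an assumption, not the lesson's original exercise), this sketch builds a full name from the techcorp_workforce sample rows, measures its length, and sorts by that length.

```python
import pandas as pd

# Names copied from the techcorp_workforce sample rows
staff = pd.DataFrame({
    "first_name": ["Sarah", "Michael", "Emily", "David", "Lisa"],
    "last_name": ["Mitchell", "Chen", "Rodriguez", "Park", "Thompson"],
})

# Compute the values to sort by and store them as named columns
staff["full_name"] = staff["first_name"] + " " + staff["last_name"]
staff["name_length"] = staff["full_name"].str.len()

# Sort by the computed column, longest name first
longest_first = staff.sort_values("name_length", ascending=False)

print(longest_first[["full_name", "name_length"]])
```

Keeping name_length as a real column makes the result easy to check by eye.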

Sort Without Creating a Column

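For one-off sorting, pass a key function along with the column to sort by, for example df.sort_values("first_name", key=lambda c: c.str.len()). The key is applied to the column's values before sorting, so no new column is added. Creating a named column is usually clearer, though.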
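
A short sketch of the key approach, assuming a pandas version that supports the key argument (1.1 or later). It sorts the sample names by last-name length without adding a column; kind="mergesort" is an optional extra used here so that rows with equal lengths keep their original relative order.

```python
import pandas as pd

# Names copied from the techcorp_workforce sample rows
staff = pd.DataFrame({
    "first_name": ["Sarah", "Michael", "Emily", "David", "Lisa"],
    "last_name": ["Mitchell", "Chen", "Rodriguez", "Park", "Thompson"],
})

# The key function receives the whole column and returns the values to sort by
by_last_name_length = staff.sort_values(
    "last_name",
    key=lambda col: col.str.len(),
    kind="mergesort",  # stable sort: tied rows stay in their original order
)

print(by_last_name_length)
```

The result has the same two columns as the input; the lengths exist only during the sort.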

Limiting Results

Once your data is sorted the way you want, you usually don’t need all of it. .head(n) grabs the first N rows from the sorted result. This is the pandas equivalent of SQL’s LIMIT.

Python

# First 5 rows
df.head(5)

# Last 3 rows
df.tail(3)

# The full pattern: sort then limit
df.sort_values("salary", ascending=False).head(5)

Shortcuts: .nlargest() and .nsmallest()

Python

# These are equivalent:
df.sort_values("salary", ascending=False).head(5)
df.nlargest(5, "salary")

# And these:
df.sort_values("salary").head(3)
df.nsmallest(3, "salary")

Bottom-N with .nsmallest()
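
For a bottom-N question such as "who are the three lowest-paid employees?", .nsmallest() answers in one call. A minimal sketch on the techcorp_workforce sample rows (rebuilt by hand, with only the columns needed):

```python
import pandas as pd

# Sample rows from the techcorp_workforce table
techcorp_workforce = pd.DataFrame({
    "first_name": ["Sarah", "Michael", "Emily", "David", "Lisa"],
    "last_name": ["Mitchell", "Chen", "Rodriguez", "Park", "Thompson"],
    "salary": [95000, 88000, 82500, 80000, 65000],
})

# Three lowest salaries, lowest first
lowest_three = techcorp_workforce.nsmallest(3, "salary")

print(lowest_three)
```

The rows come back already ordered from lowest to highest salary, so no extra sort is needed.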

When to Use .nlargest() vs. sort + head

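.nlargest() is typically faster on large DataFrames because it does not have to fully sort every row. Use sort_values().head() when columns need different sort directions: .nlargest() accepts a list of columns, but it orders all of them highest-first.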

Top N from Another Table
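
The same shortcut works on any table. This sketch takes the two most expensive orders from the sample orders rows used in this lesson. Two orders tie at 80, so it also shows the optional keep="all" argument, which returns every row tied for the last place instead of cutting off at N.

```python
import pandas as pd

# Sample rows from this lesson's orders table
orders = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "cust_id": [3, 3, 3, 7, 7],
    "order_details": ["Coat", "Shoes", "Skirt", "Coat", "Shoes"],
    "total_order_cost": [100, 80, 30, 25, 80],
})

# Top 2 by cost; on a tie the default keeps the row that appears first
top_two = orders.nlargest(2, "total_order_cost")

# Same question, but keep every row tied with the last place
top_two_with_ties = orders.nlargest(2, "total_order_cost", keep="all")

print(top_two)
print(top_two_with_ties)
```

Whether ties should be cut off or kept depends on the question being asked, so it is worth checking for them before reporting a "top N".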

Combining Filter and Top N
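
Filtering and top-N chain together naturally: narrow the rows first, then ask for the largest values among what is left. A minimal sketch on the sample orders rows, where the choice of customer 3 is purely illustrative:

```python
import pandas as pd

# Sample rows from this lesson's orders table
orders = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "cust_id": [3, 3, 3, 7, 7],
    "order_details": ["Coat", "Shoes", "Skirt", "Coat", "Shoes"],
    "total_order_cost": [100, 80, 30, 25, 80],
})

# Filter to one customer, then take that customer's 2 largest orders
top_for_customer = (
    orders[orders["cust_id"] == 3]
    .nlargest(2, "total_order_cost")
)

print(top_for_customer)
```

Because the filter runs first, customer 7's order of 80 never competes for a place in the result.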

The Full Pattern

In practice, you almost never sort alone. The real pattern is a chain: filter to the rows you care about, sort by the column that matters, limit to the top N, and select only the columns you need. This chain is the backbone of exploratory analysis.

Python

# Filter → Sort → Limit → Select columns
(
    df[df["department"] == "Engineering"]
    .sort_values("joining_date", ascending=False)
    .head(5)
    [["first_name", "last_name", "joining_date"]]
)

The Full Pipeline
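
Here is the whole chain run end to end on the techcorp_workforce sample rows, as a sketch. The question answered ("the two most recently hired HR employees earning more than 80,000") and the salary cutoff are assumptions chosen to exercise every step.

```python
import pandas as pd

# Sample rows from the techcorp_workforce table
techcorp_workforce = pd.DataFrame({
    "first_name": ["Sarah", "Michael", "Emily", "David", "Lisa"],
    "last_name": ["Mitchell", "Chen", "Rodriguez", "Park", "Thompson"],
    "department": ["HR"] * 5,
    "salary": [95000, 88000, 82500, 80000, 65000],
    "joining_date": pd.to_datetime(
        ["2021-03-15", "2022-06-01", "2021-09-20", "2023-01-10", "2021-04-05"]
    ),
})

# Filter -> Sort -> Limit -> Select columns
is_hr = techcorp_workforce["department"] == "HR"
earns_over_80k = techcorp_workforce["salary"] > 80000  # hypothetical cutoff

recent_hires = (
    techcorp_workforce[is_hr & earns_over_80k]
    .sort_values("joining_date", ascending=False)
    .head(2)
    [["first_name", "last_name", "joining_date"]]
)

print(recent_hires)
```

Naming the filter conditions before the chain keeps each step short enough to read at a glance.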

Key Takeaways

.sort_values("col") sorts ascending; ascending=False for descending.
Pass lists for multi-column sorting with independent directions.
.head(n) limits to first N rows; .tail(n) for last N.
.nlargest(n, col) / .nsmallest(n, col) — faster shortcuts for top/bottom N.
The full pattern: filter → sort → limit → select columns.

What’s Next

You can now select columns, filter rows, combine conditions, handle missing data, sort, and limit. That’s the full toolkit for working with a single DataFrame. Module 2 introduces aggregation: counting, summing, averaging, and grouping — the foundation of every analytical report.
