Sorting and Limiting Results
Progress Tracking
Log in to save this lesson and continue from where you left off.
"Show me the top 5" and "who was hired most recently" — these are among the most common requests in analytics. Sorting and limiting are the tools behind every leaderboard, every "top N" report, and every "most recent" query. They’re simple individually, but the real power is in chaining them: filter first, then sort, then take the top N.
Sorting with .sort_values()
# Ascending (default) — lowest first
orders.sort_values("total_order_cost")
# Descending — highest first
orders.sort_values("total_order_cost", ascending=False)| id | first_name | last_name | department | salary | phone_number | joining_date |
|---|---|---|---|---|---|---|
| 1 | Sarah | Mitchell | HR | 95000 | 555-0101 | 2021-03-15 |
| 2 | Michael | Chen | HR | 88000 | 555-0102 | 2022-06-01 |
| 3 | Emily | Rodriguez | HR | 82500 | 2021-09-20 | |
| 4 | David | Park | HR | 80000 | 555-0104 | 2023-01-10 |
| 5 | Lisa | Thompson | HR | 65000 | 2021-04-05 |
Sort `techcorp_workforce` by `salary` from lowest to highest.
The StrataScratch platform checks solutions by comparing rows, not their order. Even when a question says 'sort by salary descending,' your answer passes with any row order. Still, sort your output and scan it — it's the easiest way to spot mistakes in your logic.
Sorting Dates
Sorting by date works as long as the column is a proper datetime type — if dates sort oddly, convert with pd.to_datetime() first.
Sort employees to show the most recently hired first.
Sorting by Multiple Columns
orders.sort_values(
["cust_id", "total_order_cost"],
ascending=[True, False]
)Sort `techcorp_workforce` by `department` ascending and `salary` descending.
Pass a list to ascending matching the column order: ascending=[True, False] means first column A–Z, second column highest-first.
Filtering Then Sorting
| id | cust_id | order_date | order_details | total_order_cost |
|---|---|---|---|---|
| 1 | 3 | 2019-03-04 | Coat | 100 |
| 2 | 3 | 2019-03-01 | Shoes | 80 |
| 3 | 3 | 2019-03-07 | Skirt | 30 |
| 4 | 7 | 2019-02-01 | Coat | 25 |
| 5 | 7 | 2019-03-10 | Shoes | 80 |
Find all orders with a total cost over 50 and sort them from most to least expensive.
Sorting by Computed Values
| id | first_name | last_name | department | salary | phone_number | joining_date |
|---|---|---|---|---|---|---|
| 1 | Sarah | Mitchell | HR | 95000 | 555-0101 | 2021-03-15 |
| 2 | Michael | Chen | HR | 88000 | 555-0102 | 2022-06-01 |
| 3 | Emily | Rodriguez | HR | 82500 | 2021-09-20 | |
| 4 | David | Park | HR | 80000 | 555-0104 | 2023-01-10 |
| 5 | Lisa | Thompson | HR | 65000 | 2021-04-05 |
Create a column with the length of each employee’s first name, then sort by it (longest first).
For one-off sorting: df.sort_values(key=lambda c: c.str.len()). But creating a named column is usually clearer.
Limiting Results
Once your data is sorted the way you want, you usually don’t need all of it. .head(n) grabs the first N rows from the sorted result. This is the pandas equivalent of SQL’s LIMIT.
# First 5 rows
df.head(5)
# Last 3 rows
df.tail(3)
# The full pattern: sort then limit
df.sort_values("salary", ascending=False).head(5)Shortcuts: .nlargest() and .nsmallest()
# These are equivalent:
df.sort_values("salary", ascending=False).head(5)
df.nlargest(5, "salary")
# And these:
df.sort_values("salary").head(3)
df.nsmallest(3, "salary")Bottom-N with .nsmallest()
| id | first_name | last_name | department | salary | phone_number | joining_date |
|---|---|---|---|---|---|---|
| 1 | Sarah | Mitchell | HR | 95000 | 555-0101 | 2021-03-15 |
| 2 | Michael | Chen | HR | 88000 | 555-0102 | 2022-06-01 |
| 3 | Emily | Rodriguez | HR | 82500 | 2021-09-20 | |
| 4 | David | Park | HR | 80000 | 555-0104 | 2023-01-10 |
| 5 | Lisa | Thompson | HR | 65000 | 2021-04-05 |
Find the 3 lowest-paid employees.
.nlargest() is faster on large DataFrames (partial sort). Use sort_values().head() when you need a specific sort order or multiple columns.
Top N from Another Table
| id | cust_id | order_date | order_details | total_order_cost |
|---|---|---|---|---|
| 1 | 3 | 2019-03-04 | Coat | 100 |
| 2 | 3 | 2019-03-01 | Shoes | 80 |
| 3 | 3 | 2019-03-07 | Skirt | 30 |
| 4 | 7 | 2019-02-01 | Coat | 25 |
| 5 | 7 | 2019-03-10 | Shoes | 80 |
Find the 3 most expensive orders.
Combining Filter and Top N
Find the 3 highest-value orders placed by customer 15.
The Full Pattern
In practice, you almost never sort alone. The real pattern is a chain: filter to the rows you care about, sort by the column that matters, limit to the top N, and select only the columns you need. This chain is the backbone of exploratory analysis.
# Filter → Sort → Limit → Select columns
(
df[df["department"] == "Engineering"]
.sort_values("joining_date", ascending=False)
.head(5)
[["first_name", "last_name", "joining_date"]]
)The Full Pipeline
| id | first_name | last_name | department | salary | phone_number | joining_date |
|---|---|---|---|---|---|---|
| 1 | Sarah | Mitchell | HR | 95000 | 555-0101 | 2021-03-15 |
| 2 | Michael | Chen | HR | 88000 | 555-0102 | 2022-06-01 |
| 3 | Emily | Rodriguez | HR | 82500 | 2021-09-20 | |
| 4 | David | Park | HR | 80000 | 555-0104 | 2023-01-10 |
| 5 | Lisa | Thompson | HR | 65000 | 2021-04-05 |
Find the 5 most recently hired employees in the Engineering department. Show their first name, last name, and joining date.
Key Takeaways
.sort_values("col")sorts ascending;ascending=Falsefor descending.- Pass lists for multi-column sorting with independent directions.
.head(n)limits to first N rows;.tail(n)for last N..nlargest(n, col)/.nsmallest(n, col)— faster shortcuts for top/bottom N.- The full pattern: filter → sort → limit → select columns.
What’s Next
You can now select columns, filter rows, combine conditions, handle missing data, sort, and limit. That’s the full toolkit for working with a single DataFrame. Module 2 introduces aggregation: counting, summing, averaging, and grouping — the foundation of every analytical report.