Combining DataFrames with concat
Progress Tracking
Log in to save this lesson and continue from where you left off.
Stacking Rows, Not Merging
Before we learn how to merge DataFrames side by side, there’s another way to combine data: pd.concat().
It doesn’t “link” rows together. Instead, it stacks DataFrames on top of each other. Think of it as “add more rows with the same columns.”
pd.concat()adds more rows (vertical)pd.merge()adds more columns (horizontal — next lessons)
When You’ll Use concat
pd.concat() comes up when you have data split across DataFrames that should really be together:
- Separate DataFrames for different regions (
us_orders,eu_orders) - Historical data in separate DataFrames (
orders_2022,orders_2023) - Different data sources with the same structure
Basic pd.concat()
combined = pd.concat([df_a, df_b])Pass a list of DataFrames. The result stacks them row by row.
combined = pd.concat([winemag_p1, winemag_p2], ignore_index=True)
combinedAdding a Source Column
A common pattern: add a label so you know where each row came from.
p1 = winemag_p1.copy()
p1["source"] = "dataset_1"
p2 = winemag_p2.copy()
p2["source"] = "dataset_2"
pd.concat([p1, p2], ignore_index=True)Dropping Duplicates
pd.concat() keeps all rows, including duplicates. To remove them:
combined = pd.concat([df_a, df_b]).drop_duplicates()Like SQL’s UNION ALL, keeping duplicates is usually correct. Each row represents real data. Only use .drop_duplicates() when you specifically need deduplication.
Removing Duplicates After Stacking
Stack both wine DataFrames. Check the row count, then drop duplicates and check again.
Resetting the Index
After pd.concat(), the index carries over from the original DataFrames. Reset it for clean output:
combined = pd.concat([df_a, df_b], ignore_index=True)pd.concat() aligns by column name. If one DataFrame has extra columns, those cells become NaN in the other. Both DataFrames should have matching columns for clean results.
Make the Friends Network Symmetric
| user_id | friend_id |
|---|---|
| 0 | 1 |
| 0 | 2 |
| 0 | 3 |
| 0 | 4 |
| 1 | 5 |
Make the friends network symmetric. For example, if 0 and 1 are friends, have the output contain both 0 and 1 under 1 and 0 respectively.
All Possible Wine Varieties
| id | country | description | designation | points | price | province | region_1 | region_2 | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|
| 126576 | US | Rich and round, this offers plenty of concentrated blackberry notes enveloped in warm spices and supple oak. There's a hint of green tomato leaves throughout, but the lush fruit combined with sturdy grape tannins and high acidity would pair well with fatty short ribs or braised pork. | Estate Club | 87 | 32 | Virginia | Virginia | Merlot | Veramar | |
| 127077 | Italy | This luminous sparkler offers measured aromas of yellow fruit and honey and delivers a broad, wide approach in the mouth. The wine's texture is creamy and full and there is a spicy point of effervescence on the palate. | Extra Dry | 85 | 19 | Veneto | Prosecco di Valdobbiadene | Prosecco | Varaschin | |
| 143029 | US | There are some pretty features to this wine, namely the lovely tannins and rich fruit, but it's just too soft and sweet. Tastes like blackberry jam, with extra white sugar spinkled on top. | Signature | 83 | 45 | California | Paso Robles | Central Coast | Cabernet Sauvignon | Byington |
| 98813 | US | Lovely nose, which runs from violets and cherries right into coffee and sweet chocolate. In the mouth the wine coats the palate with Napa-like intensity, only buttressed with more vivid acidity. It's both smooth and deep. The aromas are more detailed, interesting and really clean, loaded with dark flavors of soy, coffee, molasses and black cherry liqueur. | Champoux & Loess Vineyards Reserve | 93 | 100 | Washington | Washington | Washington Other | Cabernet Sauvignon | Boudreaux Cellars |
| 43172 | US | Tastes sweeter and fruitier than the 2009, which was dry, so if you like orange marmalade, pineapple tart and vanilla cookie sugariness in your Sauvignon Blancs, it's for you. | Les Pionniers | 86 | 40 | California | Sonoma County | Sonoma | Sauvignon Blanc | Capture |
| id | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 118040 | US | A bit too soft and thus lacks structure. Shows a good array of wild berry and briary, brambly flavors. Dry and spicy, and ready now. | The Convict Rocky Ridge Vineyard | 86 | 38 | California | Rockpile | Sonoma | Paradise Ridge 2006 The Convict Rocky Ridge Vineyard Zinfandel (Rockpile) | Zinfandel | Paradise Ridge | ||
| 59743 | Italy | Aromas of toasted oak, vanilla and a confectionary note lead the nose while the palate offers mature Golden Delicious apple, butterscotch and walnut skin alongside bracing acidity. | Mongris Riserva | 88 | 30 | Northeastern Italy | Collio | Kerin O'Keefe | @kerinokeefe | Marco Felluga 2012 Mongris Riserva Pinot Grigio (Collio) | Pinot Grigio | Marco Felluga | |
| 117951 | US | Heavy in alcohol and overripe, but entirely dry, and the black currant, chocolate and pepper flavors will play well against richly sauced barbecue. | The Caboose | 84 | 30 | California | Alexander Valley | Sonoma | Starry Night 2007 The Caboose Zinfandel (Alexander Valley) | Zinfandel | Starry Night | ||
| 10202 | France | Fragrant; dusty plum and rose aromas are followed on the palate by very fruity flavors of cherries and apple skins. The fine bead and rich mousse result in a creamy mouthfeel. It's all a bit obvious and simple, but undeniably good. | Brut Rosa | 87 | 83 | Champagne | Champagne | Joe Czerwinski | @JoeCz | Ruinart NV Brut Rosa (Champagne) | Champagne Blend | Ruinart | |
| 69268 | Germany | Layers of plush yellow peach and apricot flavors highlight this easy, crowd-pleasing kabinett. Sunny lemon acidity lends refreshment to the midpalate, cutting through all the ripeness and leading to a brisk finish. Drink now through 2019. | Kabinett | 88 | 15 | Rheinhessen | Anna Lee C. Iijima | Weinreich 2012 Kabinett Riesling (Rheinhessen) | Riesling | Weinreich |
Find all possible varieties which occur in either of the winemag datasets. Output unique variety values only. Sort records based on the variety in ascending order.
Duplicate HR Department Employees
| worker_id | first_name | last_name | salary | joining_date | department |
|---|---|---|---|---|---|
| 1 | Monika | Arora | 100000 | 2014-02-20 | HR |
| 2 | Niharika | Verma | 80000 | 2014-06-11 | Admin |
| 3 | Vishal | Singhal | 300000 | 2014-02-20 | HR |
| 4 | Amitah | Singh | 500000 | 2014-02-20 | Admin |
| 5 | Vivek | Bhati | 500000 | 2014-06-11 | Admin |
Generate a list of employees who work in the HR department, including only their first names and department in the output. Each employee should appear twice in the list, meaning their first name and department should be duplicated in the output.
Key Takeaways
pd.concat([df_a, df_b])stacks DataFrames vertically.- Use
ignore_index=Truefor a clean sequential index. - Use
.drop_duplicates()only when deduplication is needed. - Add a source column before stacking to track origins.
What’s Next
Now you know how to stack similar DataFrames. But most of the time, you need to combine related data from different DataFrames — customers with their orders, products with their categories. That’s pd.merge(), starting with inner merge.