Module 3: Combining DataFrames35 min

Combining DataFrames with concat

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Stacking Rows, Not Merging

Before we learn how to merge DataFrames side by side, there’s another way to combine data: pd.concat().

It doesn’t “link” rows together. Instead, it stacks DataFrames on top of each other. Think of it as “add more rows with the same columns.”

  • pd.concat() adds more rows (vertical)
  • pd.merge() adds more columns (horizontal — next lessons)

When You’ll Use concat

pd.concat() comes up when you have data split across DataFrames that should really be together:

  • Separate DataFrames for different regions (us_orders, eu_orders)
  • Historical data in separate DataFrames (orders_2022, orders_2023)
  • Different data sources with the same structure

Basic pd.concat()

Python
combined = pd.concat([df_a, df_b])

Pass a list of DataFrames. The result stacks them row by row.

Python
combined = pd.concat([winemag_p1, winemag_p2], ignore_index=True)
combined

Adding a Source Column

A common pattern: add a label so you know where each row came from.

Python
p1 = winemag_p1.copy()
p1["source"] = "dataset_1"

p2 = winemag_p2.copy()
p2["source"] = "dataset_2"

pd.concat([p1, p2], ignore_index=True)

Dropping Duplicates

pd.concat() keeps all rows, including duplicates. To remove them:

Python
combined = pd.concat([df_a, df_b]).drop_duplicates()
Default to Keeping Duplicates

Like SQL’s UNION ALL, keeping duplicates is usually correct. Each row represents real data. Only use .drop_duplicates() when you specifically need deduplication.

Removing Duplicates After Stacking

1
Stack and Deduplicate

Stack both wine DataFrames. Check the row count, then drop duplicates and check again.

Tables: winemag_p1, winemag_p2

Resetting the Index

After pd.concat(), the index carries over from the original DataFrames. Reset it for clean output:

Python
combined = pd.concat([df_a, df_b], ignore_index=True)
Column Mismatch

pd.concat() aligns by column name. If one DataFrame has extra columns, those cells become NaN in the other. Both DataFrames should have matching columns for clean results.

Make the Friends Network Symmetric

Table: google_friends_network
user_idfriend_id
01
02
03
04
15
2
Make the friends network symmetric
View solution

Make the friends network symmetric. For example, if 0 and 1 are friends, have the output contain both 0 and 1 under 1 and 0 respectively.

Tables: google_friends_network

All Possible Wine Varieties

Table: winemag_p1
idcountrydescriptiondesignationpointspriceprovinceregion_1region_2varietywinery
126576USRich and round, this offers plenty of concentrated blackberry notes enveloped in warm spices and supple oak. There's a hint of green tomato leaves throughout, but the lush fruit combined with sturdy grape tannins and high acidity would pair well with fatty short ribs or braised pork.Estate Club8732VirginiaVirginiaMerlotVeramar
127077ItalyThis luminous sparkler offers measured aromas of yellow fruit and honey and delivers a broad, wide approach in the mouth. The wine's texture is creamy and full and there is a spicy point of effervescence on the palate.Extra Dry8519VenetoProsecco di ValdobbiadeneProseccoVaraschin
143029USThere are some pretty features to this wine, namely the lovely tannins and rich fruit, but it's just too soft and sweet. Tastes like blackberry jam, with extra white sugar spinkled on top.Signature8345CaliforniaPaso RoblesCentral CoastCabernet SauvignonByington
98813USLovely nose, which runs from violets and cherries right into coffee and sweet chocolate. In the mouth the wine coats the palate with Napa-like intensity, only buttressed with more vivid acidity. It's both smooth and deep. The aromas are more detailed, interesting and really clean, loaded with dark flavors of soy, coffee, molasses and black cherry liqueur.Champoux & Loess Vineyards Reserve93100WashingtonWashingtonWashington OtherCabernet SauvignonBoudreaux Cellars
43172USTastes sweeter and fruitier than the 2009, which was dry, so if you like orange marmalade, pineapple tart and vanilla cookie sugariness in your Sauvignon Blancs, it's for you.Les Pionniers8640CaliforniaSonoma CountySonomaSauvignon BlancCapture
Table: winemag_p2
idcountrydescriptiondesignationpointspriceprovinceregion_1region_2taster_nametaster_twitter_handletitlevarietywinery
118040USA bit too soft and thus lacks structure. Shows a good array of wild berry and briary, brambly flavors. Dry and spicy, and ready now.The Convict Rocky Ridge Vineyard8638CaliforniaRockpileSonomaParadise Ridge 2006 The Convict Rocky Ridge Vineyard Zinfandel (Rockpile)ZinfandelParadise Ridge
59743ItalyAromas of toasted oak, vanilla and a confectionary note lead the nose while the palate offers mature Golden Delicious apple, butterscotch and walnut skin alongside bracing acidity.Mongris Riserva8830Northeastern ItalyCollioKerin O'Keefe@kerinokeefeMarco Felluga 2012 Mongris Riserva Pinot Grigio (Collio)Pinot GrigioMarco Felluga
117951USHeavy in alcohol and overripe, but entirely dry, and the black currant, chocolate and pepper flavors will play well against richly sauced barbecue.The Caboose8430CaliforniaAlexander ValleySonomaStarry Night 2007 The Caboose Zinfandel (Alexander Valley)ZinfandelStarry Night
10202FranceFragrant; dusty plum and rose aromas are followed on the palate by very fruity flavors of cherries and apple skins. The fine bead and rich mousse result in a creamy mouthfeel. It's all a bit obvious and simple, but undeniably good.Brut Rosa8783ChampagneChampagneJoe Czerwinski@JoeCzRuinart NV Brut Rosa (Champagne)Champagne BlendRuinart
69268GermanyLayers of plush yellow peach and apricot flavors highlight this easy, crowd-pleasing kabinett. Sunny lemon acidity lends refreshment to the midpalate, cutting through all the ripeness and leading to a brisk finish. Drink now through 2019.Kabinett8815RheinhessenAnna Lee C. IijimaWeinreich 2012 Kabinett Riesling (Rheinhessen)RieslingWeinreich
3
Find all possible varieties which occur in either of the winemag datasets
View solution

Find all possible varieties which occur in either of the winemag datasets. Output unique variety values only. Sort records based on the variety in ascending order.

Tables: winemag_p1, winemag_p2

Duplicate HR Department Employees

Table: worker
worker_idfirst_namelast_namesalaryjoining_datedepartment
1MonikaArora1000002014-02-20HR
2NiharikaVerma800002014-06-11Admin
3VishalSinghal3000002014-02-20HR
4AmitahSingh5000002014-02-20Admin
5VivekBhati5000002014-06-11Admin
4
Duplicate HR Department Employees
View solution

Generate a list of employees who work in the HR department, including only their first names and department in the output. Each employee should appear twice in the list, meaning their first name and department should be duplicated in the output.

Tables: worker

Key Takeaways

  • pd.concat([df_a, df_b]) stacks DataFrames vertically.
  • Use ignore_index=True for a clean sequential index.
  • Use .drop_duplicates() only when deduplication is needed.
  • Add a source column before stacking to track origins.

What’s Next

Now you know how to stack similar DataFrames. But most of the time, you need to combine related data from different DataFrames — customers with their orders, products with their categories. That’s pd.merge(), starting with inner merge.