🔗 Pandas Concatenation – Combine DataFrames and Series Seamlessly
🧲 Introduction – Why Use Concatenation in Pandas?
Concatenation allows you to stack or merge multiple Series or DataFrames together, either vertically (row-wise) or horizontally (column-wise). It’s a key step in data integration, appending new records, or building large datasets from smaller ones.
🎯 In this guide, you’ll learn:
- How to concatenate Series and DataFrames
- Stack data row-wise (axis=0) or column-wise (axis=1)
- Manage mismatched indexes and duplicate columns
- Reset index and add keys for hierarchical indexing
📥 1. Concatenate Series Vertically
import pandas as pd
s1 = pd.Series([1, 2, 3], name='A')
s2 = pd.Series([4, 5, 6], name='A')
result = pd.concat([s1, s2])
print(result)
👉 Output:
0 1
1 2
2 3
0 4
1 5
2 6
Name: A, dtype: int64
✔️ Stacks the Series vertically (default axis=0). Indexes are preserved and repeated unless reset.
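To avoid the repeated 0, 1, 2 labels, the same call can be made with ignore_index=True. A minimal sketch, reusing s1 and s2 from above (result_reset is a name chosen here for illustration):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], name='A')
s2 = pd.Series([4, 5, 6], name='A')

# ignore_index=True discards the original labels and renumbers from 0
result_reset = pd.concat([s1, s2], ignore_index=True)
print(result_reset.index.tolist())  # [0, 1, 2, 3, 4, 5]
```

The values and the shared name 'A' are unchanged; only the index is rebuilt.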
🧱 2. Concatenate DataFrames Vertically (Row-wise)
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})
df_row = pd.concat([df1, df2])
print(df_row)
👉 Output:
Name Score
0 Alice 85
1 Bob 90
0 Charlie 88
1 David 82
✔️ Indexes are preserved. Use ignore_index=True to reset:
pd.concat([df1, df2], ignore_index=True)
📊 3. Concatenate DataFrames Horizontally (Column-wise)
df_col = pd.concat([df1, df2], axis=1)
print(df_col)
👉 Output:
Name Score Name Score
0 Alice 85 Charlie 88
1 Bob 90 David 82
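✔️ Rows are aligned on index labels, not on position. Use this to place features side-by-side. Note that the result here has duplicate column names (Name and Score each appear twice).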
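When the indexes do not match, column-wise concatenation keeps the union of labels by default and fills the gaps with NaN; join='inner' keeps only labels present in every input. A small sketch with made-up frames (left, right, and their labels are illustrative, not from the examples above):

```python
import pandas as pd

# Two frames that share only the 'bob' label
left = pd.DataFrame({'Score': [85, 90]}, index=['alice', 'bob'])
right = pd.DataFrame({'Grade': ['A', 'B']}, index=['bob', 'carol'])

# Default join='outer': union of index labels, NaN where a frame has no row
outer = pd.concat([left, right], axis=1)
print(outer.shape)  # (3, 2)

# join='inner': intersection of index labels only
inner = pd.concat([left, right], axis=1, join='inner')
print(inner.index.tolist())  # ['bob']
```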
🧩 4. Handle Mismatched Columns
df3 = pd.DataFrame({'Name': ['Eve'], 'Grade': ['A']})
df4 = pd.DataFrame({'Name': ['Frank'], 'Score': [95]})
df_mismatch = pd.concat([df3, df4], ignore_index=True)
print(df_mismatch)
👉 Output:
Name Grade Score
0 Eve A NaN
1 Frank NaN 95.0
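✔️ Columns missing from one DataFrame are filled with NaN. Because NaN is a float, the Score column is upcast to float, which is why 95 is shown as 95.0.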
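If the NaN-filled columns are not wanted, join='inner' keeps only the columns that every input shares. A minimal sketch using the same df3 and df4 (shared is a name chosen here for illustration):

```python
import pandas as pd

df3 = pd.DataFrame({'Name': ['Eve'], 'Grade': ['A']})
df4 = pd.DataFrame({'Name': ['Frank'], 'Score': [95]})

# join='inner' drops Grade and Score because neither appears in both frames
shared = pd.concat([df3, df4], join='inner', ignore_index=True)
print(shared.columns.tolist())  # ['Name']
```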
🔁 5. Add Hierarchical Keys to Concatenated Data
df_multi = pd.concat([df1, df2], keys=['Batch1', 'Batch2'])
print(df_multi)
👉 Output:
             Name  Score
Batch1 0    Alice     85
       1      Bob     90
Batch2 0  Charlie     88
       1    David     82
✔️ Adds multi-level index, useful when combining different sources.
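Once the keys are in place, each source can be pulled back out by its outer label. A short sketch, rebuilding df1 and df2 so it runs on its own (batch2 is an illustrative name):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

df_multi = pd.concat([df1, df2], keys=['Batch1', 'Batch2'])

# Selecting on the outer level returns that batch with its original index
batch2 = df_multi.loc['Batch2']
print(batch2['Name'].tolist())  # ['Charlie', 'David']

# A (key, row label) tuple addresses a single row
print(df_multi.loc[('Batch1', 1), 'Name'])  # Bob
```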
🧾 6. Concatenate Along Custom Index
pd.concat([df1, df2], axis=0, keys=['first', 'second'])
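✔️ Adds an outer index level that identifies the source of each row. Since axis=0 is the default, this is the same mechanism as in section 5 with different labels.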
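The index levels created by keys can also be named with the names parameter, which makes it easy to turn the source label into an ordinary column later. A sketch under the assumption that the level names 'source' and 'row' suit your data (they are chosen here for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

# names labels the index levels: the outer key level and the original row index
labeled = pd.concat([df1, df2], keys=['first', 'second'], names=['source', 'row'])

# Move the 'source' level out of the index and into a regular column
flat = labeled.reset_index(level='source')
print(flat['source'].tolist())  # ['first', 'first', 'second', 'second']
```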
🧮 7. Concatenate with Duplicate Indices
df_dup = pd.concat([df1, df1])
print(df_dup.index.duplicated())
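✔️ pandas allows duplicate index labels. Here duplicated() marks the second occurrence of each label as True. Reset the index with ignore_index=True or deduplicate afterwards if unique labels are required.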
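If duplicate labels should be treated as an error rather than silently accepted, pd.concat() accepts verify_integrity=True, which raises a ValueError when the resulting index has overlapping values. A minimal sketch (the raised flag and the deduped name are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})

# verify_integrity=True refuses to build an index with repeated labels
try:
    pd.concat([df1, df1], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# Alternatively, concatenate first and keep only the first row per label
df_dup = pd.concat([df1, df1])
deduped = df_dup[~df_dup.index.duplicated(keep='first')]
print(len(deduped))  # 2
```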
📌 Summary – Key Takeaways
Concatenation is essential for merging datasets quickly and flexibly. Whether you’re combining rows, columns, or multiple sources with differing shapes, Pandas provides clean and customizable methods using pd.concat().
🔍 Key Takeaways:
- axis=0 → row-wise (vertical), axis=1 → column-wise (horizontal)
- Use ignore_index=True to reset row indices
- Add keys=[] for hierarchical labeling
- Handles mismatched columns gracefully by filling NaN
⚙️ Real-world relevance: Useful for log aggregation, feature engineering, chunked data assembly, survey merging, and multi-source integration.
❓ FAQs – Pandas Concatenation
❓ What’s the difference between concat() and merge()?
- concat() stacks objects along an axis, aligning only on index or column labels
- merge() performs SQL-like joins on key columns
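The difference shows up when the two inputs list the same people in a different order. In this sketch (scores and grades are made-up frames for illustration), concat() pairs rows by index label, while merge() matches them by the Name column:

```python
import pandas as pd

scores = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
grades = pd.DataFrame({'Name': ['Bob', 'Alice'], 'Grade': ['A', 'B']})

# concat pairs row 0 with row 0, so Alice's score sits next to Bob's grade
side_by_side = pd.concat([scores, grades], axis=1)
print(side_by_side.shape)  # (2, 4)

# merge matches on the Name values, so each person gets their own grade
joined = scores.merge(grades, on='Name')
print(joined.shape)  # (2, 3)
```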
❓ How do I concatenate and reset row index?
pd.concat([df1, df2], ignore_index=True)
❓ Can I concatenate DataFrames with different columns?
✅ Yes. Columns missing from one DataFrame are filled with NaN.
❓ Can I add source labels when concatenating?
Use:
pd.concat([df1, df2], keys=['source1', 'source2'])
❓ Is concatenation in-place?
❌ No. pd.concat() always returns a new object and leaves the inputs unchanged; assign the result to a variable to keep it.
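A quick way to confirm this is to compare lengths before and after the call. A minimal sketch (combined is an illustrative name):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

# The result must be captured; df1 and df2 are not modified
combined = pd.concat([df1, df2], ignore_index=True)
print(len(df1), len(df2), len(combined))  # 2 2 4
```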