Pandas Concatenation – Combine DataFrames and Series Seamlessly
Introduction – Why Use Concatenation in Pandas?
Concatenation lets you combine multiple Series or DataFrames by stacking them along an axis, either vertically (row-wise) or horizontally (column-wise). It is a key step in data integration, appending new records, or building large datasets from smaller ones.
In this guide, you’ll learn:
- How to concatenate Series and DataFrames
- Stack data row-wise (axis=0) or column-wise (axis=1)
- Manage mismatched indexes and duplicate columns
- Reset index and add keys for hierarchical indexing
1. Concatenate Series Vertically
import pandas as pd
s1 = pd.Series([1, 2, 3], name='A')
s2 = pd.Series([4, 5, 6], name='A')
result = pd.concat([s1, s2])
print(result)
Output:
0    1
1    2
2    3
0    4
1    5
2    6
Name: A, dtype: int64
✔️ Stacks the Series vertically (default axis=0). The original index labels are kept, so 0, 1, 2 appear twice unless you reset them.
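A quick way to see the index behaviour described above is to inspect the labels directly. This is a minimal sketch reusing s1 and s2; the extra Series s3 (named 'B') is a hypothetical addition used only to show that axis=1 turns each Series into a column.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], name='A')
s2 = pd.Series([4, 5, 6], name='A')

# Default: the original labels 0, 1, 2 are kept, so each label appears twice
stacked = pd.concat([s1, s2])
print(stacked.index.tolist())     # [0, 1, 2, 0, 1, 2]

# ignore_index=True drops the old labels and assigns a fresh 0..n-1 index
renumbered = pd.concat([s1, s2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]

# With axis=1 each Series becomes a column; the hypothetical s3 is named 'B'
# so the two columns get distinct labels
s3 = pd.Series([7, 8, 9], name='B')
wide = pd.concat([s1, s3], axis=1)
print(wide.columns.tolist())      # ['A', 'B']
```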
2. Concatenate DataFrames Vertically (Row-wise)
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})
df_row = pd.concat([df1, df2])
print(df_row)
Output:
      Name  Score
0    Alice     85
1      Bob     90
0  Charlie     88
1    David     82
✔️ Each frame keeps its original index, so labels 0 and 1 repeat. Use ignore_index=True to renumber the rows:
pd.concat([df1, df2], ignore_index=True)
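To confirm what the reset looks like, here is a small self-contained sketch that rebuilds df1 and df2 and prints the renumbered result. The reset_index(drop=True) line is shown as one equivalent alternative, not the only option.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

# ignore_index=True builds a new 0..n-1 index instead of keeping 0, 1, 0, 1
df_reset = pd.concat([df1, df2], ignore_index=True)
print(df_reset)
#       Name  Score
# 0    Alice     85
# 1      Bob     90
# 2  Charlie     88
# 3    David     82

# Equivalent result: concatenate first, then discard the old index
df_reset2 = pd.concat([df1, df2]).reset_index(drop=True)
print(df_reset.equals(df_reset2))  # True
```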
3. Concatenate DataFrames Horizontally (Column-wise)
df_col = pd.concat([df1, df2], axis=1)
print(df_col)
Output:
    Name  Score     Name  Score
0  Alice     85  Charlie     88
1    Bob     90    David     82
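✔️ Rows are aligned on their index labels, and a label missing from one frame produces NaN. Use this to place features side by side; note that the result here has duplicate Name and Score columns.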
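Index alignment only becomes visible when the row labels differ. The sketch below uses two hypothetical frames, left and right, whose labels overlap only at 1 and 2: the default join='outer' keeps every label and fills the gaps with NaN, while join='inner' keeps only the shared labels.

```python
import pandas as pd

# Hypothetical frames whose row labels only partly overlap
left = pd.DataFrame({'Score': [85, 90, 78]}, index=[0, 1, 2])
right = pd.DataFrame({'Grade': ['A', 'B', 'C']}, index=[1, 2, 3])

# Default join='outer': union of labels 0..3, missing cells become NaN
outer = pd.concat([left, right], axis=1)
print(outer)

# join='inner': only labels present in both frames (1 and 2) survive
inner = pd.concat([left, right], axis=1, join='inner')
print(inner)
```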
4. Handle Mismatched Columns
df3 = pd.DataFrame({'Name': ['Eve'], 'Grade': ['A']})
df4 = pd.DataFrame({'Name': ['Frank'], 'Score': [95]})
df_mismatch = pd.concat([df3, df4], ignore_index=True)
print(df_mismatch)
Output:
    Name  Grade  Score
0    Eve      A    NaN
1  Frank    NaN   95.0
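✔️ Columns missing from one input are filled with NaN (the default join='outer'). Because Score now contains NaN, its integers are upcast to float, hence 95.0.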
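The same join parameter controls which columns survive when stacking rows. This sketch reuses df3 and df4 and assumes you want either every column (the default join='outer') or only the shared ones (join='inner'); it also checks the int-to-float upcast of Score.

```python
import pandas as pd

df3 = pd.DataFrame({'Name': ['Eve'], 'Grade': ['A']})
df4 = pd.DataFrame({'Name': ['Frank'], 'Score': [95]})

# Default join='outer': union of columns; Score gains a NaN for Eve,
# so pandas stores it as float64 instead of int64
outer = pd.concat([df3, df4], ignore_index=True)
print(outer.dtypes)

# join='inner': keep only the columns both frames share ('Name')
inner = pd.concat([df3, df4], ignore_index=True, join='inner')
print(inner)
```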
5. Add Hierarchical Keys to Concatenated Data
df_multi = pd.concat([df1, df2], keys=['Batch1', 'Batch2'])
print(df_multi)
Output:
             Name  Score
Batch1 0    Alice     85
       1      Bob     90
Batch2 0  Charlie     88
       1    David     82
✔️ Adds a multi-level index (MultiIndex) whose outer level records which input each row came from, useful when combining different sources.
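Once the keys are in place, the outer level can be used to pull a single source back out. The sketch below also passes keys with axis=1, where they label column groups instead of row groups, which makes the duplicate Name and Score columns from section 3 distinguishable.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

df_multi = pd.concat([df1, df2], keys=['Batch1', 'Batch2'])

# Selecting on the outer level returns just that source's rows
batch2 = df_multi.loc['Batch2']
print(batch2['Name'].tolist())  # ['Charlie', 'David']

# With axis=1 the keys become the top level of the column labels
side_by_side = pd.concat([df1, df2], axis=1, keys=['Batch1', 'Batch2'])
print(side_by_side[('Batch2', 'Name')].tolist())  # ['Charlie', 'David']
```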
6. Concatenate Along Custom Index
pd.concat([df1, df2], axis=0, keys=['first', 'second'])
✔️ Same as section 5 with axis=0 stated explicitly: the keys become an outer index level that identifies each source.
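pd.concat() also accepts a names parameter for labelling the resulting index levels. In this sketch the level names 'source' and 'row' are illustrative choices; naming the outer level makes it easy to turn it back into an ordinary column with reset_index().

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

# names= labels the index levels: the outer level holds the keys,
# the inner level holds each frame's original row labels
labeled = pd.concat([df1, df2], axis=0, keys=['first', 'second'],
                    names=['source', 'row'])
print(list(labeled.index.names))  # ['source', 'row']

# A named level can be moved back into an ordinary column
flat = labeled.reset_index(level='source')
print(flat.columns.tolist())      # ['source', 'Name', 'Score']
```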
7. Concatenate with Duplicate Indices
df_dup = pd.concat([df1, df1])
print(df_dup.index.duplicated())
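✔️ Pandas allows duplicate index labels by default; duplicated() marks the second occurrence of labels 0 and 1 as True. Reset the index or deduplicate explicitly if you need unique labels.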
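If duplicate labels would be a bug in your pipeline, you can ask pandas to fail early or clean them up afterwards. A minimal sketch, assuming that keeping the first occurrence of each label is acceptable for deduplication:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df_dup = pd.concat([df1, df1])
print(df_dup.index.duplicated())  # [False False  True  True]

# verify_integrity=True raises ValueError instead of creating duplicate labels
caught = False
try:
    pd.concat([df1, df1], verify_integrity=True)
except ValueError as err:
    caught = True
    print('Refused:', err)

# Option 1: renumber the rows so every label is unique
renumbered = pd.concat([df1, df1], ignore_index=True)

# Option 2: keep only the first row for each duplicated label
deduped = df_dup[~df_dup.index.duplicated(keep='first')]
print(deduped.index.tolist())  # [0, 1]
```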
Summary – Key Takeaways
Concatenation is essential for assembling datasets quickly and flexibly. Whether you're stacking rows, adding columns, or combining multiple sources with differing shapes, pd.concat() handles it through one function with a few customizable options.
Key Takeaways:
- axis=0 → row-wise (vertical), axis=1 → column-wise (horizontal)
- Use ignore_index=True to reset row indices
- Add keys=[...] for hierarchical labeling
- Mismatched columns are handled gracefully by filling NaN
Real-world relevance: Useful for log aggregation, feature engineering, chunked data assembly, survey merging, and multi-source integration.
FAQs – Pandas Concatenation
What’s the difference between concat() and merge()?
concat() stacks objects along an axis, aligning them only on their index or column labels; merge() performs SQL-like joins that match rows on the values of key columns.
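The difference is easiest to see when rows are in a different order. In this sketch, scores and grades are hypothetical frames that share a Name column: concat() with axis=1 pairs rows by index label, while merge() pairs them by the Name values.

```python
import pandas as pd

# Hypothetical data: two frames sharing a 'Name' key column, in different order
scores = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
grades = pd.DataFrame({'Name': ['Bob', 'Alice'], 'Grade': ['B', 'A']})

# concat(axis=1) aligns on the row index (0, 1), not on 'Name',
# so Alice's score ends up next to Bob's grade
side = pd.concat([scores, grades], axis=1)
print(side)

# merge() matches rows by the values in the 'Name' column
joined = scores.merge(grades, on='Name')
print(joined)
```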
How do I concatenate and reset row index?
pd.concat([df1, df2], ignore_index=True)
Can I concatenate DataFrames with different columns?
Yes. With the default join='outer', columns missing from one input are filled with NaN; pass join='inner' to keep only the columns they share.
Can I add source labels when concatenating?
Use:
pd.concat([df1, df2], keys=['source1', 'source2'])
Is concatenation in-place?
No. pd.concat() always returns a new object and leaves the inputs unchanged, so assign the result to a variable.
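Because the inputs are never modified, the usual pattern is to assign the result, and when assembling many pieces (for example chunks read in a loop) to collect them in a list and concatenate once. A minimal sketch; the generated chunks are hypothetical stand-ins for real data:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

combined = pd.concat([df1, df2], ignore_index=True)
print(len(df1), len(combined))  # 2 4  (df1 itself is unchanged)

# Collect pieces in a list and call concat once at the end; calling concat
# inside the loop would copy the growing result on every iteration
chunks = []
for start in range(0, 6, 2):
    # Hypothetical chunk: two rows of generated data per iteration
    chunks.append(pd.DataFrame({'value': [start, start + 1]}))
result = pd.concat(chunks, ignore_index=True)
print(result['value'].tolist())  # [0, 1, 2, 3, 4, 5]
```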
