

Pandas Concatenation – Combine DataFrames and Series Seamlessly

Introduction – Why Use Concatenation in Pandas?

Concatenation allows you to stack or merge multiple Series or DataFrames together, either vertically (row-wise) or horizontally (column-wise). It’s a key step in data integration, appending new records, or building large datasets from smaller ones.

In this guide, you’ll learn:

How to concatenate Series and DataFrames
How to stack data row-wise (axis=0) or column-wise (axis=1)
How to manage mismatched columns and duplicate indexes
How to reset the index and add keys for hierarchical indexing

1. Concatenate Series Vertically

import pandas as pd

s1 = pd.Series([1, 2, 3], name='A')
s2 = pd.Series([4, 5, 6], name='A')

result = pd.concat([s1, s2])
print(result)

Output:

0    1
1    2
2    3
0    4
1    5
2    6
Name: A, dtype: int64

✔️ Stacks the Series vertically (default axis=0). Indexes are preserved and repeated unless reset.
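The repeated labels matter as soon as you look rows up by label. The sketch below reuses s1 and s2 from above and compares the default result with ignore_index=True; the variable name result_reset is only illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], name='A')
s2 = pd.Series([4, 5, 6], name='A')

# Default: original labels are kept, so label 0 now matches two rows
result = pd.concat([s1, s2])
print(result.loc[0].tolist())          # [1, 4]

# ignore_index=True discards the old labels and renumbers from 0
result_reset = pd.concat([s1, s2], ignore_index=True)
print(result_reset.index.tolist())     # [0, 1, 2, 3, 4, 5]
```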

2. Concatenate DataFrames Vertically (Row-wise)

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

df_row = pd.concat([df1, df2])
print(df_row)

Output:

      Name  Score
0    Alice     85
1      Bob     90
0  Charlie     88
1    David     82

✔️ Indexes are preserved. Use ignore_index=True to reset:

pd.concat([df1, df2], ignore_index=True)
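If the frames are already concatenated, the index can also be rebuilt afterwards with reset_index(). A small sketch, assuming the same df1 and df2 as above (df_clean and df_kept are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

df_row = pd.concat([df1, df2])           # index is 0, 1, 0, 1

# drop=True throws the old labels away
df_clean = df_row.reset_index(drop=True)
print(df_clean.index.tolist())           # [0, 1, 2, 3]

# Without drop=True the old labels are kept as a new 'index' column
df_kept = df_row.reset_index()
print(df_kept.columns.tolist())          # ['index', 'Name', 'Score']
```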

3. Concatenate DataFrames Horizontally (Column-wise)

df_col = pd.concat([df1, df2], axis=1)
print(df_col)

Output:

    Name  Score     Name  Score
0  Alice     85  Charlie     88
1    Bob     90    David     82

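✔️ Rows are aligned on index labels, not on position. Use this to place features side by side; note that the duplicate column names (Name, Score) are kept as-is.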
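Index alignment becomes visible when the two frames do not share the same labels. The following sketch uses two small hypothetical frames (left and right, not part of the examples above) to compare the default outer join with join='inner', and then shows one possible way to tell duplicate column names apart by passing keys.

```python
import pandas as pd

# Hypothetical frames whose indexes only partly overlap
left = pd.DataFrame({'Score': [85, 90]}, index=['Alice', 'Bob'])
right = pd.DataFrame({'Grade': ['B', 'A']}, index=['Bob', 'Carol'])

# Default join='outer': union of labels, gaps filled with NaN
outer = pd.concat([left, right], axis=1)
print(outer.index.tolist())    # ['Alice', 'Bob', 'Carol']

# join='inner': only labels present in every frame
inner = pd.concat([left, right], axis=1, join='inner')
print(inner.index.tolist())    # ['Bob']

# keys on axis=1 label each block of columns, so duplicate names stay distinguishable
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})
labeled = pd.concat([df1, df2], axis=1, keys=['first', 'second'])
print(labeled['second']['Name'].tolist())   # ['Charlie', 'David']
```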

4. Handle Mismatched Columns

df3 = pd.DataFrame({'Name': ['Eve'], 'Grade': ['A']})
df4 = pd.DataFrame({'Name': ['Frank'], 'Score': [95]})

df_mismatch = pd.concat([df3, df4], ignore_index=True)
print(df_mismatch)

Output:

    Name Grade  Score
0    Eve     A    NaN
1  Frank   NaN   95.0

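✔️ Columns missing from one DataFrame are filled with NaN. Score is upcast to float (95.0) because NaN cannot be stored in a plain integer column.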
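If the NaN padding is not wanted, join='inner' keeps only the columns that every frame has. A minimal sketch with the same df3 and df4 (the names full and shared are illustrative):

```python
import pandas as pd

df3 = pd.DataFrame({'Name': ['Eve'], 'Grade': ['A']})
df4 = pd.DataFrame({'Name': ['Frank'], 'Score': [95]})

# Default join='outer': union of columns, gaps become NaN
full = pd.concat([df3, df4], ignore_index=True)
print(full['Score'].isna().tolist())   # [True, False]

# join='inner': intersection of columns, so no padding is needed
shared = pd.concat([df3, df4], ignore_index=True, join='inner')
print(shared.columns.tolist())         # ['Name']
```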

5. Add Hierarchical Keys to Concatenated Data

df_multi = pd.concat([df1, df2], keys=['Batch1', 'Batch2'])
print(df_multi)

Output:

             Name  Score
Batch1 0    Alice     85
       1      Bob     90
Batch2 0  Charlie     88
       1    David     82

✔️ Adds multi-level index, useful when combining different sources.
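Once the keys are in place, the outer level can be used to pull one source back out or turned into an ordinary column. The sketch below assumes the same df1 and df2; the level names 'batch' and 'row' are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

# names= labels the two index levels created by keys=
df_multi = pd.concat([df1, df2], keys=['Batch1', 'Batch2'], names=['batch', 'row'])

# Selecting an outer key returns only that source's rows
batch2 = df_multi.loc['Batch2']
print(batch2['Name'].tolist())     # ['Charlie', 'David']

# Move the 'batch' level into a regular column
flat = df_multi.reset_index(level='batch')
print(flat.columns.tolist())       # ['batch', 'Name', 'Score']
```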

6. Concatenate Along Custom Index

pd.concat([df1, df2], axis=0, keys=['first', 'second'])

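✔️ Equivalent to section 5: axis=0 is the default, so passing it explicitly changes nothing. The keys become the outer index level and identify the source of each row.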
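pd.concat() also accepts a mapping, in which case the dictionary keys are used as the keys argument. A short sketch, assuming the same df1 and df2 (the dictionary name frames is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

# The dict keys become the outer index level
frames = {'first': df1, 'second': df2}
combined = pd.concat(frames)

print(combined.index.tolist())
# [('first', 0), ('first', 1), ('second', 0), ('second', 1)]
```

This can be convenient when the frames are collected in a loop, for example one per input file, and the label is known at the time each frame is created.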

7. Concatenate with Duplicate Indices

df_dup = pd.concat([df1, df1])
print(df_dup.index.duplicated())

✔️ Pandas allows duplicate index labels. They stay in the result until you reset or deduplicate the index.
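If duplicate labels should be treated as an error rather than tolerated, pd.concat() can check for them with verify_integrity=True. A minimal sketch, using the same df1 (the flag variable raised is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})

# Default: duplicates are allowed and can be inspected afterwards
dup_mask = pd.concat([df1, df1]).index.duplicated()
print(dup_mask.tolist())           # [False, False, True, True]

# verify_integrity=True raises ValueError when index labels overlap
try:
    pd.concat([df1, df1], verify_integrity=True)
    raised = False
except ValueError as exc:
    raised = True
    print(f"rejected: {exc}")
```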

Summary – Key Takeaways

Concatenation is essential for merging datasets quickly and flexibly. Whether you’re combining rows, columns, or multiple sources with differing shapes, Pandas provides clean and customizable methods using pd.concat().

Key Takeaways:

axis=0 → row-wise (vertical), axis=1 → column-wise (horizontal)
Use ignore_index=True to reset row indices
Pass keys=[...] (one label per object) for hierarchical labeling
Handles mismatched columns gracefully by filling NaN

Real-world relevance: Useful for log aggregation, feature engineering, chunked data assembly, survey merging, and multi-source integration.

FAQs – Pandas Concatenation

What’s the difference between concat() and merge()?

concat() stacks objects along an axis and aligns them only on the labels of the other axis (index or column names)
merge() performs SQL-like joins on keys
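The difference shows up when the same entities appear in a different row order. In this sketch, scores and grades are hypothetical frames made up for the comparison:

```python
import pandas as pd

scores = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
grades = pd.DataFrame({'Name': ['Bob', 'Alice'], 'Grade': ['A', 'B']})

# concat pairs rows by index label: row 0 holds Alice's score next to Bob's grade
side_by_side = pd.concat([scores, grades], axis=1)
print(side_by_side.iloc[0].tolist())   # ['Alice', 85, 'Bob', 'A']

# merge pairs rows by the values in the 'Name' column
joined = scores.merge(grades, on='Name')
print(joined.to_dict('records'))
# [{'Name': 'Alice', 'Score': 85, 'Grade': 'B'}, {'Name': 'Bob', 'Score': 90, 'Grade': 'A'}]
```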

How do I concatenate and reset row index?

pd.concat([df1, df2], ignore_index=True)

Can I concatenate DataFrames with different columns?
Yes. Columns that are absent from one DataFrame are filled with NaN in that DataFrame's rows.

Can I add source labels when concatenating?
Use:

pd.concat([df1, df2], keys=['source1', 'source2'])

Is concatenation in-place?
No. pd.concat() always returns a new object and leaves its inputs unchanged; assign the result to a variable to keep it.
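A quick way to confirm this, assuming the same df1 and df2 used throughout (combined is an illustrative name):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

# Calling concat without assigning the result has no lasting effect
pd.concat([df1, df2], ignore_index=True)
print(len(df1))          # 2

# Assign the return value to keep the combined frame
combined = pd.concat([df1, df2], ignore_index=True)
print(len(combined))     # 4
```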
