
Pandas Concatenation – Combine DataFrames and Series Seamlessly


Introduction – Why Use Concatenation in Pandas?

Concatenation allows you to stack or merge multiple Series or DataFrames together, either vertically (row-wise) or horizontally (column-wise). It’s a key step in data integration, appending new records, or building large datasets from smaller ones.

In this guide, you’ll learn:

  • How to concatenate Series and DataFrames
  • How to stack data row-wise (axis=0) or column-wise (axis=1)
  • How to handle mismatched columns and duplicate indexes
  • How to reset the index and add keys for hierarchical indexing

1. Concatenate Series Vertically

import pandas as pd

s1 = pd.Series([1, 2, 3], name='A')
s2 = pd.Series([4, 5, 6], name='A')

result = pd.concat([s1, s2])
print(result)

Output:

0    1
1    2
2    3
0    4
1    5
2    6
Name: A, dtype: int64

✔️ Stacks the Series vertically (the default, axis=0). The original index labels are kept, so 0, 1, 2 appear twice unless you reset the index.
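
The repeated labels matter as soon as you look values up by label. A minimal sketch, reusing s1 and s2 from above, of how label-based and positional access differ on the concatenated result:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], name='A')
s2 = pd.Series([4, 5, 6], name='A')
result = pd.concat([s1, s2])

# Label 0 now appears twice, so a label-based lookup returns both values
print(result.loc[0].tolist())      # [1, 4]

# Positional lookup is unaffected by the repeated labels
print(result.iloc[3])              # 4
```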


2. Concatenate DataFrames Vertically (Row-wise)

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

df_row = pd.concat([df1, df2])
print(df_row)

Output:

      Name  Score
0    Alice     85
1      Bob     90
0  Charlie     88
1    David     82

✔️ The original index labels are kept, so 0 and 1 repeat. Pass ignore_index=True to renumber the rows from 0:

pd.concat([df1, df2], ignore_index=True)
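
As a rough check of what ignore_index=True does, the sketch below rebuilds df1 and df2 and compares the result with resetting the index after concatenation. The variable names renumbered and reset_later are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

# Discard the original labels and number the rows 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())        # [0, 1, 2, 3]

# Resetting the index after the fact gives the same table
reset_later = pd.concat([df1, df2]).reset_index(drop=True)
print(renumbered.equals(reset_later))   # True
```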

3. Concatenate DataFrames Horizontally (Column-wise)

df_col = pd.concat([df1, df2], axis=1)
print(df_col)

Output:

    Name  Score     Name  Score
0  Alice     85  Charlie     88
1    Bob     90    David     82

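✔️ Rows are aligned on index labels, not on position; a label present in only one frame produces NaN in the other frame's columns. Duplicate column names (here Name and Score) are kept as-is. Use this to place features side by side.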
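
To see the alignment at work, here is a small sketch with two hypothetical frames (left and right, not part of the examples above) whose indexes only partly overlap. Passing keys with axis=1 is one way to tell the duplicate Score columns apart.

```python
import pandas as pd

# Hypothetical frames with partly overlapping index labels
left = pd.DataFrame({'Score': [85, 90]}, index=['alice', 'bob'])
right = pd.DataFrame({'Score': [70, 75]}, index=['bob', 'carol'])

# axis=1 aligns rows by label; labels missing from one side become NaN.
# keys adds an outer column level so the two Score columns stay distinguishable.
side_by_side = pd.concat([left, right], axis=1, keys=['term1', 'term2'])
print(side_by_side)

# Select one of the two Score columns through the outer level
print(side_by_side[('term2', 'Score')])
```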


4. Handle Mismatched Columns

df3 = pd.DataFrame({'Name': ['Eve'], 'Grade': ['A']})
df4 = pd.DataFrame({'Name': ['Frank'], 'Score': [95]})

df_mismatch = pd.concat([df3, df4], ignore_index=True)
print(df_mismatch)

Output:

    Name Grade  Score
0    Eve     A    NaN
1  Frank   NaN   95.0

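✔️ Columns missing from one frame are filled with NaN (the default join='outer'). Score becomes float64 because an integer column cannot hold NaN.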
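
If you would rather drop the columns that are not shared, pd.concat() also accepts join='inner'. A minimal sketch with the same df3 and df4:

```python
import pandas as pd

df3 = pd.DataFrame({'Name': ['Eve'], 'Grade': ['A']})
df4 = pd.DataFrame({'Name': ['Frank'], 'Score': [95]})

# join='inner' keeps only the columns present in every input
shared_only = pd.concat([df3, df4], join='inner', ignore_index=True)
print(shared_only.columns.tolist())   # ['Name']
print(shared_only['Name'].tolist())   # ['Eve', 'Frank']
```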


5. Add Hierarchical Keys to Concatenated Data

df_multi = pd.concat([df1, df2], keys=['Batch1', 'Batch2'])
print(df_multi)

Output:

             Name  Score
Batch1 0    Alice     85
       1      Bob     90
Batch2 0  Charlie     88
       1    David     82

✔️ Adds multi-level index, useful when combining different sources.
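
Once the keys are in place, each source can be pulled back out by its label. A short sketch, assuming the same df1 and df2 as above:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})
df_multi = pd.concat([df1, df2], keys=['Batch1', 'Batch2'])

# Select every row that came from the second input
batch2 = df_multi.loc['Batch2']
print(batch2['Name'].tolist())                 # ['Charlie', 'David']

# Address a single cell with (outer key, original row label)
print(df_multi.loc[('Batch1', 1), 'Score'])    # 90
```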


6. Concatenate Along Custom Index

pd.concat([df1, df2], axis=0, keys=['first', 'second'])

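✔️ This is the same keys mechanism as in section 5, with axis=0 written out explicitly: each key becomes the outer index level and identifies the source of every row.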
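
If a flat table with a source column is easier to work with than a MultiIndex, one possible approach is to name the index levels and move the outer one into a column. The level names 'source' and 'row' below are arbitrary choices for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

# names labels the levels of the resulting MultiIndex
labelled = pd.concat([df1, df2], keys=['first', 'second'], names=['source', 'row'])

# Move the outer level into an ordinary column
flat = labelled.reset_index(level='source')
print(flat.columns.tolist())      # ['source', 'Name', 'Score']
print(flat['source'].tolist())    # ['first', 'first', 'second', 'second']
```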


7. Concatenate with Duplicate Indices

df_dup = pd.concat([df1, df1])
print(df_dup.index.duplicated())

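✔️ Pandas allows duplicate index labels. index.duplicated() flags the second and later occurrences of each label (here, the last two rows). Reset or deduplicate the index if later code relies on unique labels.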
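
Two ways of dealing with duplicates can be sketched as follows: asking pd.concat() to reject overlapping labels with verify_integrity=True, or dropping the repeated labels afterwards. The names overlap_detected and deduped are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})

# verify_integrity=True raises ValueError when index labels overlap
overlap_detected = False
try:
    pd.concat([df1, df1], verify_integrity=True)
except ValueError:
    overlap_detected = True
print(overlap_detected)                 # True

# Alternatively, keep only the first row for each index label
df_dup = pd.concat([df1, df1])
deduped = df_dup[~df_dup.index.duplicated(keep='first')]
print(len(deduped))                     # 2
```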


Summary – Key Takeaways

Concatenation is essential for merging datasets quickly and flexibly. Whether you’re combining rows, columns, or multiple sources with differing shapes, Pandas provides clean and customizable methods using pd.concat().

Key Takeaways:

  • axis=0 → row-wise (vertical), axis=1 → column-wise (horizontal)
  • Use ignore_index=True to reset row indices
  • Pass keys=[...] for hierarchical labeling
  • Mismatched columns are kept and filled with NaN

Real-world relevance: Useful for log aggregation, feature engineering, chunked data assembly, survey merging, and multi-source integration.


FAQs – Pandas Concatenation

What’s the difference between concat() and merge()?

  • concat() stacks objects along an axis, aligning only on index or column labels
  • merge() performs SQL-like joins on keys
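
The contrast is easiest to see on two small tables that share a Name column. The frames scores and grades below are made up for this sketch.

```python
import pandas as pd

# Hypothetical tables sharing a Name column
scores = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
grades = pd.DataFrame({'Name': ['Bob', 'Alice'], 'Grade': ['A', 'B']})

# concat() stacks the rows; values in Name are not matched up
stacked = pd.concat([scores, grades], ignore_index=True)
print(stacked.shape)    # (4, 3)

# merge() matches rows whose Name values are equal
joined = scores.merge(grades, on='Name')
print(joined.shape)     # (2, 3)
```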

How do I concatenate and reset row index?

pd.concat([df1, df2], ignore_index=True)

Can I concatenate DataFrames with different columns?
Yes. Columns absent from one DataFrame are filled with NaN in its rows.


Can I add source labels when concatenating?
Use:

pd.concat([df1, df2], keys=['source1', 'source2'])

Is concatenation in-place?
No. pd.concat() always returns a new object and leaves the inputs unchanged, so assign the result to a variable to keep it.
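
A quick way to confirm this, sketched with the df1 and df2 used throughout:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 82]})

# The result must be assigned; df1 and df2 keep their original rows
combined = pd.concat([df1, df2], ignore_index=True)
print(len(df1), len(df2), len(combined))   # 2 2 4
```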

