4️⃣ 🧹 Pandas Data Cleaning & Preprocessing

🧯 Pandas Filling Missing Values – Smartly Restore Incomplete Data


🧲 Introduction – Why Fill Missing Values?

Missing values (NaN) can distort results, cause many machine-learning estimators to reject the input, or raise errors in code that expects complete data. Instead of dropping rows or columns, filling missing values keeps the rest of each record available for analysis. Pandas offers multiple strategies, from static replacements to interpolation, to fill gaps in your dataset.

🎯 In this guide, you’ll learn:

  • How to fill missing values using constants, statistics, or surrounding values
  • How to use fillna(), forward/backward fill (ffill()/bfill()), and interpolate()
  • Apply different fill strategies per column
  • Best practices with examples and explanations

📥 1. Create a Sample DataFrame with Missing Values

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

print(df)

👉 Output:

    Name   Age  Score
0  Alice  25.0   85.0
1    Bob   NaN   90.0
2   None  30.0    NaN
3  David  40.0   95.0

🩹 2. Fill All Missing Values with a Constant

df_filled = df.fillna(0)

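✔️ Replaces every NaN with 0, in every column. This suits numeric columns where 0 is a meaningful default, but note that it also targets the text column Name, where 0 is rarely a sensible placeholder.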
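If 0 only makes sense for the numeric columns, one option is to restrict the constant fill to them. The sketch below rebuilds the step 1 sample and uses select_dtypes(); the name df_numeric_zero is just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Pick only the numeric columns (here Age and Score) so the text column is untouched
numeric_cols = df.select_dtypes(include='number').columns

df_numeric_zero = df.copy()
df_numeric_zero[numeric_cols] = df_numeric_zero[numeric_cols].fillna(0)
print(df_numeric_zero)
```

The missing Name stays missing, so it can be handled separately with a text placeholder.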


🔁 3. Forward Fill – Use Previous Value

df_ffill = df.ffill()  # replaces the deprecated df.fillna(method='ffill')

✔️ Fills each missing value with the last non-null value above it in the same column. A NaN in the first row has nothing above it and stays NaN.
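A quick check of forward fill on the step 1 sample, plus the first-row edge case on a made-up Series (variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

df_ffill = df.ffill()
print(df_ffill)  # Name row 2 -> 'Bob', Age row 1 -> 25.0, Score row 2 -> 90.0

# A NaN in the first position has no earlier value to copy, so it stays NaN
s = pd.Series([np.nan, 1.0, np.nan])
print(s.ffill().tolist())  # [nan, 1.0, 1.0]
```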


🔁 4. Backward Fill – Use Next Value

df_bfill = df.bfill()  # replaces the deprecated df.fillna(method='bfill')

✔️ Fills each missing value with the next non-null value below it in the same column. A NaN in the last row has nothing below it and stays NaN.
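Because ffill() leaves leading NaNs and bfill() leaves trailing ones, a common pattern is to chain the two so both ends get covered. A minimal sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 10.0, np.nan, 30.0, np.nan])

# bfill alone: the trailing NaN has no later value and stays NaN
print(s.bfill().tolist())          # [10.0, 10.0, 30.0, 30.0, nan]

# ffill first, then bfill to cover the leading NaN that ffill could not fill
print(s.ffill().bfill().tolist())  # [10.0, 10.0, 10.0, 30.0, 30.0]
```

Chaining in this order prefers the previous value wherever one exists and falls back to the next value only at the start.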


🧠 5. Fill with Column Mean or Median

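df_stats = df.copy()
# Assign back instead of df['col'].fillna(..., inplace=True), which is chained assignment
df_stats['Age'] = df_stats['Age'].fillna(df_stats['Age'].mean())
df_stats['Score'] = df_stats['Score'].fillna(df_stats['Score'].median())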

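✔️ Replaces each NaN with a statistic computed from the column's non-missing values. The median is less sensitive to outliers than the mean, so it is usually the better choice for skewed data.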
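To see why the median is often the safer default, the sketch below uses a hypothetical income column with one extreme value. Both statistics skip NaN by default.

```python
import numpy as np
import pandas as pd

# Hypothetical income column: one outlier and one gap
incomes = pd.Series([30_000, 32_000, 35_000, 1_000_000, np.nan])

mean_value = incomes.mean()      # pulled upward by the 1,000,000 outlier
median_value = incomes.median()  # middle of the four observed values

print(mean_value, median_value)  # 274250.0 33500.0

# Filling with the median gives a value close to the typical entries
filled = incomes.fillna(median_value)
print(filled.tolist())
```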


🧾 6. Fill with Most Frequent Value (Mode)

df_mode = df.copy()
df_mode['Name'] = df_mode['Name'].fillna(df_mode['Name'].mode()[0])

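✔️ Replaces missing names with the most common entry in the column. mode() returns all tied values in sorted order and [0] takes the first, so in this sample, where every name appears once, the fill value is 'Alice'.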
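The tie behaviour of mode() is easy to miss. A short sketch with illustrative data, first with all-unique values and then with a clear winner:

```python
import pandas as pd

names = pd.Series(['Alice', 'Bob', None, 'David'])

# Every name appears once, so mode() returns all of them, sorted (missing values are skipped)
print(names.mode().tolist())  # ['Alice', 'Bob', 'David']

# With a clear most-frequent value, mode()[0] is meaningful
cities = pd.Series(['Paris', 'Rome', 'Paris', None])
filled = cities.fillna(cities.mode()[0])
print(filled.tolist())  # ['Paris', 'Rome', 'Paris', 'Paris']
```

If a column has no dominant value, a fixed placeholder such as 'Unknown' is often more honest than an arbitrary tie-break.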


🧮 7. Apply Different Fill Values by Column

df_custom = df.fillna({
    'Name': 'Unknown',
    'Age': df['Age'].mean(),
    'Score': 0
})

✔️ Fills each column with its own value; columns not listed in the dict are left unchanged.
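Two variations, sketched on the step 1 sample: leaving a column out of the dict, and passing a Series of per-column means (this assumes only the numeric columns should be filled):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# 'Score' is deliberately left out of the dict, so its NaN is kept
df_partial = df.fillna({'Name': 'Unknown', 'Age': df['Age'].mean()})
print(df_partial['Score'].isna().sum())  # 1

# A Series indexed by column name works like a dict: each numeric column gets its own mean
df_means = df.fillna(df.mean(numeric_only=True))
print(df_means)
```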


📉 8. Interpolate Missing Values (Linear by Default)

df['Score'] = df['Score'].interpolate()

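✔️ Fills missing values by linear interpolation between the nearest known values, which suits time series and other ordered, continuous data. The default method='linear' treats rows as equally spaced and ignores the index; use method='time' for a DatetimeIndex with uneven gaps.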
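The difference between the default linear method and method='time' only shows up when timestamps are unevenly spaced. A sketch with hypothetical sensor readings:

```python
import numpy as np
import pandas as pd

# Hypothetical readings taken on uneven dates
idx = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-10'])
readings = pd.Series([0.0, np.nan, 9.0], index=idx)

# Default 'linear': rows are treated as equally spaced, so the gap gets the midpoint
print(readings.interpolate().tolist())  # [0.0, 4.5, 9.0]

# 'time': weights by elapsed time (1 of 9 days), so the gap gets about 1.0
print(readings.interpolate(method='time').tolist())
```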


⏸️ 9. Limit the Number of Fills

df_limited = df.ffill(limit=1)  # replaces the deprecated df.fillna(method='ffill', limit=1)

✔️ Forward-fills at most 1 consecutive NaN in each gap; any further NaNs in the same run stay NaN.
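A minimal illustration of limit on a run of three NaNs (made-up values):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Only the first NaN after each known value is filled
print(s.ffill(limit=1).tolist())  # [1.0, 1.0, nan, nan, 5.0]

# Raising the limit extends the fill further into the run
print(s.ffill(limit=2).tolist())  # [1.0, 1.0, 1.0, nan, 5.0]
```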


📌 Summary – Key Takeaways

Filling keeps rows that dropping would discard, but every filled value is an estimate. It works best when values are missing at random and the gaps are few. Choose the strategy based on context and data type.

🔍 Key Takeaways:

  • Use .fillna() with constants or computed values
  • Use .interpolate() or ffill/bfill for smart contextual filling
  • Different columns may need different strategies
  • Limit fill with limit= if overfilling could introduce noise

⚙️ Real-world relevance: Common in machine learning preprocessing, data entry repair, IoT data pipelines, and ETL workflows.


❓ FAQs – Filling Missing Values in Pandas

❓ What’s the best method to fill numeric NaNs?
Use .median() for skewed data or data with outliers, .mean() for roughly symmetric data, and .interpolate() for ordered data such as time series.


❓ Can I fill text columns with a default value?

df['Name'] = df['Name'].fillna('Unknown')

❓ What’s the difference between fillna() and interpolate()?

  • fillna() fills with a value you supply (scalar, dict, or Series); ffill()/bfill() copy the nearest known value
  • interpolate() estimates values between known points
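Side by side on the same illustrative Series, the three approaches give three different answers for the gap:

```python
import numpy as np
import pandas as pd

scores = pd.Series([85.0, 90.0, np.nan, 95.0])

print(scores.fillna(0).tolist())      # [85.0, 90.0, 0.0, 95.0]: a static value
print(scores.ffill().tolist())        # [85.0, 90.0, 90.0, 95.0]: copies the previous value
print(scores.interpolate().tolist())  # [85.0, 90.0, 92.5, 95.0]: estimated from both neighbours
```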

❓ How do I fill only a few NaNs per column?
Use the limit parameter. With ffill() or bfill() it caps how many consecutive NaNs are filled in each gap:

df.ffill(limit=1)
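limit behaves differently depending on whether you pass a fill value or use ffill()/bfill(). A sketch with made-up data:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2.0, np.nan, np.nan, 5.0, np.nan])

# With a value, limit counts filled entries across the whole column (first 2 NaNs only)
print(s.fillna(0, limit=2).tolist())  # [0.0, 2.0, 0.0, nan, 5.0, nan]

# With ffill, limit applies to each consecutive run of NaNs separately
print(s.ffill(limit=1).tolist())      # [nan, 2.0, 2.0, nan, 5.0, 5.0]
```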

❓ Can I fill missing values in-place?
✅ Yes. Use:

df.fillna(value, inplace=True)
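inplace=True works when called on the DataFrame itself. For a single column, calling it on df['col'] is chained assignment, which is not guaranteed to update the DataFrame under pandas' Copy-on-Write behaviour, so assigning the result back is the safer pattern. A sketch with illustrative data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [25, np.nan, 30], 'Score': [85, 90, np.nan]})

# Called on the DataFrame: modifies df directly and returns None
result = df.fillna(0, inplace=True)
print(result)  # None
print(df)

# For one column, assign the filled Series back instead of using inplace
people = pd.DataFrame({'Age': [25, np.nan, 30]})
people['Age'] = people['Age'].fillna(people['Age'].mean())
print(people['Age'].tolist())  # [25.0, 27.5, 30.0]
```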
