🧯 Pandas Filling Missing Values – Smartly Restore Incomplete Data

🧲 Introduction – Why Fill Missing Values?

Missing values (NaN) can skew aggregations, bias models, or make code fail. Filling them, instead of dropping whole rows or columns, keeps the rest of each record usable. Pandas offers several strategies, from constant replacement to interpolation, for filling gaps in a dataset.

🎯 In this guide, you’ll learn:

How to fill missing values using constants, statistics, or surrounding values
How to use fillna(), forward/backward fill, and interpolation
How to apply different fill strategies per column
Best practices, with examples and explanations

📥 1. Create a Sample DataFrame with Missing Values

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

print(df)

👉 Output:

    Name   Age  Score
0  Alice  25.0   85.0
1    Bob   NaN   90.0
2   None  30.0    NaN
3  David  40.0   95.0

🩹 2. Fill All Missing Values with a Constant

df_filled = df.fillna(0)

✔️ Applies 0 to every missing cell in the DataFrame, text columns such as Name included. Best suited to numeric columns where 0 is a meaningful default.
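Because a whole-frame fill also targets text columns, it can help to restrict the constant to numeric columns. A minimal sketch, assuming the sample DataFrame from step 1; the names `num_cols` and `df_num` are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

# Pick out the numeric columns (Age and Score in this sample)
num_cols = df.select_dtypes(include='number').columns

# Fill only those columns with 0; the text column keeps its missing value
df_num = df.copy()
df_num[num_cols] = df_num[num_cols].fillna(0)

print(df_num)
```

The missing `Name` stays missing here, so it can be handled separately with a text default.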

🔁 3. Forward Fill – Use Previous Value

# fillna(method='ffill') is deprecated since pandas 2.1; use ffill() instead
df_ffill = df.ffill()

✔️ Fills each missing value with the last non-null value above it in the same column. A NaN in the first row stays NaN because there is nothing to carry forward.
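Forward fill applies to every column, which is not always what you want. In the sketch below, assuming the sample DataFrame from step 1, the missing name in row 2 is filled with 'Bob', which is probably wrong for an identifier. Restricting the fill to the numeric columns is one way to avoid that:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

# Whole-frame forward fill: row 2 inherits the name 'Bob' from row 1
df_ffill = df.ffill()
print(df_ffill)

# Forward fill only the columns where carrying a value forward makes sense
df_cols = df.copy()
df_cols[['Age', 'Score']] = df_cols[['Age', 'Score']].ffill()
print(df_cols)
```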

🔁 4. Backward Fill – Use Next Value

# fillna(method='bfill') is deprecated since pandas 2.1; use bfill() instead
df_bfill = df.bfill()

✔️ Fills each missing value with the next non-null value below it in the same column. A NaN in the last row stays NaN because nothing follows it.
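Neither direction can fill a gap at the edge it starts from. A small sketch with made-up values shows the leftover NaN in each case, and how chaining the two fills covers both ends:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

back = s.bfill()            # trailing NaN has no later value to copy
fwd = s.ffill()             # leading NaN has no earlier value to copy
both = s.ffill().bfill()    # forward fill first, then back-fill what is left

print(back.tolist())   # [1.0, 1.0, 3.0, 3.0, nan]
print(fwd.tolist())    # [nan, 1.0, 1.0, 3.0, 3.0]
print(both.tolist())   # [1.0, 1.0, 1.0, 3.0, 3.0]
```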

🧠 5. Fill with Column Mean or Median

# Assign the result back; fillna(inplace=True) on a selected column is unreliable
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Score'] = df['Score'].fillna(df['Score'].median())

✔️ Replaces NaN with a summary statistic of the column (NaN is ignored when the statistic is computed). The mean suits roughly symmetric data; the median is more robust to outliers.
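To see why the choice of statistic matters, consider a column with one extreme value. The numbers below are made up for illustration; the point is only that the mean is pulled toward the outlier while the median is not:

```python
import numpy as np
import pandas as pd

# Hypothetical measurements with one outlier (1000) and one gap
values = pd.Series([10.0, 12.0, 11.0, np.nan, 1000.0])

mean_value = values.mean()      # NaN is skipped: (10 + 12 + 11 + 1000) / 4
median_value = values.median()  # middle of 10, 11, 12, 1000

print(mean_value)     # 258.25
print(median_value)   # 11.5

filled_mean = values.fillna(mean_value)
filled_median = values.fillna(median_value)
```

Here the median gives a fill value close to the typical readings, whereas the mean does not resemble any observed value.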

🧾 6. Fill with Most Frequent Value (Mode)

df['Name'] = df['Name'].fillna(df['Name'].mode()[0])

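✔️ Replaces missing names with the most common entry in the column. When several values tie, mode() returns all of them in sorted order and [0] picks the first; in this sample every name appears once, so the fill value is 'Alice'.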
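Since `mode()` returns a Series rather than a single value, it is worth handling ties and the all-missing case explicitly. A minimal sketch; the `city` column and the 'Unknown' fallback are hypothetical:

```python
import pandas as pd

# Hypothetical categorical column with a clear most-frequent value
city = pd.Series(['Paris', 'Rome', None, 'Paris'])
modes = city.mode()            # most frequent value(s); missing entries are ignored
print(modes.tolist())          # ['Paris']

# If every entry were missing, mode() would be empty, so fall back to a default
fill_value = modes.iloc[0] if not modes.empty else 'Unknown'
city_filled = city.fillna(fill_value)
print(city_filled.tolist())    # ['Paris', 'Rome', 'Paris', 'Paris']

# With all-unique values, every value is a mode (returned sorted)
names = pd.Series(['Alice', 'Bob', None, 'David'])
name_modes = names.mode()
print(name_modes.tolist())     # ['Alice', 'Bob', 'David']
```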

🧮 7. Apply Different Fill Values by Column

df.fillna({
    'Name': 'Unknown',
    'Age': df['Age'].mean(),
    'Score': 0
}, inplace=True)

✔️ Fills each listed column with its own value. Columns that are not keys in the dictionary are left unchanged.
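The same dictionary approach also works without `inplace=True`, returning a new DataFrame. The sketch below assumes the sample DataFrame from step 1 and also shows that a column missing from the dictionary keeps its NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

# One fill value per column
fill_values = {'Name': 'Unknown', 'Age': df['Age'].mean(), 'Score': 0}
df_filled = df.fillna(fill_values)
print(df_filled)

# Only Name is listed here, so Age and Score keep their gaps
df_partial = df.fillna({'Name': 'Unknown'})
print(df_partial.isna().sum())
```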

📉 8. Interpolate Missing Values (Linear by Default)

df['Score'] = df['Score'].interpolate()

✔️ Fills missing values by linear interpolation between the nearest known values, treating rows as equally spaced. Useful for time series or other continuous numeric data.
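For the sample Score column, the gap sits between 90 and 95, so linear interpolation yields 92.5. When observations are unevenly spaced in time, `method='time'` weights by the datetime index instead of by row position. A sketch with hypothetical readings:

```python
import numpy as np
import pandas as pd

# Sample Score column: the gap lies halfway between 90 and 95
score = pd.Series([85, 90, np.nan, 95])
score_filled = score.interpolate()
print(score_filled.tolist())          # [85.0, 90.0, 92.5, 95.0]

# Hypothetical readings on 1 Jan, 2 Jan and 5 Jan (uneven spacing)
idx = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-05'])
reading = pd.Series([10.0, np.nan, 50.0], index=idx)

by_position = reading.interpolate()               # ignores the dates
by_time = reading.interpolate(method='time')      # 1 day into a 4-day span

print(by_position.iloc[1])            # 30.0
print(round(by_time.iloc[1], 6))      # 20.0
```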

⏸️ 9. Limit the Number of Fills

df.ffill(limit=1)

✔️ Fills at most 1 consecutive NaN after each valid value in a column; any further NaN in the same gap stays missing.
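The sample DataFrame has no runs of consecutive NaN, so the effect of `limit` is easier to see on a series with a longer gap. A short sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Three consecutive gaps between two known values
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

one_step = s.ffill(limit=1)   # only the first NaN of the gap is filled
two_steps = s.ffill(limit=2)  # the first two are filled

print(one_step.tolist())      # [1.0, 1.0, nan, nan, 5.0]
print(two_steps.tolist())     # [1.0, 1.0, 1.0, nan, 5.0]
```

Keeping the remaining NaN visible makes long outages easy to spot, rather than hiding them behind a stale value.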

📌 Summary – Key Takeaways

Filling missing values is often preferable to dropping data, especially when the missingness is random or affects only a small share of rows. Choose the strategy based on context and data type.

🔍 Key Takeaways:

Use .fillna() with constants or computed values
Use .interpolate() or ffill/bfill for smart contextual filling
Different columns may need different strategies
Limit fill with limit= if overfilling could introduce noise

⚙️ Real-world relevance: Common in machine learning preprocessing, data entry repair, IoT data pipelines, and ETL workflows.

❓ FAQs – Filling Missing Values in Pandas

❓ What’s the best method to fill numeric NaNs?
There is no single best method. Use the mean for roughly symmetric data, the median when outliers are present, and .interpolate() for ordered data such as time series.

❓ Can I fill text columns with a default value?

df['Name'] = df['Name'].fillna('Unknown')

❓ What’s the difference between fillna() and interpolate()?

fillna() substitutes a fixed value, such as a constant or a precomputed statistic (ffill() and bfill() copy a neighboring value)
interpolate() estimates values between known points

❓ How do I fill only a few NaNs per column?
Use the limit parameter:

df.ffill(limit=1)

❓ Can I fill missing values in-place?
✅ Yes, when calling it on the DataFrame itself (for a single column, assign the result back instead). Use:

df.fillna(value, inplace=True)
