🧯 Pandas Filling Missing Values – Smartly Restore Incomplete Data
🧲 Introduction – Why Fill Missing Values?
Missing values (NaN) can distort aggregations, bias models, or make code fail. Instead of dropping the affected rows or columns, you can fill the gaps and keep the rest of the data. Pandas offers several strategies, from constant replacements and column statistics to forward/backward fill and interpolation.
🎯 In this guide, you’ll learn:
- How to fill missing values using constants, statistics, or surrounding values
- How to use fillna(), forward/backward fill, and interpolation
- How to apply different fill strategies per column
- Best practices, with examples and explanations
📥 1. Create a Sample DataFrame with Missing Values
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Name': ['Alice', 'Bob', None, 'David'],
'Age': [25, np.nan, 30, 40],
'Score': [85, 90, None, 95]
})
print(df)
👉 Output:
    Name   Age  Score
0  Alice  25.0   85.0
1    Bob   NaN   90.0
2   None  30.0    NaN
3  David  40.0   95.0
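Before choosing a fill strategy, it helps to know how much is missing in each column. A minimal sketch that rebuilds the sample DataFrame so it runs on its own; isna() counts both None and NaN as missing:

```python
import numpy as np
import pandas as pd

# Rebuild the sample DataFrame from step 1 so this snippet is self-contained
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Number of missing values per column (None and NaN both count)
missing_per_column = df.isna().sum()
print(missing_per_column)

# Fraction of rows missing in each column
missing_share = df.isna().mean()
print(missing_share)
```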
🩹 2. Fill All Missing Values with a Constant
df_filled = df.fillna(0)
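✔️ Replaces every missing value in every column with 0, including the missing entry in the text column Name. This suits numeric columns where 0 is a meaningful default; for text, a placeholder such as 'Unknown' is usually clearer.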
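If only the numeric columns should receive 0, one option is to select them first and fill just those. A minimal sketch using select_dtypes(), assuming the sample DataFrame from step 1 (rebuilt here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Select numeric columns only, so text columns such as Name are left alone
numeric_cols = df.select_dtypes(include='number').columns

# Work on a copy so the original df keeps its NaNs
df_numeric_filled = df.copy()
df_numeric_filled[numeric_cols] = df_numeric_filled[numeric_cols].fillna(0)
print(df_numeric_filled)
```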
🔁 3. Forward Fill – Use Previous Value
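df_ffill = df.ffill()
✔️ Fills each missing value with the last non-null value above it in the same column. Older code often writes this as df.fillna(method='ffill'), which has been deprecated since pandas 2.1.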
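One detail worth seeing: a NaN at the very top of a column has no earlier value to copy, so forward fill leaves it missing. A small sketch with made-up values, using Series.ffill():

```python
import numpy as np
import pandas as pd

# Made-up values; the first entry has nothing above it to copy
s = pd.Series([np.nan, 10.0, np.nan, np.nan, 40.0])
forward = s.ffill()
print(forward.tolist())  # [nan, 10.0, 10.0, 10.0, 40.0]
```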
🔁 4. Backward Fill – Use Next Value
df_bfill = df.bfill()
✔️ Fills each missing value with the next non-null value below it in the same column (the deprecated spelling is df.fillna(method='bfill')).
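Backward fill has the mirror-image gap: a trailing NaN stays missing. Chaining the two directions is a common way to cover both ends. A sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up values with a NaN at both ends and one in the middle
s = pd.Series([np.nan, 10.0, np.nan, 40.0, np.nan])

backward = s.bfill()      # the last NaN has nothing below it, so it stays NaN
both = s.ffill().bfill()  # forward fill first, then backward fill the leading gap
print(backward.tolist())  # [10.0, 10.0, 40.0, 40.0, nan]
print(both.tolist())      # [10.0, 10.0, 10.0, 40.0, 40.0]
```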
🧠 5. Fill with Column Mean, Median, or Mode
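df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Score'] = df['Score'].fillna(df['Score'].median())
✔️ Replaces NaN with a summary statistic of the column. The mean is pulled toward outliers, while the median is more robust for skewed data. Assigning the result back avoids the chained-assignment problem of df['Age'].fillna(..., inplace=True), which may not update df in recent pandas versions.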
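To see why the choice between mean and median matters, here is a sketch with made-up ages that contain one outlier. Both statistics skip NaN by default:

```python
import numpy as np
import pandas as pd

# Made-up ages with one outlier (200.0) and one missing value
ages = pd.Series([20.0, 22.0, 24.0, 200.0, np.nan])

mean_value = ages.mean()      # (20 + 22 + 24 + 200) / 4 = 66.5
median_value = ages.median()  # middle of 20, 22, 24, 200: (22 + 24) / 2 = 23.0

filled_mean = ages.fillna(mean_value)
filled_median = ages.fillna(median_value)
print(filled_mean.iloc[-1], filled_median.iloc[-1])  # 66.5 23.0
```

The single outlier nearly triples the mean, so the median gives a fill value closer to the typical age here.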
🧾 6. Fill with Most Frequent Value (Mode)
df['Name'] = df['Name'].fillna(df['Name'].mode()[0])
✔️ Replaces missing names with the most common entry in the column. mode() returns a Series (it holds several values on a tie), so [0] picks the first one.
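mode() sorts its results, so on a tie [0] picks the smallest value; when every value is missing it returns an empty Series and mode()[0] raises. The helper below, fill_with_mode, is hypothetical: a sketch of one way to guard against the empty case. The city values are made up:

```python
import pandas as pd


def fill_with_mode(series: pd.Series, default=None) -> pd.Series:
    """Hypothetical helper: fill NaN with the first mode, or with `default` if no mode exists."""
    modes = series.mode()  # sorted; several values on a tie, empty if all values are missing
    if modes.empty:
        # Nothing to take a mode of: fall back to `default` (or leave the series unchanged)
        return series if default is None else series.fillna(default)
    return series.fillna(modes.iloc[0])


cities = pd.Series(['Paris', 'Lyon', 'Paris', None, 'Lyon'])
print(cities.mode().tolist())           # ['Lyon', 'Paris']
print(fill_with_mode(cities).tolist())  # ['Paris', 'Lyon', 'Paris', 'Lyon', 'Lyon']

empty = pd.Series([None, None], dtype=object)
print(fill_with_mode(empty, default='Unknown').tolist())  # ['Unknown', 'Unknown']
```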
🧮 7. Apply Different Fill Values by Column
df.fillna({
'Name': 'Unknown',
'Age': df['Age'].mean(),
'Score': 0
}, inplace=True)
✔️ Fills each column with its own value, constant or computed; columns not listed in the dictionary are left unchanged.
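With many columns, the fill dictionary can also be built programmatically. The helper build_fill_values below is hypothetical: a sketch that assumes medians suit the numeric columns and 'Unknown' suits the rest. The last lines show that columns left out of the dictionary are not touched:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})


def build_fill_values(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: median for numeric columns, 'Unknown' for all others."""
    values = {}
    for col in frame.columns:
        if pd.api.types.is_numeric_dtype(frame[col]):
            values[col] = float(frame[col].median())  # median skips NaN
        else:
            values[col] = 'Unknown'
    return values


fill_values = build_fill_values(df)
print(fill_values)  # {'Name': 'Unknown', 'Age': 30.0, 'Score': 90.0}
df_filled = df.fillna(fill_values)

# Only Age is listed here, so Name and Score keep their missing values
partial = df.fillna({'Age': 0})
```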
📉 8. Interpolate Missing Values (Linear by Default)
df['Score'] = df['Score'].interpolate()
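✔️ Estimates each missing value on a straight line between its nearest known neighbours, which suits time series and other continuous data. The default method='linear' treats rows as equally spaced and ignores the index values.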
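For a Series with a DatetimeIndex, method='time' weights by the actual time gaps instead of treating rows as evenly spaced. A sketch with made-up readings where one date is absent:

```python
import numpy as np
import pandas as pd

# Default linear interpolation: rows are treated as equally spaced
scores = pd.Series([85.0, 90.0, np.nan, 95.0])
linear_scores = scores.interpolate()
print(linear_scores.tolist())  # [85.0, 90.0, 92.5, 95.0]

# Made-up readings on an uneven DatetimeIndex (2024-01-03 is absent)
readings = pd.Series(
    [0.0, np.nan, 30.0],
    index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-04']),
)
by_position = readings.interpolate()           # ignores the dates: midpoint of 0 and 30
by_time = readings.interpolate(method='time')  # 1 of 3 days elapsed: a third of the way
print(by_position.round(6).tolist())  # [0.0, 15.0, 30.0]
print(by_time.round(6).tolist())      # [0.0, 10.0, 30.0]
```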
⏸️ 9. Limit the Number of Fills
df.ffill(limit=1)
✔️ Forward-fills at most one NaN in each run of consecutive NaNs; the rest of a longer gap stays missing.
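Because limit counts consecutive NaNs within each gap rather than the total per column, every separate gap gets its own allowance. A sketch with made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])
limited_1 = s.ffill(limit=1)  # [1.0, 1.0, nan, nan, 5.0]
limited_2 = s.ffill(limit=2)  # [1.0, 1.0, 1.0, nan, 5.0]

# Two separate gaps: each one gets at most one filled value
two_gaps = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, np.nan])
per_gap = two_gaps.ffill(limit=1)  # [1.0, 1.0, nan, 4.0, 4.0, nan]
print(limited_1.tolist(), limited_2.tolist(), per_gap.tolist())
```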
📌 Summary – Key Takeaways
Filling missing values keeps more data than dropping rows or columns, and it is easiest to justify when values are missing at random or only in small numbers. Choose the strategy based on the meaning and data type of each column.
🔍 Key Takeaways:
- Use .fillna() with constants or computed values
- Use .interpolate(), .ffill(), or .bfill() for context-aware filling
- Different columns may need different strategies
- Cap fills with limit= if filling long gaps could introduce noise
⚙️ Real-world relevance: Common in machine learning preprocessing, data entry repair, IoT data pipelines, and ETL workflows.
❓ FAQs – Filling Missing Values in Pandas
❓ What’s the best method to fill numeric NaNs?
Use .mean(), .median(), or .interpolate(), depending on the data: the median for skewed data or data with outliers, interpolation for ordered data such as time series.
❓ Can I fill text columns with a default value?
Yes. Assign the filled column back:
df['Name'] = df['Name'].fillna('Unknown')
❓ What’s the difference between fillna() and interpolate()?
- fillna() fills with a value you supply (a constant, a statistic, or a per-column dictionary); ffill() and bfill() copy the nearest known value
- interpolate() estimates values between known points
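A side-by-side sketch on one made-up Series makes the difference concrete:

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 30.0])

static = s.fillna(0)         # a value you supply: [10.0, 0.0, 30.0]
carried = s.ffill()          # copies the last known value: [10.0, 10.0, 30.0]
estimated = s.interpolate()  # estimated between the neighbours: [10.0, 20.0, 30.0]
print(static.tolist(), carried.tolist(), estimated.tolist())
```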
❓ How do I fill only a few NaNs per column?
Use the limit parameter, which caps how many consecutive NaNs are filled in each gap:
df.ffill(limit=1)
❓ Can I fill missing values in-place?
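✅ Yes, on the whole DataFrame:
df.fillna(value, inplace=True)
For a single column, assign the result back instead (df['Age'] = df['Age'].fillna(0)). Calling fillna(..., inplace=True) on df['Age'] is a chained assignment that pandas warns about and that no longer updates df under Copy-on-Write.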