🕳️ Pandas Handling Missing Data – Clean, Fill, or Drop NaNs Effectively
🧲 Introduction – Why Handle Missing Data?
Missing data (represented as NaN or None in Pandas) is common in real-world datasets due to user input errors, system issues, or incomplete records. Handling missing data is crucial for ensuring accurate analysis, clean visualizations, and reliable machine learning models.
🎯 In this guide, you’ll learn:
- How to detect missing values
- How to drop rows or columns with missing data
- How to fill NaNs with constants, computed values, or interpolation
- How to customize strategies for each column
🔍 1. Detect Missing Values in a DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})
print(df.isnull())
✔️ .isnull() returns a DataFrame with True where data is missing and False otherwise.
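The boolean mask can also be reduced row by row to find which records are incomplete. The following is a minimal sketch on the sample data; the variable names row_has_missing and incomplete_rows are illustrative, not part of Pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# True for every row that has at least one missing cell
row_has_missing = df.isnull().any(axis=1)

# Keep only the incomplete rows for inspection
incomplete_rows = df[row_has_missing]
print(incomplete_rows)
```

Inspecting these rows first can help decide whether dropping or filling is the safer choice.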
📊 2. Count Missing Values Per Column
print(df.isnull().sum())
✔️ Sums the number of NaN values in each column, which helps prioritize columns to clean.
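Counts are easier to compare across columns when paired with the share of missing values. This sketch assumes the same sample data; the summary table and its column names (missing, share) are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Number of missing values per column
missing_count = df.isnull().sum()

# Fraction of missing values per column
# (the mean of a boolean column is its share of True values)
missing_share = df.isnull().mean()

# Hypothetical summary table, worst columns first
summary = pd.DataFrame({'missing': missing_count, 'share': missing_share})
summary = summary.sort_values('missing', ascending=False)
print(summary)
```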
🧹 3. Drop Rows with Missing Data
df_clean = df.dropna()
✔️ Removes all rows where any cell is NaN.
✅ Drop Rows with NaN in Specific Columns
df_clean = df.dropna(subset=['Name', 'Score'])
✔️ Only drops rows where Name or Score is missing and keeps the rest.
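To see how the row-dropping options differ, the sketch below compares a plain dropna(), the subset form, and the thresh parameter (which keeps rows with at least that many non-null values). The result names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Drop any row with at least one missing cell
drop_any = df.dropna()

# Drop rows only when Name or Score is missing (Bob's missing Age is tolerated)
drop_subset = df.dropna(subset=['Name', 'Score'])

# Keep rows that have at least 2 non-null values
drop_thresh = df.dropna(thresh=2)

print(drop_any.index.tolist())
print(drop_subset.index.tolist())
print(drop_thresh.index.tolist())
```

On this sample, the plain call keeps two rows, while the other two forms keep three.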
🧾 4. Drop Columns with Missing Data
df_drop_col = df.dropna(axis=1)
✔️ Removes entire columns that contain any missing values.
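Dropping columns this way can be drastic: in the sample data every column has at least one gap, so nothing would survive. A gentler option, sketched below, is the thresh parameter. The Notes column is a hypothetical addition used only to show a sparse column being removed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
    'Notes': [None, None, 'late', None],  # hypothetical, mostly empty column
})

# Every column contains a missing value, so no column is left
strict = df.dropna(axis=1)

# Keep only columns with at least 3 non-null values
lenient = df.dropna(axis=1, thresh=3)

print(strict.shape)
print(list(lenient.columns))
```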
🩹 5. Fill Missing Values with Constants
df_fill = df.fillna(0)
✔️ Fills all NaNs with 0 or any static value of your choice.
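A single constant is applied to every column, including text columns such as Name, which is rarely what you want. One possible approach, shown here as a sketch, is to restrict the fill to numeric columns; the names num_cols and df_fill_num are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Select the numeric columns only (Age and Score in this sample)
num_cols = df.select_dtypes(include='number').columns

# Work on a copy so the original DataFrame stays untouched
df_fill_num = df.copy()
df_fill_num[num_cols] = df_fill_num[num_cols].fillna(0)

print(df_fill_num)
```

The missing Name is left as-is and can be handled with a more suitable placeholder.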
🔁 6. Fill with Forward or Backward Values
df_ffill = df.ffill()  # Forward fill (fillna(method='ffill') is deprecated in recent Pandas)
df_bfill = df.bfill()  # Backward fill (fillna(method='bfill') is deprecated in recent Pandas)
✔️ Fills missing cells using the previous or next non-null value in the same column.
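The direction of the fill matters, and long gaps can be capped. Here is a small sketch on a standalone Series (illustrative data, not from the example above) using the ffill() and bfill() methods with the limit parameter.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

# Forward fill: carry the last known value down
forward = s.ffill()

# Backward fill: pull the next known value up
backward = s.bfill()

# Fill at most one consecutive missing value; the second gap stays NaN
limited = s.ffill(limit=1)

print(forward.tolist())
print(backward.tolist())
print(limited.tolist())
```

Note that a forward fill cannot fill a gap at the very start of a column, and a backward fill cannot fill one at the very end.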
🧠 7. Fill with Computed Values (Mean, Median, etc.)
df['Age'] = df['Age'].fillna(df['Age'].mean())  # assign back; inplace on a selected column is unreliable
df['Score'] = df['Score'].fillna(df['Score'].median())
✔️ Replaces NaNs with a statistical summary of the column, which preserves its center (mean or median) but reduces its spread somewhat.
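The effect of a statistical fill can be checked directly. This sketch uses the numeric columns from the sample data and compares the standard deviation of Age before and after a mean fill; std_before and std_after are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Spread of Age before filling (NaN is skipped by default)
std_before = df['Age'].std()

# Assign the filled column back instead of using inplace on a selection
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Score'] = df['Score'].fillna(df['Score'].median())

# Spread of Age after filling with the mean
std_after = df['Age'].std()

print(df)
print(std_before, std_after)
```

The filled value sits exactly at the mean, so the mean is unchanged while the standard deviation shrinks.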
🔬 8. Interpolate Missing Values
df['Score'] = df['Score'].interpolate()
✔️ Uses linear interpolation to estimate missing values between known points—useful for time series.
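By default, interpolation treats values as equally spaced and ignores the index. For unevenly spaced timestamps, method='time' takes the gaps between dates into account. The sketch below uses made-up values and dates to contrast the two.

```python
import numpy as np
import pandas as pd

# Equally spaced values: the two gaps are filled on a straight line
s = pd.Series([10.0, np.nan, np.nan, 40.0])
linear = s.interpolate()

# Unevenly spaced dates (illustrative): 1 day, then 2 days apart
dates = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-04'])
ts = pd.Series([0.0, np.nan, 30.0], index=dates)

# Default linear method ignores the index and takes the midpoint
by_position = ts.interpolate()

# method='time' weights by elapsed time (one third of the way here)
by_time = ts.interpolate(method='time')

print(linear.tolist())
print(by_position.iloc[1], by_time.iloc[1])
```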
🧯 9. Replace NaNs with Different Values by Column
df.fillna({
    'Name': 'Unknown',
    'Age': df['Age'].mean(),
    'Score': 0
}, inplace=True)
✔️ Use a dictionary to apply different filling strategies per column.
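The same dictionary approach also works without inplace=True, which keeps the original data available for comparison. A minimal sketch, where fill_values and df_filled are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# One fill value per column; columns not listed would be left unchanged
fill_values = {
    'Name': 'Unknown',
    'Age': df['Age'].mean(),
    'Score': 0,
}

# Returns a new DataFrame; df itself still contains the gaps
df_filled = df.fillna(fill_values)

print(df_filled)
```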
📌 Summary – Key Takeaways
Handling missing data is a foundational part of data preparation and cleaning. With Pandas, you can detect, drop, or fill NaNs efficiently using a variety of built-in tools.
🔍 Key Takeaways:
- Use .isnull() and .sum() to detect and count NaNs
- Use .dropna() to remove rows/columns with missing data
- Use .fillna() or .interpolate() to fill gaps intelligently
- Customize by column for targeted cleaning
⚙️ Real-world relevance: Essential in survey data, financial reports, sensor readings, and user-generated content workflows.
❓ FAQs – Handling Missing Data in Pandas
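❓ What’s the difference between NaN and None in Pandas?
✅ Both are treated as missing by isnull(), dropna(), and fillna(). In numeric columns, None is converted to np.nan; in object columns, None may be stored as-is but is still detected as missing.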
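A quick check illustrates how None behaves in numeric and object columns. This is a sketch on two tiny Series with illustrative names (num, text).

```python
import numpy as np
import pandas as pd

# In a numeric Series, None is converted to NaN and the dtype becomes float
num = pd.Series([1, None])

# In an object Series, the missing entry is still reported by isnull()
text = pd.Series(['a', None], dtype=object)

print(num.dtype)
print(num.isnull().tolist(), text.isnull().tolist())

# pd.isna treats both markers as missing
print(pd.isna(None), pd.isna(np.nan))
```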
❓ How do I only drop rows if all values are missing?
df.dropna(how='all')
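To make the contrast with the default behavior concrete, here is a sketch on a small made-up DataFrame where only the last row is completely empty.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, np.nan, np.nan],
    'b': [2, 5, np.nan],
})

# how='all' drops a row only when every value in it is missing
drop_all_missing = df.dropna(how='all')

# The default (how='any') drops a row if any value is missing
drop_any_missing = df.dropna()

print(drop_all_missing.index.tolist())
print(drop_any_missing.index.tolist())
```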
❓ Can I fill missing data with the mode (most frequent value)?
df['Column'] = df['Column'].fillna(df['Column'].mode()[0])
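A runnable version of the mode fill might look like the sketch below. The Color column and its values are hypothetical. Keep in mind that mode() can return several values when there is a tie ([0] takes the first), and that it is empty when the whole column is missing, in which case [0] would fail.

```python
import pandas as pd

df = pd.DataFrame({'Color': ['red', 'blue', None, 'red']})

# mode() returns a Series of the most frequent value(s); take the first
most_common = df['Color'].mode()[0]

# Assign the filled column back to keep the result
df['Color'] = df['Color'].fillna(most_common)

print(df['Color'].tolist())
```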
❓ Does fillna() change the original DataFrame?
❌ No, unless you pass inplace=True; by default it returns a new object.
❓ Is interpolation good for all datasets?
⚠️ No. It works best for time-series or numerical data with logical progression.