🕳️ Pandas Handling Missing Data – Clean, Fill, or Drop NaNs Effectively
Introduction – Why Handle Missing Data?
Missing data (represented as NaN or None in Pandas) is common in real-world datasets due to user input errors, system issues, or incomplete records. Handling missing data is crucial for ensuring accurate analysis, clean visualizations, and reliable machine learning models.
In this guide, you’ll learn:
- How to detect missing values
- Drop rows/columns with missing data
- Fill NaNs with constants, computed values, or interpolation
- Customize strategies for each column
1. Detect Missing Values in a DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Name': ['Alice', 'Bob', None, 'David'],
'Age': [25, np.nan, 30, 40],
'Score': [85, 90, None, 95]
})
print(df.isnull())
✔️ .isnull() returns a DataFrame with True where data is missing and False otherwise.
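As a small follow-up, the sketch below rebuilds the sample frame and uses the mask to pull out only the rows that have a gap. The names `mask` and `rows_with_gaps` are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# .isna() is an alias of .isnull(); both flag None and NaN as missing.
mask = df.isna()

# Keep only the rows that contain at least one missing value.
rows_with_gaps = df[mask.any(axis=1)]
print(rows_with_gaps)
```

Filtering on `mask.any(axis=1)` is a convenient way to inspect problem rows before deciding whether to drop or fill them.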
2. Count Missing Values Per Column
print(df.isnull().sum())
✔️ Sums the number of NaN values in each column—helps prioritize columns to clean.
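Raw counts can be hard to compare across datasets of different sizes, so it may help to look at the share of missing values as well. This is a minimal sketch on the same sample data; the `summary` table and its column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Count of missing cells per column.
missing_count = df.isnull().sum()

# The mean of a boolean mask is the fraction of True values.
missing_share = df.isnull().mean()

summary = pd.DataFrame({'missing': missing_count, 'share': missing_share})
print(summary.sort_values('share', ascending=False))
```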
3. Drop Rows with Missing Data
df_clean = df.dropna()
✔️ Removes all rows where any cell is NaN.
Drop Rows with NaN in Specific Columns
df_clean = df.dropna(subset=['Name', 'Score'])
✔️ Only drops rows where Name or Score is missing—keeps the rest.
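To see how the row-dropping options differ, the sketch below applies three variants to the sample frame and compares which row labels survive. The `thresh` variant is an addition not covered above: it keeps rows that have at least the given number of non-missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Drop a row if any cell in it is missing.
strict = df.dropna()

# Drop a row only if Name or Score is missing; a missing Age is tolerated.
by_subset = df.dropna(subset=['Name', 'Score'])

# Keep rows that have at least two non-missing values.
by_thresh = df.dropna(thresh=2)

print(strict.index.tolist())
print(by_subset.index.tolist())
print(by_thresh.index.tolist())
```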
4. Drop Columns with Missing Data
df_drop_col = df.dropna(axis=1)
✔️ Removes entire columns that contain any missing values.
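On the sample frame every column has one gap, so a plain column drop leaves nothing behind. The sketch below shows that, then uses `thresh` to drop only sparse columns. The `Notes` column is hypothetical and added purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Every column contains a missing value, so all of them are removed.
no_columns_left = df.dropna(axis=1)
print(no_columns_left.shape)

# Hypothetical sparse column with a single non-missing entry.
df_wide = df.assign(Notes=[None, None, None, 'late entry'])

# Keep only columns with at least three non-missing values.
dense_only = df_wide.dropna(axis=1, thresh=3)
print(dense_only.columns.tolist())
```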
5. Fill Missing Values with Constants
df_fill = df.fillna(0)
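✔️ Fills every missing cell with 0 (or any other constant). This applies to all columns, so a missing Name is also replaced by 0; restrict the fill to specific columns when one constant does not suit every data type.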
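A constant such as 0 rarely makes sense for text columns. One possible approach, sketched here, is to apply the constant to numeric columns only and leave the rest for a separate strategy. Using `select_dtypes` for this is a choice made for the example, not something the original code does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Pick out the numeric columns (Age and Score in this frame).
numeric_cols = df.select_dtypes(include='number').columns

# Fill only those columns; Name keeps its missing value for now.
df_fill = df.copy()
df_fill[numeric_cols] = df_fill[numeric_cols].fillna(0)
print(df_fill)
```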
6. Fill with Forward or Backward Values
df_ffill = df.ffill() # Forward fill (fillna(method='ffill') is deprecated since pandas 2.1)
df_bfill = df.bfill() # Backward fill
✔️ Fills missing cells using the previous or next non-null value in the same column.
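Two details are easy to miss with directional fills: a gap at the start has no previous value to copy, and long runs of gaps can be capped with `limit`. The series below is made-up data used only to show both effects.

```python
import numpy as np
import pandas as pd

# Made-up series with a leading gap and a two-value gap in the middle.
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0])

# Forward fill: the leading NaN stays because nothing precedes it.
forward = s.ffill()

# Fill at most one consecutive missing value per gap.
capped = s.ffill(limit=1)

# Backward fill: each gap takes the next valid value.
backward = s.bfill()

print(forward.tolist())
print(capped.tolist())
print(backward.tolist())
```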
7. Fill with Computed Values (Mean, Median, etc.)
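df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Score'] = df['Score'].fillna(df['Score'].median())
✔️ Replaces NaNs with a statistical summary of the column. Assigning the result back avoids calling inplace=True on a selected column, which newer pandas versions warn about and which may not update df. Mean or median filling keeps the column's center but reduces its variance, so use it with care when many values are missing.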
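One thing worth checking after a mean fill is how the spread of the column changes. The sketch below compares the mean and standard deviation of the Age values before and after filling; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

age = pd.Series([25, np.nan, 30, 40], name='Age')

# Fill the gap with the mean of the observed values.
age_filled = age.fillna(age.mean())

# The mean is unchanged, but the standard deviation gets smaller
# because the filled value sits exactly at the center.
print(age.mean(), age_filled.mean())
print(age.std(), age_filled.std())
```

If preserving the spread matters for the analysis, a summary-statistic fill may not be the right tool.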
8. Interpolate Missing Values
df['Score'] = df['Score'].interpolate()
✔️ Uses linear interpolation to estimate missing values between known points—useful for time series.
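The default linear method treats rows as evenly spaced. For unevenly spaced timestamps, `method='time'` weights the estimate by the actual time gaps. The sketch below shows both on small made-up data; the dates and readings are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Default linear interpolation on the Score values.
score = pd.Series([85, 90, np.nan, 95])
linear = score.interpolate()
print(linear.tolist())

# Made-up readings on unevenly spaced dates.
readings = pd.Series(
    [10.0, np.nan, 50.0],
    index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-05']),
)

# Ignores the index: the gap is treated as the midpoint.
by_position = readings.interpolate()

# Uses the index: Jan 2 is one quarter of the way from Jan 1 to Jan 5.
by_time = readings.interpolate(method='time')

print(by_position.iloc[1], by_time.iloc[1])
```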
9. Replace NaNs with Different Values by Column
df.fillna({
'Name': 'Unknown',
'Age': df['Age'].mean(),
'Score': 0
}, inplace=True)
✔️ Use a dictionary to apply different filling strategies per column.
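The same dictionary-based fill can also return a new frame instead of modifying the original, which makes it easier to compare before and after. This is a sketch on the sample data; `fill_values` and `df_filled` are names chosen for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# One fill value per column; columns not listed would be left untouched.
fill_values = {
    'Name': 'Unknown',
    'Age': df['Age'].mean(),
    'Score': 0,
}

# Without inplace=True, fillna returns a new DataFrame.
df_filled = df.fillna(fill_values)
print(df_filled)
```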
Summary – Key Takeaways
Handling missing data is a foundational part of data preparation and cleaning. With Pandas, you can detect, drop, or fill NaNs efficiently using a variety of built-in tools.
Key Takeaways:
- Use .isnull() and .sum() to detect and count NaNs
- Use .dropna() to remove rows/columns with missing data
- Use .fillna() or .interpolate() to fill gaps intelligently
- Customize by column for targeted cleaning
Real-world relevance: Essential in survey data, financial reports, sensor readings, and user-generated content workflows.
FAQs – Handling Missing Data in Pandas
What’s the difference between NaN and None in Pandas?
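Both are treated as missing by isnull(), dropna(), and fillna(). In numeric columns, None is converted to NaN (np.nan) when the data is created; in object columns, None is stored as-is but is still detected as missing.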
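A short sketch can make the storage difference visible. It assumes an explicitly object-typed series for the text case, since default string handling varies between pandas versions.

```python
import numpy as np
import pandas as pd

# In a numeric series, None is converted to NaN and the dtype becomes float64.
numeric = pd.Series([1, None, 3])
print(numeric.dtype)

# In an object series, None is stored as the Python object None.
text = pd.Series(['a', None, 'c'], dtype=object)
print(text.iloc[1] is None)

# Both are reported as missing.
print(numeric.isna().tolist(), text.isna().tolist())
```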
How do I only drop rows if all values are missing?
df.dropna(how='all')
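For comparison, here is a minimal sketch on made-up data showing how `how='all'` differs from the default `how='any'`.

```python
import numpy as np
import pandas as pd

# Made-up frame: row 1 is partly missing, row 2 is entirely missing.
df = pd.DataFrame({
    'a': [1, np.nan, np.nan],
    'b': [2, 5, np.nan],
})

# Default (how='any'): drops rows 1 and 2.
any_missing = df.dropna()

# how='all': drops only row 2, where every value is missing.
all_missing = df.dropna(how='all')

print(any_missing.index.tolist(), all_missing.index.tolist())
```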
Can I fill missing data with the mode (most frequent value)?
Yes. Take the first mode and assign the result back:
df['Column'] = df['Column'].fillna(df['Column'].mode()[0])
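Indexing the mode with `[0]` raises an error when a column has no non-missing values at all, because the mode is then empty. The helper below is a hypothetical wrapper (not a pandas function) that guards against that case.

```python
import numpy as np
import pandas as pd

def fill_with_mode(series):
    """Hypothetical helper: fill gaps with the most frequent value."""
    modes = series.mode()  # missing values are ignored by default
    if modes.empty:
        # Nothing to fill with; return the series unchanged.
        return series
    return series.fillna(modes.iloc[0])

colors = pd.Series(['red', 'blue', None, 'red'])
print(fill_with_mode(colors).tolist())

all_missing = pd.Series([np.nan, np.nan])
print(fill_with_mode(all_missing).isna().all())
```

When several values tie for most frequent, `mode()` returns them sorted, so this sketch picks the first in sort order.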
Does fillna() change the original DataFrame?
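No. By default it returns a new object and leaves the original untouched. Assign the result back (df = df.fillna(...)) or pass inplace=True on the DataFrame itself.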
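This behavior can be confirmed with a quick check on made-up data: the original frame still has its gap after the call, while the returned frame does not.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [25, np.nan, 30]})

# fillna returns a new DataFrame; df itself is not modified.
filled = df.fillna(0)

print(df['Age'].isna().sum())      # gaps remaining in the original
print(filled['Age'].isna().sum())  # gaps remaining in the result

# Reassign to keep the filled version under the same name.
df = df.fillna(0)
```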
Is interpolation good for all datasets?
No. It works best for time-series or numerical data with logical progression.
