4️⃣ 🧹 Pandas Data Cleaning & Preprocessing

🕳️ Pandas Handling Missing Data – Clean, Fill, or Drop NaNs Effectively


Introduction – Why Handle Missing Data?

Missing data (represented as NaN or None in Pandas) is common in real-world datasets due to user input errors, system issues, or incomplete records. Handling missing data is crucial for ensuring accurate analysis, clean visualizations, and reliable machine learning models.

In this guide, you’ll learn:

  • How to detect missing values
  • How to drop rows or columns with missing data
  • How to fill NaNs with constants, computed values, or interpolation
  • How to customize the strategy for each column

1. Detect Missing Values in a DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

print(df.isnull())

✔️ .isnull() returns a DataFrame with True where data is missing and False otherwise.
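A minimal sketch using the same sample df shows how the boolean mask can also locate the affected rows. .isna() is an alias of .isnull() with identical behavior. .any(axis=1) flags every row that contains at least one True.

```python
import numpy as np
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

# .isna() and .isnull() return the same boolean mask
mask = df.isna()

# Rows that contain at least one missing cell
rows_with_nan = df[mask.any(axis=1)]
print(rows_with_nan)

# Total number of missing cells in the whole DataFrame
total_missing = int(mask.sum().sum())
print(total_missing)  # 3
```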


2. Count Missing Values Per Column

print(df.isnull().sum())

✔️ Counts the NaN values in each column (each True counts as 1), which helps you decide which columns to clean first.
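Raw counts are harder to compare across datasets of different sizes. As a small sketch, taking the mean of the boolean mask gives the fraction of missing values per column, which can then be sorted to rank columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

# Mean of True/False values = share of missing cells; * 100 turns it into a percentage
missing_pct = df.isnull().mean() * 100

# Columns with the most missing data first
print(missing_pct.sort_values(ascending=False))
```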


3. Drop Rows with Missing Data

df_clean = df.dropna()

✔️ Removes all rows where any cell is NaN.
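On the sample data, only Alice's and David's rows are complete, so dropna() keeps two rows. Note that the surviving rows keep their original index labels; reset_index(drop=True) renumbers them if you need a clean 0..n-1 index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

df_clean = df.dropna()
print(list(df_clean.index))  # [0, 3]

# Renumber the remaining rows starting from 0
df_reset = df_clean.reset_index(drop=True)
print(list(df_reset.index))  # [0, 1]
```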


Drop Rows with NaN in Specific Columns

df_clean = df.dropna(subset=['Name', 'Score'])

✔️ Drops a row only if Name or Score is missing; rows missing only Age are kept.
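dropna() also accepts a thresh argument: a row is kept only if it has at least that many non-missing values. In this sketch, the threshold of 2 is just an illustrative choice. Row 2 has only one non-null value (Age), so it is the only row removed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

# Keep rows with at least 2 non-missing values (threshold chosen for illustration)
df_thresh = df.dropna(thresh=2)
print(list(df_thresh.index))  # [0, 1, 3]
```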


4. Drop Columns with Missing Data

df_drop_col = df.dropna(axis=1)

✔️ Removes entire columns that contain any missing values.
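Be careful with axis=1 on real data: in the sample DataFrame every column has one NaN, so dropna(axis=1) removes all of them. A gentler sketch keeps columns whose share of missing values stays under a threshold; the 50% cutoff below is an assumption, not a pandas default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

# Every column contains a NaN, so nothing survives
print(df.dropna(axis=1).columns.tolist())  # []

# Keep columns where at most 50% of the values are missing (illustrative cutoff)
max_missing = 0.5
keep = df.columns[df.isnull().mean() <= max_missing]
df_kept = df[keep]
print(df_kept.columns.tolist())  # ['Name', 'Age', 'Score']
```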


🩹 5. Fill Missing Values with Constants

df_fill = df.fillna(0)

✔️ Fills all NaNs with 0 or any static value of your choice.
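Filling every column with 0 also writes 0 into text columns such as Name, which mixes types. One option, sketched below, is to restrict the constant fill to numeric columns found with select_dtypes and leave text columns for a separate strategy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

# Select only numeric columns (here: Age and Score)
num_cols = df.select_dtypes(include='number').columns

df_fill = df.copy()
df_fill[num_cols] = df_fill[num_cols].fillna(0)

print(df_fill)  # Name still has its missing value; Age and Score are filled with 0
```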


6. Fill with Forward or Backward Values

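df_ffill = df.ffill()  # Forward fill: copy the last valid value downward
df_bfill = df.bfill()  # Backward fill: copy the next valid value upward

✔️ Fills missing cells using the previous (ffill) or next (bfill) non-null value in the same column. The older fillna(method='ffill') form is deprecated since pandas 2.1, so prefer .ffill() and .bfill().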
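A small Series makes the edge cases visible: forward fill cannot fill a leading NaN (there is no previous value), backward fill cannot fill a trailing one, and limit caps how many consecutive NaNs get filled. Chaining .ffill().bfill() is one common way to cover both ends.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0])

print(s.ffill().tolist())         # [nan, 1.0, 1.0, 1.0, 4.0]  leading NaN stays
print(s.bfill().tolist())         # [1.0, 1.0, 4.0, 4.0, 4.0]
print(s.ffill(limit=1).tolist())  # [nan, 1.0, 1.0, nan, 4.0]  only 1 consecutive fill

# Forward fill first, then backward fill whatever is left at the start
both = s.ffill().bfill()
print(both.tolist())              # [1.0, 1.0, 1.0, 1.0, 4.0]
```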


7. Fill with Computed Values (Mean, Median, etc.)

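df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Score'] = df['Score'].fillna(df['Score'].median())

✔️ Replaces NaNs with a statistical summary of the column. This keeps the column's mean (or median) unchanged, but it shrinks the spread because every filled cell gets the same value. Assigning the result back is safer than calling fillna(..., inplace=True) on df['Age'], a chained pattern that recent pandas versions warn about and that does not update df under Copy-on-Write.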
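To see the effect on the distribution, this sketch fills a toy Series with its own mean: the mean is preserved exactly, while the standard deviation drops because the filled values all sit at the center.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 20.0, np.nan, 30.0])
filled = s.fillna(s.mean())

print(s.mean(), filled.mean())  # 20.0 20.0
print(s.std(), filled.std())    # spread is smaller after mean imputation
```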


8. Interpolate Missing Values

df['Score'] = df['Score'].interpolate()

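✔️ Estimates each missing value from its neighbors. The default, method='linear', treats the values as equally spaced and ignores the index; leading NaNs are left as-is. This works best for ordered data such as time series.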
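When observations are unevenly spaced in time, the default linear method can mislead. In this sketch the dates are made up for illustration: with a DatetimeIndex, method='time' weights the estimate by the actual time gaps instead of by position.

```python
import numpy as np
import pandas as pd

# Illustrative dates: a 1-day gap followed by a 3-day gap
idx = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-05'])
s = pd.Series([0.0, np.nan, 4.0], index=idx)

linear = s.interpolate()                 # positions treated as equally spaced
by_time = s.interpolate(method='time')   # uses the real distance between dates

print(linear.iloc[1])   # 2.0  (halfway by position)
print(by_time.iloc[1])  # 1.0  (1 of 4 days elapsed)
```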


9. Replace NaNs with Different Values by Column

df.fillna({
    'Name': 'Unknown',
    'Age': df['Age'].mean(),
    'Score': 0
}, inplace=True)

✔️ Use a dictionary to apply different filling strategies per column.
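If the same per-column rules are reused across datasets, the dictionary can be built from a small strategy map. fill_by_strategy below is a hypothetical helper, not a pandas API; it assumes the strings 'mean', 'median', and 'mode' name a strategy and treats anything else as a constant.

```python
import numpy as np
import pandas as pd

def fill_by_strategy(frame, strategies):
    """Hypothetical helper: map column -> 'mean' | 'median' | 'mode' | constant."""
    values = {}
    for col, how in strategies.items():
        if how == 'mean':
            values[col] = frame[col].mean()
        elif how == 'median':
            values[col] = frame[col].median()
        elif how == 'mode':
            values[col] = frame[col].mode().iloc[0]  # most frequent value
        else:
            values[col] = how  # any other value is used as a constant
    # fillna with a dict fills each column with its own value and returns a new DataFrame
    return frame.fillna(values)

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

filled = fill_by_strategy(df, {'Name': 'Unknown', 'Age': 'median', 'Score': 'mean'})
print(filled)
```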


Summary – Key Takeaways

Handling missing data is a foundational part of data preparation and cleaning. With Pandas, you can detect, drop, or fill NaNs efficiently using a variety of built-in tools.

Key Takeaways:

  • Use .isnull() and .sum() to detect and count NaNs
  • Use .dropna() to remove rows/columns with missing data
  • Use .fillna() or .interpolate() to fill gaps intelligently
  • Customize by column for targeted cleaning

Real-world relevance: Essential in survey data, financial reports, sensor readings, and user-generated content workflows.


FAQs – Handling Missing Data in Pandas

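What’s the difference between NaN and None in Pandas?
Both count as missing: .isnull() and .isna() flag either one. In numeric columns, pandas converts None to np.nan (and upcasts integers to float64); in object columns, the None itself may be kept as the stored value.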
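A quick check of that behavior: a numeric list containing None becomes a float64 Series holding NaN, while in a text column the stored value depends on the dtype pandas infers. Either way, isna() reports both as missing.

```python
import numpy as np
import pandas as pd

num = pd.Series([1, None])    # None is converted to NaN; dtype becomes float64
obj = pd.Series(['a', None])  # text column; the stored missing value depends on the dtype

print(num.dtype)                                 # float64
print(num.isna().tolist(), obj.isna().tolist())  # [False, True] [False, True]
```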


How do I drop rows only if all values are missing?

df.dropna(how='all')
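A toy DataFrame shows the difference: how='all' removes only the fully empty row, while the default how='any' also removes the partially filled one.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'A': [1.0, np.nan, np.nan],
    'B': [2.0, 5.0, np.nan]
})

print(list(df2.dropna(how='all').index))  # [0, 1]  only row 2 is entirely NaN
print(list(df2.dropna().index))           # [0]     default how='any'
```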

Can I fill missing data with the mode (most frequent value)?

df['Column'] = df['Column'].fillna(df['Column'].mode()[0])
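Two details matter here: mode() returns all tied values in sorted order, so [0] picks the first of them, and it returns an empty Series if the column is entirely NaN. This sketch guards against the empty case; the 'Unknown' fallback is an illustrative assumption.

```python
import numpy as np
import pandas as pd

s = pd.Series(['red', 'blue', 'blue', None, 'red', 'green'])

modes = s.mode()         # NaN is ignored by default
print(modes.tolist())    # ['blue', 'red']  tie, returned in sorted order

# Fall back to a placeholder if the column has no non-missing values
fill_value = modes.iloc[0] if not modes.empty else 'Unknown'
filled = s.fillna(fill_value)
print(filled.tolist())
```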

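Does fillna() change the original DataFrame?
No. It returns a new object and leaves the original unchanged unless you pass inplace=True or reassign the result (for example, df = df.fillna(0)).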
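A short check confirms the behavior: the returned DataFrame is filled, while the original still has its NaN until you reassign.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

filled = df.fillna({'Age': 0})  # returns a new DataFrame

print(df['Age'].isna().sum(), filled['Age'].isna().sum())  # 1 0

df = filled  # reassign to keep the change
```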


Is interpolation good for all datasets?
No. It works best for time-series or numerical data with logical progression.

