4️⃣ 🧹 Pandas Data Cleaning & Preprocessing

🕳️ Pandas Handling Missing Data – Clean, Fill, or Drop NaNs Effectively


🧲 Introduction – Why Handle Missing Data?

Missing data (represented as NaN or None in Pandas) is common in real-world datasets due to user input errors, system issues, or incomplete records. Handling missing data is crucial for ensuring accurate analysis, clean visualizations, and reliable machine learning models.

🎯 In this guide, you’ll learn:

  • How to detect missing values
  • How to drop rows or columns with missing data
  • How to fill NaNs with constants, computed values, or interpolation
  • How to customize the strategy for each column

🔍 1. Detect Missing Values in a DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95]
})

print(df.isnull())

✔️ .isnull() returns a DataFrame with True where data is missing and False otherwise.
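One way to put that Boolean frame to work, sketched on the same sample data, is to reduce it to a per-row mask and pull out the incomplete records for inspection. The names rows_with_gaps and incomplete are illustrative, not part of the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# True for every row that has at least one missing cell
rows_with_gaps = df.isnull().any(axis=1)

# Keep only the incomplete rows so they can be reviewed
incomplete = df[rows_with_gaps]
print(incomplete)
```

Here rows 1 and 2 are selected, because Bob has no Age and the third record lacks both Name and Score.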


📊 2. Count Missing Values Per Column

print(df.isnull().sum())

✔️ Sums the number of NaN values in each column—helps prioritize columns to clean.
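Raw counts are harder to compare across datasets of different sizes, so a common variation is to look at the share of missing values instead. The sketch below assumes the same sample data; missing_share and total_missing are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# The mean of a Boolean column is the fraction of True values,
# i.e. the fraction of missing cells in that column
missing_share = df.isnull().mean()

# Summing twice collapses the frame to one grand total
total_missing = int(df.isnull().sum().sum())

print(missing_share)
print(total_missing)
```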


🧹 3. Drop Rows with Missing Data

df_clean = df.dropna()

✔️ Removes all rows where any cell is NaN.


✅ Drop Rows with NaN in Specific Columns

df_clean = df.dropna(subset=['Name', 'Score'])

✔️ Only drops rows where Name or Score is missing—keeps the rest.
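To see how much the subset argument changes the outcome, the two calls can be compared side by side on the sample data. This is a small sketch; all_complete and key_complete are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Strict: a row survives only if every column is present
all_complete = df.dropna()

# Targeted: a row survives if Name and Score are present,
# even when Age is missing
key_complete = df.dropna(subset=['Name', 'Score'])

print(len(all_complete), len(key_complete))
```

Bob's row has no Age, so the strict call drops it while the targeted call keeps it.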


🧾 4. Drop Columns with Missing Data

df_drop_col = df.dropna(axis=1)

✔️ Removes entire columns that contain any missing values.
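On the sample data this call is quite aggressive: every column has one gap, so every column is removed. A gentler option, sketched below, is the thresh parameter of dropna, which keeps a column when it has at least that many non-missing values. The cutoff of 3 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Every column contains one NaN, so nothing is left
strict = df.dropna(axis=1)

# Keep columns that have at least 3 non-missing values
lenient = df.dropna(axis=1, thresh=3)

print(strict.shape)
print(lenient.shape)
```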


🩹 5. Fill Missing Values with Constants

df_fill = df.fillna(0)

✔️ Fills every NaN in every column with 0 (or any static value you choose). Note that this also writes 0 into text columns such as Name.
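Because a single constant is applied across the whole frame, a numeric placeholder can end up in a text column. One possible way around this, shown as a sketch, is to restrict the fill to numeric columns with select_dtypes. The names num_cols and df_numeric_fill are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Pick out the numeric columns only (Age and Score here)
num_cols = df.select_dtypes(include='number').columns

# Work on a copy so the original frame stays untouched
df_numeric_fill = df.copy()
df_numeric_fill[num_cols] = df_numeric_fill[num_cols].fillna(0)

print(df_numeric_fill)
```

The missing Name is left as it was and can be handled separately with a text placeholder.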


🔁 6. Fill with Forward or Backward Values

df_ffill = df.ffill()  # Forward fill
df_bfill = df.bfill()  # Backward fill

✔️ Fills missing cells using the previous or next non-null value in the same column. The older fillna(method='ffill') form is deprecated since pandas 2.1; .ffill() and .bfill() are the direct replacements.
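One edge case worth knowing: a forward fill has nothing to copy into a gap at the very start of a column, and a backward fill has the same problem at the end. The short Series below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with a gap at the start and one in the middle
s = pd.Series([np.nan, 1.0, np.nan, 3.0])

forward = s.ffill()   # the leading NaN has no earlier value, so it stays
backward = s.bfill()  # every gap here has a later value to borrow

print(forward.tolist())
print(backward.tolist())
```

Chaining the two, as in s.ffill().bfill(), is a common way to cover both ends, at the cost of mixing two different fill directions.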


🧠 7. Fill with Computed Values (Mean, Median, etc.)

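df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Score'] = df['Score'].fillna(df['Score'].median())

✔️ Replaces NaNs with a statistical summary. Mean imputation keeps the column mean unchanged but reduces its variance; the median is the safer choice when the column has outliers. Assigning the result back avoids calling fillna(..., inplace=True) on a selected column, a pattern that newer pandas versions warn about and that may leave the original DataFrame unchanged.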
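The effect of mean imputation on spread can be checked directly. The sketch below uses the Age values from the sample data and compares the standard deviation before and after filling; ages and ages_filled are illustrative names.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, np.nan, 30, 40])

# Fill the gap with the mean of the observed values
ages_filled = ages.fillna(ages.mean())

# pandas skips NaN when computing mean and std
print(ages.mean(), ages_filled.mean())
print(ages.std(), ages_filled.std())
```

The mean is the same in both cases, while the filled column has a smaller standard deviation, because the imputed value sits exactly at the center of the data.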


🔬 8. Interpolate Missing Values

df['Score'] = df['Score'].interpolate()

✔️ Uses linear interpolation to estimate missing values between known points—useful for time series.
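The default linear method treats rows as evenly spaced. When the index holds timestamps with uneven gaps, method='time' weights the estimate by the actual time distance instead. The dates and values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: one day between the first two points,
# three days between the last two
idx = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-05'])
s = pd.Series([10.0, np.nan, 50.0], index=idx)

by_position = s.interpolate()           # halfway between neighbors
by_time = s.interpolate(method='time')  # one quarter of the way in time

print(by_position.iloc[1], by_time.iloc[1])
```

The positional estimate is 30.0, while the time-aware estimate is 20.0, since 2 January is only one of the four days between the known readings.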


🧯 9. Replace NaNs with Different Values by Column

df.fillna({
    'Name': 'Unknown',
    'Age': df['Age'].mean(),
    'Score': 0
}, inplace=True)

✔️ Use a dictionary to apply different filling strategies per column.
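For wider tables, writing the dictionary by hand gets tedious. One possible approach, sketched here, is to build it from each column's dtype. The rule used (median for numeric columns, 'Unknown' for everything else) is an assumption for illustration, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# Choose a fill value per column based on its dtype
fill_values = {}
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        fill_values[col] = df[col].median()
    else:
        fill_values[col] = 'Unknown'

# Returns a new frame; df itself is left unchanged
filled = df.fillna(fill_values)
print(filled)
```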


📌 Summary – Key Takeaways

Handling missing data is a foundational part of data preparation and cleaning. With Pandas, you can detect, drop, or fill NaNs efficiently using a variety of built-in tools.

🔍 Key Takeaways:

  • Use .isnull() and .sum() to detect and count NaNs
  • Use .dropna() to remove rows/columns with missing data
  • Use .fillna() or .interpolate() to fill gaps intelligently
  • Customize by column for targeted cleaning

⚙️ Real-world relevance: Essential in survey data, financial reports, sensor readings, and user-generated content workflows.


❓ FAQs – Handling Missing Data in Pandas

❓ What’s the difference between NaN and None in Pandas?
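✅ Both count as missing: isnull(), dropna(), and fillna() treat them the same way. np.nan is a float value, so a None placed in a numeric column is converted to NaN, while an object column can store None as-is.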
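A quick check makes the distinction concrete. The sketch below builds one numeric Series and one Series with an explicit object dtype, then looks at what is actually stored; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# In a numeric Series, None is converted to the float NaN
numeric = pd.Series([1, None])

# With an explicit object dtype, None is stored as-is
text = pd.Series(['a', None], dtype=object)

print(numeric.dtype)           # float64
print(text.iloc[1] is None)    # True

# Either way, pandas reports the cell as missing
print(numeric.isnull().tolist(), text.isnull().tolist())
```

This is also why an integer column turns into float64 as soon as it contains a missing value.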


❓ How do I only drop rows if all values are missing?

df.dropna(how='all')
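As a small illustration, the made-up frame below has one completely empty row and one partially empty row, which shows the difference from the default how='any'.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is fully empty, row 2 is partly empty
df_rows = pd.DataFrame({
    'a': [1, np.nan, np.nan],
    'b': [2, np.nan, 3],
})

only_empty_removed = df_rows.dropna(how='all')  # drops row 1 only
any_gap_removed = df_rows.dropna()              # drops rows 1 and 2

print(list(only_empty_removed.index))
print(list(any_gap_removed.index))
```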

❓ Can I fill missing data with the mode (most frequent value)?

df['Column'] = df['Column'].fillna(df['Column'].mode()[0])
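Note that mode() returns a Series, since several values can tie for most frequent, and that Series is empty when the column has no valid values at all. A slightly more defensive sketch, using a made-up colors column:

```python
import pandas as pd

# Hypothetical categorical data with one gap
colors = pd.Series(['red', 'blue', None, 'red'])

# mode() ignores missing values by default
modes = colors.mode()

if not modes.empty:
    # Take the first mode; with ties, this is the first in sorted order
    colors_filled = colors.fillna(modes.iloc[0])
else:
    # Nothing to fill with, so leave the column as it is
    colors_filled = colors

print(colors_filled.tolist())
```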

❓ Does fillna() change the original DataFrame?
❌ No. It returns a new object unless you pass inplace=True, so assign the result back (for example, df = df.fillna(0)) to keep it.
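This is easy to verify: call fillna() without assigning the result and count the gaps in the original afterwards. A minimal sketch on the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, np.nan, 30, 40],
    'Score': [85, 90, None, 95],
})

# The return value holds the filled data
result = df.fillna(0)

# The original frame still has its three gaps
print(int(df.isnull().sum().sum()))      # 3
print(int(result.isnull().sum().sum()))  # 0
```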


❓ Is interpolation good for all datasets?
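⚠️ No. It works best for ordered numeric data, such as time series, where neighboring values are related. It is not meaningful for categorical data or for rows that have no natural order.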

