📈 Pandas Interpolating Missing Values – Estimate NaNs with Intelligent Fill

🧲 Introduction – Why Use Interpolation?

Missing data in numeric or time series datasets can disrupt analysis. Instead of guessing or dropping rows, interpolation lets you estimate missing values based on existing data trends. Pandas provides .interpolate() to fill NaNs using linear, polynomial, time-based, or index-based methods—ideal for scientific, financial, and temporal datasets.

🎯 In this guide, you’ll learn:

How .interpolate() works in Pandas
Different interpolation methods (linear, time, index, etc.)
Use with numeric, date-indexed, and time series data
Control direction, limits, and fill scope

📥 1. Create a Sample DataFrame with Missing Values

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Day': pd.date_range(start='2023-01-01', periods=7),
    'Temperature': [30, np.nan, np.nan, 36, np.nan, 38, 40]
})

df.set_index('Day', inplace=True)
print(df)

👉 Output:

            Temperature
Day                    
2023-01-01         30.0
2023-01-02          NaN
2023-01-03          NaN
2023-01-04         36.0
2023-01-05          NaN
2023-01-06         38.0
2023-01-07         40.0

🔁 2. Default Linear Interpolation

df_interp = df.interpolate()
print(df_interp)

✔️ Fills NaNs with evenly spaced values between the neighboring known values. The default 'linear' method ignores the index and treats rows as equally spaced.

👉 Output:

            Temperature
Day                    
2023-01-01         30.0
2023-01-02         32.0
2023-01-03         34.0
2023-01-04         36.0
2023-01-05         37.0
2023-01-06         38.0
2023-01-07         40.0

⏱️ 3. Time-Based Interpolation

df_interp_time = df.interpolate(method='time')

✔️ Weights the interpolation by the actual time elapsed between timestamps (ideal for uneven time gaps). Requires a DatetimeIndex.
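
👉 A minimal sketch of how time-based and default linear interpolation differ when the gaps are uneven. The series below is hypothetical data made up for illustration; with the evenly spaced daily sample from step 1, both methods would give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with uneven gaps: 1 day, then 8 days.
idx = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-10'])
s = pd.Series([10.0, np.nan, 100.0], index=idx)

# Default linear: ignores the dates, so the NaN becomes the midpoint.
linear = s.interpolate()

# Time-based: 1 of the 9 days has elapsed, so 10 + 90 * (1 / 9).
by_time = s.interpolate(method='time')

print(round(linear.iloc[1], 6))   # 55.0
print(round(by_time.iloc[1], 6))  # 20.0
```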

📈 4. Polynomial Interpolation

df_poly = df.interpolate(method='polynomial', order=2)

✔️ Uses a polynomial curve fit (quadratic if order=2) to estimate values.

⚠️ Requires SciPy, numeric columns, and more known data points than the order (at least order + 1).
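
👉 A small sketch comparing linear and polynomial fills on made-up data whose known points lie on y = x², so a quadratic fit should recover the missing value closely. The series and the ImportError fallback are assumptions for illustration; the fallback only covers the case where SciPy is not installed.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y = x**2 at x = 0, 1, 3, 4, with x = 2 missing.
s = pd.Series([0.0, 1.0, np.nan, 9.0, 16.0])

# Linear fill takes the midpoint of 1 and 9.
linear = s.interpolate()
print(linear.iloc[2])  # 5.0

# Polynomial fill needs SciPy; skip gracefully if it is not installed.
try:
    poly = s.interpolate(method='polynomial', order=2)
    # Should land close to 4.0, since the known points follow x**2.
    print(round(poly.iloc[2], 6))
except ImportError:
    poly = None
    print("SciPy is required for method='polynomial'")
```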

🧮 5. Interpolation Based on Index

df_interp_index = df.interpolate(method='index')

✔️ Uses the numeric value of the index to estimate values—useful if your index has meaning (like position, time, or distance).
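
👉 As an illustration, assume a hypothetical series indexed by distance (0, 2, and 10 units). The default method treats the three rows as equally spaced, while method='index' accounts for where the missing row actually sits.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements at positions 0, 2, and 10.
s = pd.Series([0.0, np.nan, 100.0], index=[0, 2, 10])

# Default linear: rows are treated as equally spaced.
linear = s.interpolate()

# Index-based: position 2 is 2/10 of the way from 0 to 10.
by_index = s.interpolate(method='index')

print(linear.loc[2])    # 50.0
print(by_index.loc[2])  # 20.0
```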

🧭 6. Set Direction of Interpolation

df.interpolate(limit_direction='backward')

✔️ Fills in the backward direction, so leading NaNs are filled and trailing NaNs are left as NaN (the default, 'forward', does the opposite).

df.interpolate(limit_direction='both')

✔️ Fills values in both directions, useful for leading/trailing NaNs.
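
👉 A short sketch of how the direction setting affects leading and trailing NaNs, using a made-up series. Note that with the default linear method, edge NaNs are filled with the nearest known value rather than an extrapolated trend. The limit_area parameter shown at the end is not covered above; it is included here as one way to restrict filling to interior gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a leading, an interior, and a trailing NaN.
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

forward = s.interpolate()                             # NaN, 1, 2, 3, 3
backward = s.interpolate(limit_direction='backward')  # 1, 1, 2, 3, NaN
both = s.interpolate(limit_direction='both')          # 1, 1, 2, 3, 3

# Only fill NaNs that are surrounded by valid values.
inside = s.interpolate(limit_area='inside')           # NaN, 1, 2, 3, NaN

for name, result in [('forward', forward), ('backward', backward),
                     ('both', both), ('inside', inside)]:
    print(name, result.tolist())
```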

⛔ 7. Limit Number of Consecutive NaNs to Fill

df.interpolate(limit=1)

✔️ Only fills one NaN per consecutive block, leaving others untouched.
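
👉 Applied to the temperature data from step 1 (rebuilt here as a Series so the snippet runs on its own), limit=1 fills only the first NaN of the two-day gap. Combining it with limit_direction='backward' fills the last NaN of that gap instead.

```python
import numpy as np
import pandas as pd

temps = pd.Series(
    [30, np.nan, np.nan, 36, np.nan, 38, 40],
    index=pd.date_range(start='2023-01-01', periods=7),
    name='Temperature',
)

# Fill at most one NaN per gap, counting from the previous valid value.
capped = temps.interpolate(limit=1)
print(capped.tolist())
# [30.0, 32.0, nan, 36.0, 37.0, 38.0, 40.0]

# Same cap, but counting backward from the next valid value.
capped_back = temps.interpolate(limit=1, limit_direction='backward')
print(capped_back.tolist())
# [30.0, nan, 34.0, 36.0, 37.0, 38.0, 40.0]
```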

📊 8. Interpolate All Columns in a DataFrame

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8]
})

df.interpolate()

✔️ Interpolates each numeric column independently (axis=0 is the default).
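
👉 A sketch of the column-wise result, plus a row-wise variant using axis=1. The second DataFrame (wide, with columns t0 to t2) is a hypothetical layout added for illustration.

```python
import numpy as np
import pandas as pd

df_cols = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8]
})

# Default axis=0: each column is filled from its own values.
by_col = df_cols.interpolate()
print(by_col)
#      A    B
# 0  1.0  5.0
# 1  2.0  6.0
# 2  3.0  7.0
# 3  4.0  8.0

# Hypothetical wide layout: one row per sensor, one column per time step.
wide = pd.DataFrame(
    [[1.0, np.nan, 3.0], [4.0, np.nan, 8.0]],
    columns=['t0', 't1', 't2'],
)

# axis=1: each row is filled across its columns.
by_row = wide.interpolate(axis=1)
print(by_row['t1'].tolist())  # [2.0, 6.0]
```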

📌 Summary – Key Takeaways

Interpolation fills missing values without dropping rows, and it tends to work well when values change smoothly between observations, as in numeric or time-indexed data. Pandas lets you control the method, the fill direction, and how many values get filled.

🔍 Key Takeaways:

.interpolate() estimates missing values based on data trend
Supports methods: linear (default), time, index, polynomial, spline, etc.
Use limit, limit_direction, and axis to control scope
Ideal for scientific, weather, sensor, and time series data

⚙️ Real-world relevance: Common in financial forecasting, signal correction, climate modeling, and stock market data cleaning.

❓ FAQs – Interpolating in Pandas

❓ What’s the default method used by interpolate()?
✅ Linear interpolation.

❓ Can I interpolate with respect to time or index?
✅ Yes:

df.interpolate(method='time')  # For time-indexed data
df.interpolate(method='index')  # Uses numerical index

❓ How do I avoid filling too many missing values at once?
✅ Use the limit parameter to cap how many consecutive NaNs are filled:

df.interpolate(limit=1)

❓ Can I use interpolation with categorical data?
❌ No. Interpolation needs numeric values; for categorical columns, use a fill such as ffill() or bfill() instead.

❓ Does interpolation modify the original DataFrame?
❌ No—unless you use inplace=True.
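
👉 A quick check of that behavior on a made-up series: the call returns a new object by default, and the original changes only when inplace=True is passed.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Default: a new Series is returned and the original keeps its NaN.
filled = s.interpolate()
print(int(s.isna().sum()))  # 1
print(filled.tolist())      # [1.0, 2.0, 3.0]

# With inplace=True the original Series is modified directly.
s.interpolate(inplace=True)
print(s.tolist())           # [1.0, 2.0, 3.0]
```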
