📈 Pandas Interpolating Missing Values – Estimate NaNs with Intelligent Fill
🧲 Introduction – Why Use Interpolation?
Missing data in numeric or time series datasets can disrupt analysis. Instead of guessing or dropping rows, interpolation lets you estimate missing values based on existing data trends. Pandas provides .interpolate() to fill NaNs using linear, polynomial, time-based, or index-based methods—ideal for scientific, financial, and temporal datasets.
🎯 In this guide, you’ll learn:
- How .interpolate() works in Pandas
- Different interpolation methods (linear, time, index, etc.)
- Use with numeric, date-indexed, and time series data
- Control direction, limits, and fill scope
📥 1. Create a Sample DataFrame with Missing Values
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Day': pd.date_range(start='2023-01-01', periods=7),
'Temperature': [30, np.nan, np.nan, 36, np.nan, 38, 40]
})
df.set_index('Day', inplace=True)
print(df)
👉 Output:
Temperature
Day
2023-01-01 30.0
2023-01-02 NaN
2023-01-03 NaN
2023-01-04 36.0
2023-01-05 NaN
2023-01-06 38.0
2023-01-07 40.0
🔁 2. Default Linear Interpolation
df_interp = df.interpolate()
print(df_interp)
✔️ Fills NaNs using linearly spaced values between known numbers.
👉 Output:
Temperature
Day
2023-01-01 30.0
2023-01-02 32.0
2023-01-03 34.0
2023-01-04 36.0
2023-01-05 37.0
2023-01-06 38.0
2023-01-07 40.0
⏱️ 3. Time-Based Interpolation
df_interp_time = df.interpolate(method='time')
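✔️ Uses the actual timestamps in the index to weight the interpolation, which matters when the gaps between observations are uneven. It requires a DatetimeIndex; on the evenly spaced daily index above, it gives the same result as the linear default.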
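To see where method='time' actually differs from the default, here is a minimal sketch with a deliberately uneven index. The dates and values are made up for illustration and are not part of the sample DataFrame above.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: one day between the first two rows,
# then three days before the last one.
s = pd.Series(
    [10.0, np.nan, 50.0],
    index=pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-05']),
)

linear = s.interpolate()                # ignores the index, rows equally spaced
by_time = s.interpolate(method='time')  # weights by elapsed time

print(round(linear.iloc[1], 6))   # 30.0 (midpoint of 10 and 50)
print(round(by_time.iloc[1], 6))  # 20.0 (a quarter of the way from Jan 1 to Jan 5)
```

The linear default places the missing value halfway between its neighbours, while the time-based method places it according to how much time has really passed.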
📈 4. Polynomial Interpolation
df_poly = df.interpolate(method='polynomial', order=2)
✔️ Uses a polynomial curve fit (quadratic if order=2) to estimate values.
⚠️ Requires SciPy to be installed, numeric columns, and more known (non-NaN) points than the polynomial order.
🧮 5. Interpolation Based on Index
df_interp_index = df.interpolate(method='index')
✔️ Uses the numeric value of the index to estimate values—useful if your index has meaning (like position, time, or distance).
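As a rough illustration of the difference from the linear default, the sketch below uses a small Series with an unevenly spaced numeric index. The positions and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements taken at positions 0, 1 and 10.
s = pd.Series([0.0, np.nan, 100.0], index=[0, 1, 10])

by_position = s.interpolate()             # default: rows treated as equally spaced
by_index = s.interpolate(method='index')  # uses the index values 0, 1, 10

print(by_position.loc[1])  # 50.0
print(by_index.loc[1])     # 10.0 (position 1 is 10% of the way from 0 to 10)
```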
🧭 6. Set Direction of Interpolation
df.interpolate(limit_direction='backward')
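✔️ Fills consecutive NaNs working backward from the next valid value. With the default linear method, leading NaNs get filled while trailing NaNs stay empty.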
df.interpolate(limit_direction='both')
✔️ Fills values in both directions, useful for leading/trailing NaNs.
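The effect of limit_direction is easiest to see on a Series with NaNs at both ends. The following is a small sketch using made-up values and the default linear method.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

forward = s.interpolate()  # default direction is 'forward'
backward = s.interpolate(limit_direction='backward')
both = s.interpolate(limit_direction='both')

print(forward.tolist())   # [nan, 1.0, 2.0, 3.0, 3.0]
print(backward.tolist())  # [1.0, 1.0, 2.0, 3.0, nan]
print(both.tolist())      # [1.0, 1.0, 2.0, 3.0, 3.0]
```

Note that with the linear method the edge NaNs are filled by repeating the nearest valid value, not by extrapolating the trend.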
⛔ 7. Limit Number of Consecutive NaNs to Fill
df.interpolate(limit=1)
✔️ Fills at most one NaN in each run of consecutive NaNs (the first one, in the default forward direction) and leaves the rest as NaN.
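As a quick check of what limit=1 does, this sketch reuses the temperature values from the sample data as a plain Series (the name temps is just for illustration).

```python
import numpy as np
import pandas as pd

temps = pd.Series([30, np.nan, np.nan, 36, np.nan, 38, 40], dtype='float64')

limited = temps.interpolate(limit=1)

# The two-NaN gap is only half filled; the single-NaN gap is filled completely.
print(limited.tolist())  # [30.0, 32.0, nan, 36.0, 37.0, 38.0, 40.0]
```

The filled value is still computed from the full gap (32.0 lies on the line from 30 to 36), so limit controls how many values are written, not how they are calculated.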
📊 8. Interpolate All Columns in a DataFrame
df = pd.DataFrame({
'A': [1, 2, np.nan, 4],
'B': [5, np.nan, np.nan, 8]
})
df.interpolate()
✔️ Interpolates each numeric column independently (axis=0 is the default).
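When a DataFrame also contains non-numeric columns, one cautious approach is to interpolate only the numeric ones. The sketch below assumes a hypothetical text column named Label; how pandas treats non-numeric columns in .interpolate() has varied between versions, so selecting the numeric columns explicitly avoids relying on that behavior.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'Label': ['w', 'x', 'y', 'z'],  # hypothetical non-numeric column
})

# Pick out the numeric columns and interpolate just those.
numeric_cols = df.select_dtypes(include='number').columns
df[numeric_cols] = df[numeric_cols].interpolate()

print(df['A'].tolist())  # [1.0, 2.0, 3.0, 4.0]
print(df['B'].tolist())  # [5.0, 6.0, 7.0, 8.0]
```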
📌 Summary – Key Takeaways
Interpolation fills missing values without dropping rows and, when the data changes smoothly between observations, usually gives better estimates than a constant fill. It works best on numeric or time-indexed data, and Pandas makes it flexible and easy to control.
🔍 Key Takeaways:
- .interpolate() estimates missing values based on data trend
- Supports methods: linear (default), time, index, polynomial, spline, etc.
- Use limit, limit_direction, and axis to control scope
- Ideal for scientific, weather, sensor, and time series data
⚙️ Real-world relevance: Common in financial forecasting, signal correction, climate modeling, and stock market data cleaning.
❓ FAQs – Interpolating in Pandas
❓ What’s the default method used by interpolate()?
✅ Linear interpolation.
❓ Can I interpolate with respect to time or index?
Yes:
df.interpolate(method='time') # For time-indexed data
df.interpolate(method='index') # Uses numerical index
❓ How do I avoid filling too many missing values at once?
Use:
df.interpolate(limit=1)
❓ Can I use interpolation with categorical data?
❌ No. Interpolation only works on numeric columns.
❓ Does interpolation modify the original DataFrame?
❌ No—unless you use inplace=True.
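A short sketch to confirm this, using a throwaway one-column DataFrame: the result of .interpolate() is a new object, and the original keeps its NaN unless you reassign it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 3.0]})

filled = df.interpolate()  # returns a new DataFrame

print(df['A'].isna().sum())      # 1 (the original still has its NaN)
print(filled['A'].isna().sum())  # 0
```

Reassigning the result (df = df.interpolate()) is a simple way to keep the filled values.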