📈 Pandas Statistical Functions – Analyze Data with Built-in Math Tools
🧲 Introduction – Why Use Statistical Functions in Pandas?
Pandas provides a rich set of statistical functions for common operations such as mean, median, standard deviation, skewness, and correlation. They let you uncover trends and distributions in your data directly on Series and DataFrame objects, without calling NumPy or SciPy yourself.
🎯 In this guide, you’ll learn:
- How to use built-in statistical functions on Series and DataFrames
- How to get summary stats, correlations, rankings, and cumulative metrics
- How to apply functions row-wise, column-wise, or across groups
- How to handle NaN values in calculations
📥 1. Sample DataFrame
import pandas as pd
df = pd.DataFrame({
    'Math': [85, 92, 88, 79, 95],
    'Science': [89, 94, 86, 82, 91],
    'English': [78, 85, 82, 88, 90]
})
🧮 2. Basic Statistical Functions
df.mean() # Column-wise mean
df.median() # Median
df.mode() # Most frequent value(s) per column (returns a DataFrame, one row per mode)
df.std() # Standard deviation
df.var() # Variance
df.min() # Minimum
df.max() # Maximum
df.sum() # Total sum
df.count() # Non-null count
✔️ Use axis=1 to apply row-wise instead of column-wise.
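A quick runnable check on the sample data makes two details concrete: .mode() returns a DataFrame (every score here is unique, so every value ties as a mode), and .var()/.std() use the sample formula (ddof=1) by default. The manual variance below is only an illustration of that formula; pass ddof=0 if you want the population version.

```python
import pandas as pd

# Same sample data as in section 1
df = pd.DataFrame({
    "Math": [85, 92, 88, 79, 95],
    "Science": [89, 94, 86, 82, 91],
    "English": [78, 85, 82, 88, 90],
})

means = df.mean()      # Series indexed by column name
medians = df.median()
modes = df.mode()      # DataFrame: one row per modal value, sorted per column

# Sample variance (ddof=1, the pandas default) divides by n - 1
n = len(df)
manual_var = ((df - df.mean()) ** 2).sum() / (n - 1)

# Population variance divides by n instead
pop_var = df.var(ddof=0)

print(means)
print(modes)
print(df.var())
print(manual_var)
print(pop_var)
```

Because .mode() can return several rows, take modes.iloc[0] if you only want one value per column.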
📊 3. Cumulative Statistics
df.cumsum() # Cumulative sum
df.cumprod() # Cumulative product
df.cummax() # Cumulative maximum
df.cummin() # Cumulative minimum
✔️ Useful for running totals, financial time series, etc.
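A small sketch of the running-total idea on the Math column. The running average is this example's own construction (cumsum() divided by how many rows have been seen so far); pandas' expanding().mean() produces the same numbers when there are no missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Math": [85, 92, 88, 79, 95],
    "Science": [89, 94, 86, 82, 91],
    "English": [78, 85, 82, 88, 90],
})

running_total = df["Math"].cumsum()   # running sum of Math scores
best_so_far = df["Math"].cummax()     # highest score seen up to each row
# Running average: cumulative sum divided by the number of rows seen so far
running_mean = df["Math"].cumsum() / np.arange(1, len(df) + 1)

print(pd.DataFrame({"total": running_total, "best": best_so_far, "avg": running_mean}))
```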
📉 4. Skewness and Kurtosis
df.skew() # Sample skewness: measures asymmetry (negative = longer left tail)
df.kurt() # Excess kurtosis: measures "tailedness" (normal distribution ≈ 0)
✔️ Helpful for distribution shape analysis.
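To make the two comments above concrete, the sketch below recomputes both statistics by hand for the Math column. It assumes pandas uses the bias-adjusted sample estimators (the adjusted Fisher-Pearson coefficient G1 for skewness and the unbiased excess kurtosis G2); the checks compare the manual values against .skew() and .kurt().

```python
import numpy as np
import pandas as pd

scores = pd.Series([85, 92, 88, 79, 95], name="Math")
x = scores.to_numpy(dtype=float)
n = len(x)
d = x - x.mean()  # deviations from the mean
m2, m3, m4 = (d**2).sum(), (d**3).sum(), (d**4).sum()

# Adjusted Fisher-Pearson sample skewness (G1)
g1 = n * np.sqrt(n - 1) / (n - 2) * m3 / m2**1.5

# Unbiased sample excess kurtosis (G2); about 0 for normally distributed data
g2 = (
    n * (n + 1) * (n - 1) * m4 / ((n - 2) * (n - 3) * m2**2)
    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
)

print(g1, scores.skew())
print(g2, scores.kurt())
```

The skewness comes out negative because the single low score (79) stretches the left tail.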
🔗 5. Correlation and Covariance
df.corr() # Pearson correlation between columns
df.cov() # Covariance matrix
✔️ Used for relationship analysis between numerical variables.
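The sketch below shows how the two matrices relate: Pearson's r is the covariance rescaled by both standard deviations. It also uses the method="spearman" option of DataFrame.corr(), which equals Pearson's r computed on the ranks; that option goes slightly beyond the two calls listed above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Math": [85, 92, 88, 79, 95],
    "Science": [89, 94, 86, 82, 91],
    "English": [78, 85, 82, 88, 90],
})

cov = df.cov()  # sample covariance (ddof=1)
std = df.std()  # sample standard deviation (ddof=1)

# Pearson r: covariance divided by the product of the standard deviations
r_manual = cov.loc["Math", "Science"] / (std["Math"] * std["Science"])
r_pandas = df.corr().loc["Math", "Science"]

# Spearman's rank correlation is Pearson's r computed on the ranks
spearman = df.corr(method="spearman")
pearson_of_ranks = df.rank().corr()

print(cov.loc["Math", "Science"], r_manual, r_pandas)
print(spearman)
```

Because r is scale-free, it is usually easier to interpret than raw covariance, whose size depends on the units of each column.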
🔢 6. Ranking and Percentiles
df.rank() # Rank values within each column (ascending; ties get the average rank)
df['Math'].quantile(0.75) # 75th percentile
✔️ Percentiles and ranks help in grading, tiering, and percent-based logic.
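For grading, the default ascending rank is often flipped so the top score is rank 1. This sketch uses the real rank() options ascending, pct, and method, plus quantile() with a list of probabilities; the tied scores in the second half are made-up values for illustration.

```python
import pandas as pd

math = pd.Series([85, 92, 88, 79, 95], name="Math")

# Class position: highest score gets rank 1
position = math.rank(ascending=False)
# Percentile rank: each ascending rank divided by the number of scores
pct_rank = math.rank(pct=True)
# Quartiles with the default linear interpolation
quartiles = math.quantile([0.25, 0.5, 0.75])

# Hypothetical tied scores: method="average" (default) splits the ranks,
# method="min" gives every tied value the best rank of the group
tied = pd.Series([90, 85, 90])
avg_ties = tied.rank(ascending=False)
min_ties = tied.rank(ascending=False, method="min")

print(position.tolist(), pct_rank.tolist(), quartiles.tolist())
print(avg_ties.tolist(), min_ties.tolist())
```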
🧠 7. Apply Stats to Rows
df.mean(axis=1) # Mean per student
df.max(axis=1) # Highest score per student
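A per-student summary can combine several row-wise results. This sketch also uses idxmax(axis=1), which returns the column label of each row's maximum (here, the student's best subject); the summary table and its column names are this example's own choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Math": [85, 92, 88, 79, 95],
    "Science": [89, 94, 86, 82, 91],
    "English": [78, 85, 82, 88, 90],
})

# Build the summary separately so df keeps only subject columns
summary = pd.DataFrame({
    "Average": df.mean(axis=1),         # mean per student
    "Best score": df.max(axis=1),       # highest score per student
    "Best subject": df.idxmax(axis=1),  # column label of the row maximum
})
print(summary)
```

Keeping the results in a separate DataFrame is deliberate: adding an "Average" column to df itself would make later row-wise calls like df.max(axis=1) include it.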
⚙️ 8. Handle Missing Data in Stats
df_with_nan = df.astype(float) # Float columns can hold NaN
df_with_nan.iloc[2, 1] = float('nan') # Third student's Science score becomes NaN
df_with_nan.mean(skipna=True) # Default: skips NaN
df_with_nan.mean(skipna=False) # NaN for any column that contains a missing value
✔️ Use skipna to control how NaNs are treated.
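The sketch below checks what skipna does to the Science column and how count() reflects the missing score. It also shows the min_count parameter of sum(), which matters because an all-NaN column sums to 0 by default; the two-value Series is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Math": [85, 92, 88, 79, 95],
    "Science": [89, 94, 86, 82, 91],
    "English": [78, 85, 82, 88, 90],
})

df_with_nan = df.astype(float)   # float columns can hold NaN
df_with_nan.iloc[2, 1] = np.nan  # third student's Science score is missing

skipped = df_with_nan.mean()             # Science averages the 4 remaining scores
strict = df_with_nan.mean(skipna=False)  # Science becomes NaN
non_null = df_with_nan.count()           # Science has 4 values, the others 5

# sum() treats an all-NaN column as 0 unless min_count requires real values
empty = pd.Series([np.nan, np.nan])
print(skipped, strict, non_null, sep="\n")
print(empty.sum(), empty.sum(min_count=1))
```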
📌 Summary – Key Takeaways
Pandas statistical functions let you analyze and summarize datasets efficiently, from simple averages to advanced distribution metrics.
🔍 Key Takeaways:
- Use .mean(), .std(), .sum(), etc. for quick insights
- Use .corr() and .cov() for relationships
- Use .rank(), .quantile(), and .skew() for advanced analytics
- Handle NaNs with the skipna parameter
- Choose axis=0 for column-wise and axis=1 for row-wise stats
⚙️ Real-world relevance: Useful in EDA, machine learning preprocessing, KPI dashboards, and statistical reporting.
❓ FAQs – Pandas Statistical Functions
❓ How do I calculate the mean for each row?
Use:
df.mean(axis=1)
❓ What’s the difference between .mean() and .median()?
- .mean() → average (sensitive to outliers)
- .median() → middle value (robust to outliers)
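A tiny illustration of that difference, using hypothetical scores where one student scored far below the rest:

```python
import pandas as pd

# Hypothetical scores: four typical results plus one outlier (10)
scores = pd.Series([85, 88, 90, 92, 10])
print(scores.mean())    # pulled down by the outlier
print(scores.median())  # still the middle value
```

The single outlier drags the mean to 73.0, while the median stays at 88.0, close to the typical score.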
❓ Can I calculate correlation between two specific columns?
Yes:
df['Math'].corr(df['Science'])
❓ How do I include/exclude NaN in calculations?
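Pass skipna=True (the default) to ignore NaN, or skipna=False to return NaN whenever a missing value is present. Most reduction methods, such as .mean(), .sum(), .std(), and .max(), accept this parameter.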
❓ Can I apply a statistical function across groups?
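Yes, with groupby(). Assuming a DataFrame with a 'group' column and a numeric 'Score' column (the sample DataFrame above has neither):
df.groupby('group')['Score'].mean()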
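A runnable sketch with a small hypothetical table that has the assumed 'group' and 'Score' columns. It also uses agg() with a list of statistic names to compute several per-group statistics in one call.

```python
import pandas as pd

# Hypothetical data with the 'group' and 'Score' columns the FAQ assumes
scores = pd.DataFrame({
    "group": ["A", "B", "A", "B"],
    "Score": [80, 70, 90, 76],
})

group_means = scores.groupby("group")["Score"].mean()
# Several statistics per group at once
group_stats = scores.groupby("group")["Score"].agg(["mean", "median", "std", "count"])
print(group_means)
print(group_stats)
```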