📊 Pandas Descriptive Statistics – Summarize and Explore Your Data Easily
🧲 Introduction – Why Use Descriptive Statistics in Pandas?
Descriptive statistics help you quickly understand the distribution, central tendency, and spread of your dataset. Pandas offers a wide set of functions to summarize numeric, categorical, and mixed-type data, making it ideal for exploratory data analysis (EDA).
🎯 In this guide, you’ll learn:
- How to compute key statistics (mean, median, mode, std, etc.)
- Use .describe() and .info() for automatic summaries
- Explore data column-wise or group-wise
- Handle missing values and non-numeric columns
📥 1. Sample DataFrame
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})
📦 2. Get Quick Summary with .describe()
df.describe()
👉 Output (numerical columns only):
           Age      Score
count   5.000000   5.000000
mean   28.000000  88.400000
std     4.949747   2.701851
min    22.000000  85.000000
25%    25.000000  87.000000
50%    28.000000  88.000000
75%    30.000000  90.000000
max    35.000000  92.000000
✔️ Use df.describe(include='all') to show all columns, including non-numeric ones.
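As a rough sketch of what include='all' changes, the snippet below rebuilds the sample DataFrame and compares the default summary with the full one. The variable names numeric_summary and full_summary are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

numeric_summary = df.describe()            # Age and Score only
full_summary = df.describe(include='all')  # adds Passed plus unique/top/freq rows

print(list(numeric_summary.columns))
print(list(full_summary.columns))
print(full_summary.loc['freq', 'Passed'])  # occurrences of the most common value
```

In the full summary, rows that do not apply to a column (for example mean for Passed, or freq for Age) are filled with NaN.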
🧠 3. Calculate Individual Statistics
df['Score'].mean() # Mean
df['Score'].median() # Median
df['Score'].mode() # Mode (returns Series)
df['Score'].std() # Standard deviation
df['Score'].var() # Variance
df['Score'].min() # Minimum
df['Score'].max() # Maximum
df['Score'].sum() # Total sum
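When several of these statistics are needed at once, one option is to pass a list of function names to .agg(). The sketch below uses the Score values from the sample data. The names scores, summary, and modes are chosen here for illustration.

```python
import pandas as pd

scores = pd.Series([88, 92, 85, 90, 87], name='Score')

# Compute several statistics in one call; the result is a Series
# indexed by the statistic names.
summary = scores.agg(['mean', 'median', 'std', 'var', 'min', 'max', 'sum'])
print(summary)

# mode() returns every value tied for most frequent. Here each value
# appears once, so all five come back in sorted order.
modes = scores.mode()
print(modes.tolist())
```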
🧮 4. Count Non-Null Entries
df.count()
✔️ Returns the number of non-null values for each column.
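The sample DataFrame has no gaps, so count() simply returns 5 for every column. To see the effect of missing data, here is a small sketch using a hypothetical variant of the sample (df_missing) with two values replaced by NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with two missing entries
df_missing = pd.DataFrame({
    'Age': [25, np.nan, 22, 35, 28],
    'Score': [88, 92, np.nan, 90, 87],
    'Passed': [True, True, True, True, False]
})

non_null = df_missing.count()    # non-null values per column
total_rows = len(df_missing)

print(non_null)
print(total_rows - non_null)     # missing values per column
```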
🧾 5. Value Counts for Categorical Data
df['Passed'].value_counts()
👉 Output (as printed by pandas 1.x; pandas 2.0+ labels the index Passed and prints Name: count):
True 4
False 1
Name: Passed, dtype: int64
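If proportions are more useful than raw counts, value_counts() accepts normalize=True. A minimal sketch with the Passed values from the sample data (the names passed, counts, and shares are illustrative):

```python
import pandas as pd

passed = pd.Series([True, True, True, True, False], name='Passed')

counts = passed.value_counts()                # absolute frequencies
shares = passed.value_counts(normalize=True)  # relative frequencies (sum to 1)

print(counts.to_dict())
print(shares.to_dict())
```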
📊 6. Correlation and Covariance
df.corr() # Pearson correlation
df.cov() # Covariance matrix
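✔️ Shows relationships between numeric columns. In pandas 2.0+, pass numeric_only=True if the DataFrame also contains string columns; otherwise the call raises an error.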
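To focus on a single pair of columns, one approach is to select them explicitly before calling corr() or cov(), or to use Series.corr() directly. The sketch below does both for Age and Score. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

numeric = df[['Age', 'Score']]   # leave the Boolean column out

corr_matrix = numeric.corr()     # Pearson correlation matrix
cov_matrix = numeric.cov()       # sample covariance matrix (divides by n - 1)

# Correlation between two specific columns as a single number
age_score_corr = df['Age'].corr(df['Score'])

print(corr_matrix)
print(cov_matrix)
print(round(age_score_corr, 3))
```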
📈 7. Get Quantiles and Ranges
df['Age'].quantile(0.75) # 75th percentile
df['Age'].max() - df['Age'].min() # Range
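quantile() also accepts a list, which makes it easy to get the quartiles together and derive the interquartile range (IQR). A short sketch with the Age values from the sample data, where quartiles, iqr, and value_range are names picked for this example:

```python
import pandas as pd

ages = pd.Series([25, 30, 22, 35, 28], name='Age')

# Passing a list returns a Series indexed by the requested quantiles
quartiles = ages.quantile([0.25, 0.5, 0.75])

iqr = quartiles.loc[0.75] - quartiles.loc[0.25]   # spread of the middle 50%
value_range = ages.max() - ages.min()             # full spread

print(quartiles)
print(iqr, value_range)
```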
🧪 8. Summary with .info()
df.info()
✔️ Shows:
- Number of non-null values
- Data types
- Memory usage
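info() prints its report rather than returning it. If the text is needed in a variable, or only one piece of the report is of interest, something along these lines may help. The buffer and report names are illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

# Write the info() report into a string buffer instead of stdout
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()
print(report)

print(df.dtypes)                   # data types only
print(df.memory_usage(deep=True))  # bytes per column, index included
```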
📉 9. Apply Stats Across Rows
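df[['Age', 'Score']].mean(axis=1)
✔️ Calculates the mean per row. Selecting Age and Score first keeps the Boolean Passed column, which pandas would otherwise count as 0/1, out of the average.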
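The introduction also mentions group-wise exploration. One way to sketch it is groupby() followed by a statistic. Using Passed as the grouping key is an illustrative choice for this sample, and the name grouped is likewise hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

# Mean Age and Score within each Passed group
grouped = df.groupby('Passed')[['Age', 'Score']].mean()
print(grouped)
```

The same pattern works with other statistics such as .median(), .std(), or .describe() in place of .mean().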
📌 Summary – Key Takeaways
Descriptive statistics are the foundation of data exploration. Pandas provides flexible, readable functions to quickly assess the shape, spread, and health of your dataset.
🔍 Key Takeaways:
- Use .describe() and .info() for quick summaries
- Compute individual stats with .mean(), .median(), .std(), etc.
- Analyze relationships using .corr() and .cov()
- Use value_counts() for frequency analysis
- Control axis: axis=0 (one result per column), axis=1 (one result per row)
⚙️ Real-world relevance: Used in reporting, analytics, anomaly detection, business intelligence, and machine learning feature understanding.
❓ FAQs – Descriptive Statistics in Pandas
❓ How do I get stats for non-numeric columns?
For string (object dtype) columns, use:
df.describe(include='object')
❓ What’s the difference between mean() and median()?
mean() → average (sensitive to outliers)
median() → middle value (robust to outliers)
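A tiny example makes the difference concrete. The values below are made up for illustration and are not part of the sample DataFrame; 100 plays the role of the outlier.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 100])  # hypothetical data with one outlier

mean_value = values.mean()      # pulled upward by the outlier
median_value = values.median()  # stays at the middle value

print(mean_value, median_value)
```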
❓ How do I find missing values?
Use:
df.isnull().sum()
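Missing values also affect the statistics themselves: by default pandas skips NaN when computing them. A minimal sketch, assuming a hypothetical Score column with one gap:

```python
import numpy as np
import pandas as pd

scores = pd.Series([88, 92, np.nan, 90, 87], name='Score')  # one missing value

missing = scores.isnull().sum()            # number of NaN entries

mean_skipping_nan = scores.mean()          # NaN ignored (skipna=True is the default)
mean_with_nan = scores.mean(skipna=False)  # NaN propagates into the result

print(missing, mean_skipping_nan, mean_with_nan)
```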
❓ Can I summarize only selected columns?
Yes:
df[['Age', 'Score']].describe()
❓ Does describe() include Boolean columns?
Only if you use:
df.describe(include='all')
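To summarize only the Boolean columns, include='bool' is another option. Because True counts as 1 and False as 0, the mean of a Boolean column also gives the share of True values. A short sketch on the sample data (bool_summary and pass_rate are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

# Restrict describe() to Boolean columns: count, unique, top, freq
bool_summary = df.describe(include='bool')
print(bool_summary)

# Share of True values in the Passed column
pass_rate = df['Passed'].mean()
print(pass_rate)
```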