6️⃣ 📊 Pandas Statistical Analysis & Aggregation

Pandas Descriptive Statistics – Summarize and Explore Your Data Easily

Introduction – Why Use Descriptive Statistics in Pandas?

Descriptive statistics help you quickly understand the distribution, central tendency, and spread of your dataset. Pandas offers a wide set of functions to summarize numeric, categorical, and mixed-type data, making it ideal for exploratory data analysis (EDA).

In this guide, you’ll learn:

Compute key statistics (mean, median, mode, standard deviation, and more)
Use .describe() and .info() for automatic summaries
Explore data column-wise or group-wise
Handle missing values and non-numeric columns

1. Sample DataFrame

import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

2. Get Quick Summary with `.describe()`

df.describe()

Output (numeric columns only; the Boolean Passed column is excluded by default):

             Age      Score
count   5.000000   5.000000
mean   28.000000  88.400000
std     4.949747   2.701851
min    22.000000  85.000000
25%    25.000000  87.000000
50%    28.000000  88.000000
75%    30.000000  90.000000
max    35.000000  92.000000

✔️ Use df.describe(include='all') to show all columns, including non-numeric ones.
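To illustrate the tip above, here is a minimal sketch that rebuilds the sample DataFrame and summarizes every column. The variable name `summary_all` is only illustrative. Rows that do not apply to a column's type (such as `mean` for Passed) should appear as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

# include='all' adds non-numeric columns to the summary.
summary_all = df.describe(include='all')
print(summary_all)

# For the Boolean column, the summary reports count, unique, top, and freq.
print(summary_all.loc[['count', 'unique', 'top', 'freq'], 'Passed'])
```

Reading the `freq` row for Passed tells you how often the most common value (`top`) occurs.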

3. Calculate Individual Statistics

df['Score'].mean()       # Mean
df['Score'].median()     # Median
df['Score'].mode()       # Mode (returns a Series, because values can tie)
df['Score'].std()        # Standard deviation
df['Score'].var()        # Variance
df['Score'].min()        # Minimum
df['Score'].max()        # Maximum
df['Score'].sum()        # Total sum
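If you want several of these statistics at once, one option is `.agg()` with a list of function names. The sketch below uses the Score values from the sample data; the names `scores` and `stats` are illustrative, not part of the original example.

```python
import pandas as pd

scores = pd.Series([88, 92, 85, 90, 87], name='Score')

# Compute several statistics in one call; the result is a Series
# indexed by the function names.
stats = scores.agg(['mean', 'median', 'std', 'var', 'min', 'max', 'sum'])
print(stats)

# mode() returns a Series because several values can tie for most frequent.
# Every score here occurs exactly once, so all five values come back, sorted.
print(scores.mode().tolist())
```

Note that `std()` and `var()` use the sample formulas (dividing by n - 1) by default.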

4. Count Non-Null Entries

df.count()

✔️ Returns the number of non-null values for each column.
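The sample DataFrame has no gaps, so `count()` simply returns 5 for every column. The sketch below uses a hypothetical variant with one missing score (`df_missing` is an assumed name) to show how `count()`, `isnull().sum()`, and the `skipna` option behave when values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with one missing score.
df_missing = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, np.nan, 85, 90, 87],
})

non_null = df_missing.count()        # non-null values per column
missing = df_missing.isnull().sum()  # missing values per column
print(non_null)
print(missing)

# Reductions skip NaN by default (skipna=True) ...
mean_skipping = df_missing['Score'].mean()
# ... while skipna=False propagates the missing value.
mean_strict = df_missing['Score'].mean(skipna=False)
print(mean_skipping, mean_strict)
```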

5. Value Counts for Categorical Data

df['Passed'].value_counts()

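Output (as printed by pandas 1.x; pandas 2.x prints the column name above the values and labels the result Name: count):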

True     4
False    1
Name: Passed, dtype: int64
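Frequency counts pair naturally with group-wise summaries. As a rough sketch, the snippet below turns the counts into proportions with `normalize=True` and then averages Score within each Passed group. The names `shares` and `score_by_result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

# Proportion of each value instead of raw counts.
shares = df['Passed'].value_counts(normalize=True)
print(shares)

# Group-wise statistic: mean Score for each value of Passed.
score_by_result = df.groupby('Passed')['Score'].mean()
print(score_by_result)
```

The same pattern works with other reductions, for example `.median()` or `.describe()` on the grouped column.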

6. Correlation and Covariance

df.corr()     # Pearson correlation
df.cov()      # Covariance matrix

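✔️ Shows pairwise relationships between columns. Boolean columns such as Passed are treated as 0/1. If the DataFrame also contains string columns, select the numeric columns first, because pandas 2.0+ raises an error instead of silently dropping them.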
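When a DataFrame mixes text and numbers, one version-independent approach is to select the numeric columns before calling `.corr()` or `.cov()`. This is a sketch under assumptions: the Name column and the variable names below are hypothetical additions to the sample data.

```python
import pandas as pd

df_mixed = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cy', 'Dee', 'Eli'],  # hypothetical text column
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
})

# Keep only numeric columns (this drops text and Boolean columns).
numeric = df_mixed.select_dtypes(include='number')

corr_matrix = numeric.corr()  # Pearson correlation by default
cov_matrix = numeric.cov()    # sample covariance
print(corr_matrix)
print(cov_matrix)
```

For this data the Age/Score correlation comes out at roughly 0.75, a fairly strong positive relationship, though five rows are far too few to draw conclusions from.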

7. Get Quantiles and Ranges

df['Age'].quantile(0.75)  # 75th percentile
df['Age'].max() - df['Age'].min()  # Range
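`quantile()` also accepts a list, which makes it easy to compute the interquartile range (IQR) alongside the full range. A small sketch using the Age values from the sample data (variable names are illustrative):

```python
import pandas as pd

ages = pd.Series([25, 30, 22, 35, 28], name='Age')

# Passing a list returns a Series indexed by the requested quantiles.
quartiles = ages.quantile([0.25, 0.5, 0.75])
print(quartiles)

iqr = quartiles.loc[0.75] - quartiles.loc[0.25]  # spread of the middle 50%
value_range = ages.max() - ages.min()            # full range
print(iqr, value_range)
```

The IQR is less affected by extreme values than the full range, which depends only on the minimum and maximum.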

8. Summary with `.info()`

df.info()

✔️ Shows:

Number of non-null values
Data types
Memory usage

9. Apply Stats Across Rows

df.mean(axis=1)

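✔️ Calculates the mean of each row. Boolean columns such as Passed may be counted as 0/1, so select the numeric columns explicitly, for example df[['Age', 'Score']].mean(axis=1), when you want them left out.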
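To keep the row-wise mean limited to the columns you intend, a simple sketch is to select them by name first. The variable name `row_means` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

# Average Age and Score within each row, leaving Passed out.
row_means = df[['Age', 'Score']].mean(axis=1)
print(row_means)
```

Averaging Age and Score together is not meaningful in itself; the point is only to show how `axis=1` reduces across columns for each row.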

Summary – Key Takeaways

Descriptive statistics are the foundation of data exploration. Pandas provides flexible, readable functions to quickly assess the shape, spread, and health of your dataset.

Key Takeaways:

Use .describe() and .info() for quick summaries
Compute individual stats with .mean(), .median(), .std(), etc.
Analyze relationships using .corr() and .cov()
Use value_counts() for frequency analysis
Control the direction with axis: axis=0 (the default) returns one result per column, axis=1 returns one result per row

Real-world relevance: Used in reporting, analytics, anomaly detection, business intelligence, and machine learning feature understanding.

FAQs – Descriptive Statistics in Pandas

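How do I get stats for non-numeric columns?
Use the include argument to pick the column type. For string (object) columns:

df.describe(include='object')

The sample DataFrame has no string columns, so for its Boolean Passed column use include='bool' or include='all' instead.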

What’s the difference between mean() and median()?

mean() → average (sensitive to outliers)
median() → middle value (robust to outliers)

How do I find missing values?
Use:

df.isnull().sum()

Can I summarize only selected columns?
Yes:

df[['Age', 'Score']].describe()

Does describe() include Boolean columns?
Not by default. Request them with include='bool', or summarize every column with:

df.describe(include='all')
