6️⃣ 📊 Pandas Statistical Analysis & Aggregation

📊 Pandas Descriptive Statistics – Summarize and Explore Your Data Easily


🧲 Introduction – Why Use Descriptive Statistics in Pandas?

Descriptive statistics help you quickly understand the distribution, central tendency, and spread of your dataset. Pandas offers a wide set of functions to summarize numeric, categorical, and mixed-type data, making it ideal for exploratory data analysis (EDA).

🎯 In this guide, you’ll learn:

  • How to compute key statistics (mean, median, mode, std, etc.)
  • How to use .describe() and .info() for automatic summaries
  • How to explore data column-wise or group-wise
  • How to handle missing values and non-numeric columns

📥 1. Sample DataFrame

import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

📦 2. Get Quick Summary with .describe()

df.describe()

👉 Output (numerical columns only):

             Age      Score
count   5.000000   5.000000
mean   28.000000  88.400000
std     4.949747   2.701851
min    22.000000  85.000000
25%    25.000000  87.000000
50%    28.000000  88.000000
75%    30.000000  90.000000
max    35.000000  92.000000

✔️ Use df.describe(include='all') to show all columns, including non-numeric ones.
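
As a rough sketch of what include='all' adds, the snippet below rebuilds the sample DataFrame and reads the extra rows for the Boolean column. The variable names full_summary, passed_top and passed_freq are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

# include='all' adds the rows unique / top / freq for non-numeric columns;
# numeric columns show NaN in those rows, and vice versa
full_summary = df.describe(include='all')

# For the Boolean column: the most frequent value and how often it occurs
passed_top = full_summary.loc['top', 'Passed']
passed_freq = full_summary.loc['freq', 'Passed']

print(full_summary)
print(passed_top, passed_freq)
```

Because numeric and non-numeric statistics share one table, expect NaN wherever a statistic does not apply to a column.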


🧠 3. Calculate Individual Statistics

df['Score'].mean()       # Mean
df['Score'].median()     # Median
df['Score'].mode()       # Mode (returns Series)
df['Score'].std()        # Sample standard deviation (ddof=1 by default)
df['Score'].var()        # Sample variance (ddof=1 by default)
df['Score'].min()        # Minimum
df['Score'].max()        # Maximum
df['Score'].sum()        # Total sum
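
If you want several of these numbers in one call, .agg() accepts a list of statistic names. The sketch below is one possible way to do it on the sample data; it also shows why .mode() returns a Series and how ddof changes the standard deviation. The names summary, modes and pop_std are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

score = df['Score']

# One call, several statistics: the result is a Series indexed by statistic name
summary = score.agg(['mean', 'median', 'std', 'var', 'min', 'max', 'sum'])

# mode() returns a Series because ties are possible; here every score
# appears exactly once, so every value is a mode
modes = score.mode()

# std() uses ddof=1 (sample) by default; ddof=0 gives the population value
pop_std = score.std(ddof=0)

print(summary)
print(modes.tolist())
print(pop_std)
```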

🧮 4. Count Non-Null Entries

df.count()

✔️ Returns the number of non-null values for each column.
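
The sample DataFrame has no gaps, so count() simply returns 5 for every column. To see how missing values change the picture, here is a small sketch on a hypothetical copy of the data with a few NaN entries (df_missing and its values are made up for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with some missing entries
df_missing = pd.DataFrame({
    'Age': [25, 30, np.nan, 35, 28],
    'Score': [88, np.nan, 85, np.nan, 87],
})

non_null = df_missing.count()        # non-null values per column
missing = df_missing.isnull().sum()  # missing values per column

# Most statistics skip NaN by default (skipna=True)
mean_skip = df_missing['Score'].mean()
# With skipna=False a single NaN makes the result NaN
mean_strict = df_missing['Score'].mean(skipna=False)

print(non_null)
print(missing)
print(mean_skip, mean_strict)
```

Comparing count() with len(df_missing) is a quick way to spot columns that need cleaning before you trust their statistics.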


🧾 5. Value Counts for Categorical Data

df['Passed'].value_counts()

👉 Output (pandas 2.x; older versions omit the first line and print Name: Passed):

Passed
True     4
False    1
Name: count, dtype: int64
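
Once you know how often each category occurs, a natural next step is group-wise statistics. The following is a minimal sketch, assuming you want Score summarized separately for each value of Passed; the names shares and grouped are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

# Relative frequencies instead of raw counts
shares = df['Passed'].value_counts(normalize=True)

# Group-wise statistics: one row per value of 'Passed' (sorted: False, True)
grouped = df.groupby('Passed')['Score'].agg(['count', 'mean', 'max'])

print(shares)
print(grouped)
```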

📊 6. Correlation and Covariance

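df[['Age', 'Score']].corr()   # Pearson correlation
df[['Age', 'Score']].cov()    # Covariance matrix

✔️ Shows relationships between numeric columns. Selecting the numeric columns explicitly keeps the Boolean Passed column from being pulled in as 0/1 values. In pandas 2.0+, a text column in the DataFrame raises an error unless you pass numeric_only=True.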
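
Both methods return a square DataFrame, so a single pair can be read with .loc. A minimal sketch on the two numeric columns of the sample data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

# Restrict to the numeric columns of interest
numeric = df[['Age', 'Score']]

corr_matrix = numeric.corr()  # Pearson by default
cov_matrix = numeric.cov()    # sample covariance (ddof=1)

# Read one pair out of each matrix
age_score_corr = corr_matrix.loc['Age', 'Score']
age_score_cov = cov_matrix.loc['Age', 'Score']

print(corr_matrix)
print(cov_matrix)
```

The diagonal of the correlation matrix is always 1, and the diagonal of the covariance matrix holds each column's variance.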


📈 7. Get Quantiles and Ranges

df['Age'].quantile(0.75)  # 75th percentile
df['Age'].max() - df['Age'].min()  # Range
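
quantile() also accepts a list, which makes it easy to get the quartiles and the interquartile range in one go. A small sketch on the sample data (quartiles, iqr and age_range are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

# Passing a list returns a Series indexed by the requested quantiles
quartiles = df['Age'].quantile([0.25, 0.5, 0.75])

# Interquartile range: spread of the middle 50% of the data
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]

# Full range: max minus min
age_range = df['Age'].max() - df['Age'].min()

print(quartiles)
print(iqr, age_range)
```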

🧪 8. Summary with .info()

df.info()

✔️ Shows:

  • Number of non-null values
  • Data types
  • Memory usage

📉 9. Apply Stats Across Rows

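df[['Age', 'Score']].mean(axis=1)

✔️ Calculates the mean per row. Select the numeric columns explicitly: Boolean columns such as Passed can otherwise be pulled in as 0/1 values, and text columns raise an error in pandas 2.0+ unless you pass numeric_only=True.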
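
To make the axis argument concrete, this sketch computes the mean in both directions on the two numeric columns of the sample data (col_means and row_means are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 35, 28],
    'Score': [88, 92, 85, 90, 87],
    'Passed': [True, True, True, True, False]
})

numeric = df[['Age', 'Score']]

# axis=0 (default): collapse the rows, one result per column
col_means = numeric.mean(axis=0)

# axis=1: collapse the columns, one result per row
row_means = numeric.mean(axis=1)

print(col_means)
print(row_means)
```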


📌 Summary – Key Takeaways

Descriptive statistics are the foundation of data exploration. Pandas provides flexible, readable functions to quickly assess the shape, spread, and health of your dataset.

🔍 Key Takeaways:

  • Use .describe() and .info() for quick summaries
  • Compute individual stats with .mean(), .median(), .std(), etc.
  • Analyze relationships using .corr() and .cov()
  • Use value_counts() for frequency analysis
  • Control the direction with axis: axis=0 (default) gives one result per column, axis=1 gives one result per row

⚙️ Real-world relevance: Used in reporting, analytics, anomaly detection, business intelligence, and machine learning feature understanding.


❓ FAQs – Descriptive Statistics in Pandas

❓ How do I get stats for non-numeric columns?
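For text (object) columns, use:

df.describe(include='object')

The sample DataFrame above has no text columns, so this call has nothing to summarize there; for its Boolean column, use include='bool' instead.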

❓ What’s the difference between mean() and median()?

  • mean() → average (sensitive to outliers)
  • median() → middle value (robust to outliers)

❓ How do I find missing values?
Use:

df.isnull().sum()

❓ Can I summarize only selected columns?
Yes:

df[['Age', 'Score']].describe()

❓ Does describe() include Boolean columns?
Not by default. Pass include='bool' to summarize Boolean columns alone, or include every column with:

df.describe(include='all')
