6️⃣ 📊 Pandas Statistical Analysis & Aggregation

Estimated reading: 3 minutes 146 views

📊 Pandas Aggregation Techniques – Summarize Data for Deeper Insights

🧲 Introduction – What Is Aggregation in Pandas?

Aggregation refers to applying summary functions (like sum(), mean(), count(), etc.) to a DataFrame or Series to reduce and summarize data. Pandas provides powerful and flexible tools for performing group-wise, column-wise, and custom aggregations—essential for reporting, EDA, and data pipelines.

🎯 In this guide, you’ll learn:

Use built-in aggregation functions on Series and DataFrames
Perform group-based aggregation using groupby()
Use multiple aggregations with agg()
Handle missing values and apply custom aggregation logic

📥 1. Sample DataFrame

import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
    'Bonus': [5000, 7000, 5500, 8000, 7500, 5200]
})

📦 2. Apply Simple Aggregation Functions

df['Salary'].mean()       # Average salary
df[['Salary', 'Bonus']].sum()  # Sum per column
df[['Salary', 'Bonus']].max()  # Maximum value per column

✔️ Aggregations can be performed on entire columns.

👥 3. Group-Based Aggregation with `groupby()`

df.groupby('Department')['Salary'].mean()

👉 Output:

Department
Finance    70000.0
HR         51000.0
IT         61000.0

✔️ Groups data by department and computes average salary per group.

🧮 4. Aggregate Multiple Columns

df.groupby('Department')[['Salary', 'Bonus']].sum()

✔️ Aggregates both 'Salary' and 'Bonus' by department.

🔧 5. Multiple Aggregations with `.agg()`

df.groupby('Department').agg({
    'Salary': ['mean', 'min', 'max'],
    'Bonus': ['sum', 'count']
})

✔️ Applies multiple functions per column in a single call.

👉 Output:

            Salary                  Bonus     
              mean    min    max    sum count
Department                                   
Finance     70000  70000  70000   8000     1
HR          51000  50000  52000  15700     3
IT          61000  60000  62000  14500     2

🧠 6. Custom Aggregation Functions

df.groupby('Department')['Salary'].agg(lambda x: x.std())

✔️ Use a lambda or custom function in agg() for more control.

🧾 7. Aggregate Without Grouping (Entire Frame)

df.agg({
    'Salary': ['mean', 'std'],
    'Bonus': ['min', 'max']
})

✔️ Apply aggregations across the whole DataFrame.

🧮 8. Aggregation with `pivot_table()` (Advanced)

df.pivot_table(values='Salary', index='Department', aggfunc='mean')

✔️ Similar to groupby(), but supports multi-level summaries and missing value filling.

📌 Summary – Key Takeaways

Pandas provides robust and flexible aggregation techniques for summarizing, analyzing, and transforming datasets. Whether working with raw data or grouped data, agg() and groupby() are your best tools for efficient summarization.

🔍 Key Takeaways:

Use .mean(), .sum(), .count(), etc. for basic stats
Use .groupby() to aggregate by one or more keys
Use .agg() for multiple aggregations per column
Support for custom functions and lambdas
Use pivot_table() for advanced aggregation + reshaping

⚙️ Real-world relevance: Common in reporting dashboards, BI summaries, performance reviews, and grouped analytics.

❓ FAQs – Aggregation in Pandas

❓ What’s the difference between agg() and apply()?

.agg() → For summarization (returns scalar per group)
.apply() → For row/column transformations (returns Series/DataFrame)

❓ Can I group by multiple columns?
Yes:

df.groupby(['Department', 'Employee'])['Salary'].sum()

❓ How do I reset the index after aggregation?
Use:

df.groupby('Department').sum().reset_index()

❓ What’s the best way to apply multiple aggregations to one column?
Use .agg():

df['Salary'].agg(['mean', 'min', 'max'])

❓ Is pivot_table() better than groupby()?
It depends:

pivot_table() is more flexible for reshaping and multi-index
groupby() is simpler for flat aggregations

« Previous Next »

Share Now :