6️⃣ 📊 Pandas Statistical Analysis & Aggregation

👥 Pandas Grouping Data (GroupBy) – Analyze Data in Segments

🧲 Introduction – What is Grouping in Pandas?

Grouping is a powerful operation that lets you split data into groups, apply functions to each group, and combine the results into a structured summary. In Pandas, the groupby() method helps you aggregate and analyze data by categories, labels, or hierarchical keys.
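The split-apply-combine idea can be made concrete with a minimal sketch. It assumes a small DataFrame with the same Department and Salary values as the sample in section 1. The manual dictionary steps perform each phase explicitly, and groupby() should give the same per-department means in one call.

```python
import pandas as pd

# Illustrative data: same Department/Salary values as the sample DataFrame in section 1
df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
})

# Split: one sub-DataFrame per department label
pieces = {dept: df[df['Department'] == dept] for dept in df['Department'].unique()}

# Apply: reduce each piece to a single number
means = {dept: piece['Salary'].mean() for dept, piece in pieces.items()}

# Combine: gather the per-group results into one Series, sorted by key as groupby() does
manual = pd.Series(means).sort_index()

# groupby() performs all three steps in one call
built_in = df.groupby('Department')['Salary'].mean()

print(manual)
print(built_in)
```

The two Series differ only in labels: groupby() names the index "Department" and the Series "Salary", while the manual version leaves both unnamed.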

🎯 In this guide, you’ll learn:

How to group data using groupby()
Apply aggregation functions like sum(), mean(), count()
Perform multiple aggregations with .agg()
Group by multiple columns and use custom functions

📥 1. Sample DataFrame

import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
    'Bonus': [5000, 7000, 5500, 8000, 7500, 5200]
})

🧱 2. Group by a Single Column

df.groupby('Department')['Salary'].mean()

✔️ Calculates the average salary per department.

👉 Output:

Department
Finance    70000.0
HR         51000.0
IT         61000.0

🔗 3. Group by Multiple Columns

df.groupby(['Department', 'Employee'])['Salary'].sum()

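✔️ Groups by both department and employee; the result is indexed by a two-level MultiIndex (Department, Employee). Because every employee appears only once in this data, each group holds a single row.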
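Multi-key grouping is more useful when the key combinations repeat. The sketch below adds a hypothetical Level column (not part of the sample DataFrame) so that some groups contain several rows. It then selects one combination with a tuple key and uses unstack() to pivot the inner level into columns.

```python
import pandas as pd

# Hypothetical data: 'Level' is an illustrative column, not part of the sample DataFrame
df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Level': ['Junior', 'Senior', 'Junior', 'Senior', 'Senior', 'Senior'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
})

# Two keys produce a two-level MultiIndex: (Department, Level)
totals = df.groupby(['Department', 'Level'])['Salary'].sum()
print(totals)

# Select one combination with a tuple key
print(totals.loc[('HR', 'Junior')])   # 50000 + 52000

# unstack() pivots the inner level into columns; missing combinations become NaN
print(totals.unstack())
```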

📊 4. Aggregate with Built-in Functions

df.groupby('Department').sum(numeric_only=True)
df.groupby('Department').min()
df.groupby('Department').count()

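✔️ .sum(numeric_only=True) adds up the numeric columns, .min() returns the smallest value of every column (strings such as Employee compare alphabetically), and .count() reports the number of non-null values per column in each group.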
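The following sketch rebuilds the sample DataFrame and inspects what each built-in aggregation returns for one department. It also calls .size(), a related GroupBy method that counts rows per group regardless of missing values. With no NaNs in this data, size() and count() agree.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
    'Bonus': [5000, 7000, 5500, 8000, 7500, 5200],
})

totals = df.groupby('Department').sum(numeric_only=True)  # Salary and Bonus only
lowest = df.groupby('Department').min()                    # strings compare alphabetically
counts = df.groupby('Department').count()                  # non-null values per column
sizes = df.groupby('Department').size()                    # rows per group, NaN or not

print(totals.loc['HR'])               # Salary 153000, Bonus 15700
print(lowest.loc['HR', 'Employee'])   # Alice
print(counts.loc['HR', 'Salary'])     # 3
print(sizes['IT'])                    # 2
```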

🧠 5. Multiple Aggregations with .agg()

df.groupby('Department').agg({
    'Salary': ['mean', 'max'],
    'Bonus': ['sum', 'count']
})

✔️ Apply different functions per column; the result has two-level column labels such as ('Salary', 'mean').
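Nested column labels can be awkward to work with downstream. pandas also supports named aggregation, where each keyword is an output column name and its value is a (column, function) pair. The sketch compares both forms. The output names (avg_salary, max_salary, total_bonus, headcount) are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
    'Bonus': [5000, 7000, 5500, 8000, 7500, 5200],
})

# Dict-of-lists form: produces two-level column labels such as ('Salary', 'mean')
nested = df.groupby('Department').agg({'Salary': ['mean', 'max'], 'Bonus': ['sum', 'count']})
print(nested.columns.tolist())

# Named aggregation: output_name=(column, function) gives flat column names
flat = df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
    total_bonus=('Bonus', 'sum'),
    headcount=('Bonus', 'count'),
)
print(flat)
```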

🎯 6. Apply Custom Functions

df.groupby('Department')['Salary'].agg(lambda x: x.std())

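✔️ Use a lambda or a named function to compute custom statistics per group. Note that std() computes the sample standard deviation (ddof=1), so a single-member group such as Finance returns NaN.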
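A named function reads better than a lambda when the logic has a meaning of its own, and it can be mixed with built-in aggregation names in one list. In the sketch below, salary_range is a hypothetical helper. Its function name becomes the output column label.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
})

def salary_range(s):
    """Hypothetical helper: spread between the highest and lowest salary in a group."""
    return s.max() - s.min()

# Built-in names and a custom function together; Finance's std is NaN (one member)
stats = df.groupby('Department')['Salary'].agg(['std', salary_range])
print(stats)
```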

🔁 7. Iterate Over Groups

for name, group in df.groupby('Department'):
    print(f"Group: {name}")
    print(group)

✔️ Loop through grouped data for custom processing or inspection.
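Each loop iteration yields the group key and an ordinary DataFrame, so any per-group logic can run inside the loop. The sketch collects the top earner of each department into a dict. It then uses get_group() to fetch a single group without looping. The dict name top_earner is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
})

top_earner = {}
for name, group in df.groupby('Department'):
    # 'group' is an ordinary DataFrame holding only this department's rows
    best_row = group.loc[group['Salary'].idxmax()]
    top_earner[name] = best_row['Employee']

print(top_earner)  # {'Finance': 'David', 'HR': 'Charlie', 'IT': 'Eve'}

# get_group() fetches a single group without looping
print(df.groupby('Department').get_group('IT'))
```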

🧮 8. Group on Index

df_indexed = df.set_index('Department')
df_indexed.groupby(level=0)['Salary'].mean()

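✔️ Groups by an index level instead of a column. level=0 is the first (here, the only) index level; the same approach selects a single level of a hierarchical MultiIndex.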
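With a hierarchical index, a level can be chosen by name or by position. The sketch builds a (Department, Employee) MultiIndex from the sample data and checks that both forms of level= give the same per-department totals.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
})

# Two-level index: (Department, Employee)
indexed = df.set_index(['Department', 'Employee'])

# Group by a level by name, or by its position (0 = outermost level)
by_name = indexed.groupby(level='Department')['Salary'].sum()
by_position = indexed.groupby(level=0)['Salary'].sum()

print(by_name)
```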

🧾 9. Reset Index After Grouping

df.groupby('Department').sum(numeric_only=True).reset_index()

✔️ Moves the group keys out of the index and back into a regular column, giving a flat DataFrame with a default integer index.
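The sketch shows where the key lives before and after reset_index(). It also uses the groupby() parameter as_index=False, which keeps the key as a column from the start and produces the same flat layout.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
    'Bonus': [5000, 7000, 5500, 8000, 7500, 5200],
})

grouped = df.groupby('Department').sum(numeric_only=True)
print(grouped.index.name)        # Department (the key lives in the index)

flat = grouped.reset_index()
print(flat.columns.tolist())     # ['Department', 'Salary', 'Bonus']

# as_index=False keeps the key as a regular column from the start
also_flat = df.groupby('Department', as_index=False).sum(numeric_only=True)
print(also_flat)
```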

📌 Summary – Key Takeaways

Grouping is essential for segmenting your data and performing comparative analysis. Use groupby() to organize data logically and apply aggregate or custom functions across those groups.

🔍 Key Takeaways:

groupby() splits data into logical groups based on key columns or index
Combine with .sum(), .mean(), .agg(), etc. to compute per-group summaries
Supports multiple keys, custom functions, and several aggregations per column
Use reset_index() to flatten the result

⚙️ Real-world relevance: Used in HR analytics, financial reporting, sales summaries, survey responses, and time series grouping.

❓ FAQs – Pandas Grouping Data with GroupBy

❓ Can I group by multiple columns?
✅ Yes:

df.groupby(['Department', 'Employee'])

❓ What’s the difference between .agg() and .apply()?

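.agg() → Reduces each group to one value per column (e.g. mean, max, or a custom reducer)
.apply() → Passes each whole group (a Series or DataFrame) to your function, which may return a scalar, a Series, or a DataFrame; more flexible, but usually slower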
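The difference shows up in the shape of the result. In the sketch, agg('max') collapses each department to one number. The apply() call returns the two highest salaries per department, so the output keeps several rows per group under a (Department, original row) index.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
})

salaries = df.groupby('Department')['Salary']

# agg(): each group collapses to one value, so there is one row per department
summary = salaries.agg('max')

# apply(): the function receives the whole group and may return several rows;
# here it returns up to two of the highest salaries per department
top_two = salaries.apply(lambda s: s.nlargest(2))

print(summary)
print(top_two)
```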

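❓ How do I get percentage contribution per group?

Divide each value by its group total. transform('sum') repeats the group total on every row, so the result stays aligned with the original rows:

df['Salary'] / df.groupby('Department')['Salary'].transform('sum') * 100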
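A minimal, self-contained sketch of the percentage calculation uses transform('sum') to broadcast each department total back to its rows. It stores the result in a hypothetical Salary_pct column rounded to two decimals. The shares within each department should add up to 100.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
})

# transform('sum') returns each row's department total, aligned to df's index
dept_total = df.groupby('Department')['Salary'].transform('sum')

# 'Salary_pct' is a hypothetical column name for the percentage share
df['Salary_pct'] = (df['Salary'] / dept_total * 100).round(2)

print(df[['Department', 'Employee', 'Salary_pct']])
```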

❓ Can I group by index instead of a column?
Yes:

df.set_index('Department').groupby(level=0)

❓ How do I apply multiple aggregations to one column?
Use:

df.groupby('Department')['Salary'].agg(['mean', 'max', 'min'])
