👥 Pandas Grouping Data (GroupBy) – Analyze Data in Segments
🧲 Introduction – What is Grouping in Pandas?
Grouping is a powerful operation that lets you split data into groups, apply functions to each group, and combine the results into a structured summary. In Pandas, the groupby() method helps you aggregate and analyze data by categories, labels, or hierarchical keys.
🎯 In this guide, you’ll learn:
- How to group data using groupby()
- Apply aggregation functions like sum(), mean(), count()
- Perform multiple aggregations with .agg()
- Group by multiple columns and use custom functions
📥 1. Sample DataFrame
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
    'Bonus': [5000, 7000, 5500, 8000, 7500, 5200]
})
🧱 2. Group by a Single Column
df.groupby('Department')['Salary'].mean()
✔️ Calculates the average salary per department.
👉 Output:
Department
Finance 70000.0
HR 51000.0
IT 61000.0
🔗 3. Group by Multiple Columns
df.groupby(['Department', 'Employee'])['Salary'].sum()
✔️ Groups by both department and employee for hierarchical aggregation.
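Grouping by two keys returns a result indexed by a two-level MultiIndex. The sketch below rebuilds a trimmed copy of the sample DataFrame and shows two ways to read values back out of it; the variable names (by_dept_emp, alice_salary, hr_salaries) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
})

# Two grouping keys give a Series indexed by (Department, Employee).
by_dept_emp = df.groupby(['Department', 'Employee'])['Salary'].sum()

# Select a single (Department, Employee) pair with a tuple ...
alice_salary = by_dept_emp.loc[('HR', 'Alice')]

# ... or every employee of one department using the outer level alone.
hr_salaries = by_dept_emp.loc['HR']

print(by_dept_emp)
print(alice_salary)
print(hr_salaries)
```

Because every employee appears once in the sample data, each group here holds a single row; with repeated employees the sum would combine their rows.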
📊 4. Aggregate with Built-in Functions
df.groupby('Department').sum(numeric_only=True)
df.groupby('Department').min()
df.groupby('Department').count()
✔️ Use .sum(), .min(), and .count() to aggregate each group.
🧠 5. Multiple Aggregations with .agg()
df.groupby('Department').agg({
'Salary': ['mean', 'max'],
'Bonus': ['sum', 'count']
})
✔️ Apply different functions per column.
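The dictionary form above produces two-level column labels such as ('Salary', 'mean'). If flat column names are easier to work with, pandas' named aggregation is one option. This is a minimal sketch on the sample data; the output column names (avg_salary, max_salary, total_bonus, bonus_count) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
    'Bonus': [5000, 7000, 5500, 8000, 7500, 5200],
})

# Named aggregation: new_column=(source_column, function).
summary = df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
    total_bonus=('Bonus', 'sum'),
    bonus_count=('Bonus', 'count'),
)

print(summary)
```

The result has one row per department and four single-level columns, which tends to be simpler to export or merge than a MultiIndex header.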
🎯 6. Apply Custom Functions
df.groupby('Department')['Salary'].agg(lambda x: x.std())
✔️ Use lambda or named custom functions to compute custom stats per group.
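A named function can stand in for the lambda when the logic deserves a docstring or reuse. The sketch below uses a hypothetical helper, salary_range, that is not part of pandas; it receives each group's Salary values as a Series and returns one number.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
})


def salary_range(salaries):
    """Return the gap between the highest and lowest salary in a group."""
    return salaries.max() - salaries.min()


# .agg() calls salary_range once per department.
ranges = df.groupby('Department')['Salary'].agg(salary_range)

print(ranges)
```

The function name also becomes the label when it is used inside a list of aggregations, which keeps the output readable.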
🔁 7. Iterate Over Groups
for name, group in df.groupby('Department'):
    print(f"Group: {name}")
    print(group)
✔️ Loop through grouped data for custom processing or inspection.
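The same iteration can collect each group for later use, and get_group() fetches a single group without a loop. A small sketch on the sample data, with illustrative variable names (groups, it_rows):

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
})

# Each iteration yields the group key and the sub-DataFrame for that key.
groups = {name: group for name, group in df.groupby('Department')}

# Pull out one group directly by its key.
it_rows = df.groupby('Department').get_group('IT')

print(sorted(groups))
print(it_rows)
```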
🧮 8. Group on Index
df_indexed = df.set_index('Department')
df_indexed.groupby(level=0)['Salary'].mean()
✔️ Groups using index levels (e.g., hierarchical indexes).
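With a hierarchical index, the level can be given by name instead of position. The sketch below assumes a two-level index built from Department and Employee, which goes slightly beyond the single-level example above.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
})

# Build a two-level (hierarchical) index.
df_multi = df.set_index(['Department', 'Employee'])

# Group on the outer level by name; level=0 would select the same level.
mean_by_dept = df_multi.groupby(level='Department')['Salary'].mean()

print(mean_by_dept)
```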
🧾 9. Reset Index After Grouping
df.groupby('Department').sum(numeric_only=True).reset_index()
✔️ Converts grouped result back to a regular DataFrame.
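An alternative worth knowing is as_index=False, which keeps the grouping key as a regular column from the start. A minimal sketch on the sample data, selecting the numeric columns explicitly:

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
    'Bonus': [5000, 7000, 5500, 8000, 7500, 5200],
})

# as_index=False returns Department as a column rather than as the index.
totals = df.groupby('Department', as_index=False)[['Salary', 'Bonus']].sum()

print(totals)
```

Both approaches should give the same flat table here; which one reads better is largely a matter of taste.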
📌 Summary – Key Takeaways
Grouping is essential for segmenting your data and performing comparative analysis. Use groupby() to organize data logically and apply aggregate or custom functions across those groups.
🔍 Key Takeaways:
- groupby() splits data into logical groups based on key columns or index
- Combine with .sum(), .mean(), .agg(), etc. to compute per-group summaries
- Supports multiple keys, custom functions, and nested aggregation
- Use reset_index() to flatten the result
⚙️ Real-world relevance: Used in HR analytics, financial reporting, sales summaries, survey responses, and time series grouping.
❓ FAQs – Pandas Grouping Data with GroupBy
❓ Can I group by multiple columns?
✅ Yes:
df.groupby(['Department', 'Employee'])
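❓ What’s the difference between .agg() and .apply()?
- .agg() → For summarizing: each function reduces a group to a single value, so the result has one row per group
- .apply() → For custom row/column-level logic: the function receives each whole group and may return a scalar, a Series, or a DataFrame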
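To make the contrast concrete, the sketch below runs both on the sample data. The "top earner per department" task is an illustrative choice, not something the guide prescribes.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
})

# .agg(): each department collapses to one value.
avg_salary = df.groupby('Department')['Salary'].agg('mean')

# .apply(): the function sees each department's rows and returns
# a DataFrame (here, the single highest-paid row of that group).
top_earner = (
    df.groupby('Department')[['Employee', 'Salary']]
      .apply(lambda g: g.nlargest(1, 'Salary'))
)

print(avg_salary)
print(top_earner)
```

As a rule of thumb, reach for .agg() first and fall back to .apply() only when the result is not a single value per group, since .apply() is usually slower.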
❓ How do I get percentage contribution per group?
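df['Salary'] / df.groupby('Department')['Salary'].transform('sum') * 100
✔️ transform('sum') repeats each department's total on every row of that department, so the result keeps the original row index and gives each employee's share as a percentage.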
❓ Can I group by index instead of a column?
Yes:
df.set_index('Department').groupby(level=0)
❓ How do I apply multiple aggregations to one column?
Use:
df.groupby('Department')['Salary'].agg(['mean', 'max', 'min'])