📊 Pandas Aggregation Techniques – Summarize Data for Deeper Insights
🧲 Introduction – What Is Aggregation in Pandas?
Aggregation refers to applying summary functions (like sum()
, mean()
, count()
, etc.) to a DataFrame or Series to reduce and summarize data. Pandas provides powerful and flexible tools for performing group-wise, column-wise, and custom aggregations—essential for reporting, EDA, and data pipelines.
🎯 In this guide, you’ll learn:
- Use built-in aggregation functions on Series and DataFrames
- Perform group-based aggregation using
groupby()
- Use multiple aggregations with
agg()
- Handle missing values and apply custom aggregation logic
📥 1. Sample DataFrame
import pandas as pd
df = pd.DataFrame({
'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'HR'],
'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
'Salary': [50000, 60000, 52000, 70000, 62000, 51000],
'Bonus': [5000, 7000, 5500, 8000, 7500, 5200]
})
📦 2. Apply Simple Aggregation Functions
df['Salary'].mean() # Average salary
df[['Salary', 'Bonus']].sum() # Sum per column
df[['Salary', 'Bonus']].max() # Maximum value per column
✔️ Aggregations can be performed on entire columns.
👥 3. Group-Based Aggregation with groupby()
df.groupby('Department')['Salary'].mean()
👉 Output:
Department
Finance 70000.0
HR 51000.0
IT 61000.0
✔️ Groups data by department and computes average salary per group.
🧮 4. Aggregate Multiple Columns
df.groupby('Department')[['Salary', 'Bonus']].sum()
✔️ Aggregates both 'Salary'
and 'Bonus'
by department.
🔧 5. Multiple Aggregations with .agg()
df.groupby('Department').agg({
'Salary': ['mean', 'min', 'max'],
'Bonus': ['sum', 'count']
})
✔️ Applies multiple functions per column in a single call.
👉 Output:
Salary Bonus
mean min max sum count
Department
Finance 70000 70000 70000 8000 1
HR 51000 50000 52000 15700 3
IT 61000 60000 62000 14500 2
🧠 6. Custom Aggregation Functions
df.groupby('Department')['Salary'].agg(lambda x: x.std())
✔️ Use a lambda or custom function in agg()
for more control.
🧾 7. Aggregate Without Grouping (Entire Frame)
df.agg({
'Salary': ['mean', 'std'],
'Bonus': ['min', 'max']
})
✔️ Apply aggregations across the whole DataFrame.
🧮 8. Aggregation with pivot_table()
(Advanced)
df.pivot_table(values='Salary', index='Department', aggfunc='mean')
✔️ Similar to groupby()
, but supports multi-level summaries and missing value filling.
📌 Summary – Key Takeaways
Pandas provides robust and flexible aggregation techniques for summarizing, analyzing, and transforming datasets. Whether working with raw data or grouped data, agg()
and groupby()
are your best tools for efficient summarization.
🔍 Key Takeaways:
- Use
.mean()
,.sum()
,.count()
, etc. for basic stats - Use
.groupby()
to aggregate by one or more keys - Use
.agg()
for multiple aggregations per column - Support for custom functions and lambdas
- Use
pivot_table()
for advanced aggregation + reshaping
⚙️ Real-world relevance: Common in reporting dashboards, BI summaries, performance reviews, and grouped analytics.
❓ FAQs – Aggregation in Pandas
❓ What’s the difference between agg()
and apply()
?
.agg()
→ For summarization (returns scalar per group).apply()
→ For row/column transformations (returns Series/DataFrame)
❓ Can I group by multiple columns?
Yes:
df.groupby(['Department', 'Employee'])['Salary'].sum()
❓ How do I reset the index after aggregation?
Use:
df.groupby('Department').sum().reset_index()
❓ What’s the best way to apply multiple aggregations to one column?
Use .agg()
:
df['Salary'].agg(['mean', 'min', 'max'])
❓ Is pivot_table()
better than groupby()
?
It depends:
pivot_table()
is more flexible for reshaping and multi-indexgroupby()
is simpler for flat aggregations
Share Now :