5️⃣ 🔍 Pandas Data Manipulation & Transformation – Clean, Sort & Combine Efficiently
🧲 Introduction – Why Learn Pandas Data Manipulation?
Raw data is only useful when it’s well-structured and transformed to suit the needs of analysis. Pandas provides a powerful and flexible toolkit for sorting, merging, filtering, reindexing, and transforming large datasets efficiently. This section dives deep into practical manipulation techniques essential for real-world data science and analytics workflows.
🎯 In this guide, you’ll learn how to:
- Sort, reindex, and iterate through DataFrames
- Combine datasets using concatenation and joins
- Apply functions across Series and DataFrames
- Use boolean logic and binary operations for filtering
📘 Topics Covered
🔍 Topic | 💡 Description |
---|---|
Pandas Sorting and Reindexing | Organize your data by labels or values |
Pandas Iteration Over Data | Loop through DataFrames row-wise/column-wise |
Pandas Concatenation | Combine DataFrames vertically or horizontally |
Pandas Merging and Joining | Database-style joins using keys or indices |
Pandas Function Application | Apply functions with .apply() , .map() , .agg() |
Pandas Options & Customization | Customize display, output, and precision |
Pandas Binary Operations & Boolean Indexing | Perform math across aligned objects |
Pandas Boolean Masking | Filter rows/columns with boolean expressions |
Pandas Binary Comparison Operations | Use == , != , > , < , etc. across objects |
🔃 Pandas Sorting and Reindexing
Sort by column values:
df.sort_values(by='age', ascending=False)
Sort by index:
df.sort_index()
Reindexing:
df.reindex([2, 0, 1])
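The three calls above can be tried together in a minimal sketch. The sample data and the `name`/`age` columns are assumptions made for illustration only:

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative assumptions.
df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "age": [31, 25, 40]},
    index=[0, 1, 2],
)

# sort_values returns a new DataFrame; df itself is left unchanged.
by_age = df.sort_values(by="age", ascending=False)

# sort_index orders the rows by their index labels.
by_index = by_age.sort_index()

# reindex selects rows by label in the order given.
reordered = df.reindex([2, 0, 1])

# Labels that do not exist in the original produce rows filled with NaN.
with_missing = df.reindex([0, 1, 5])
```

Note that none of these methods modify `df` in place by default; each returns a new object.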
🔁 Pandas Iteration Over Data
for index, row in df.iterrows():
    print(row['name'], row['age'])
Use .iterrows() for row-wise iteration, or .itertuples() for faster performance, since it yields lightweight namedtuples instead of building a Series for every row.
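As a rough side-by-side comparison of the two iteration styles, here is a small sketch. The two-row DataFrame is a made-up example:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"name": ["Ana", "Ben"], "age": [31, 25]})

# iterrows yields (index, Series) pairs; values are looked up by column label.
rows_via_iterrows = [(row["name"], row["age"]) for _, row in df.iterrows()]

# itertuples yields namedtuples; values are read as attributes.
# index=False leaves the index out of each tuple.
rows_via_itertuples = [(row.name, row.age) for row in df.itertuples(index=False)]
```

Both loops produce the same pairs here. For large DataFrames, a vectorized column operation is usually faster than either loop.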
🔗 Pandas Concatenation
Vertical (axis=0):
pd.concat([df1, df2])
Horizontal (axis=1):
pd.concat([df1, df2], axis=1)
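A minimal sketch of both directions, using small made-up frames (`df3` is an extra hypothetical frame added so the horizontal case has distinct columns):

```python
import pandas as pd

# Hypothetical frames for illustration.
df1 = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
df2 = pd.DataFrame({"id": [3, 4], "x": [30, 40]})
df3 = pd.DataFrame({"y": ["a", "b"]})

# Vertical stacking keeps the original index labels (0, 1, 0, 1) by default;
# ignore_index=True builds a fresh 0..n-1 index instead.
stacked = pd.concat([df1, df2], ignore_index=True)

# Horizontal concatenation aligns rows on their index labels.
side_by_side = pd.concat([df1, df3], axis=1)
```

When the frames being placed side by side do not share the same index labels, the unmatched positions are filled with NaN.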
🔀 Pandas Merging and Joining
pd.merge(df1, df2, on='id', how='inner')
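Join types: inner (keys found in both frames), outer (keys found in either frame), left (all keys from the left frame), and right (all keys from the right frame).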
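To see how the join type changes the result, here is a small sketch. The `name` and `salary` columns and their values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical frames sharing an "id" key; ids 2 and 3 appear in both.
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "salary": [50000, 62000, 58000]})

# inner: only ids present in both frames.
inner = pd.merge(df1, df2, on="id", how="inner")

# left: every row of df1; salary is NaN where df2 has no match.
left = pd.merge(df1, df2, on="id", how="left")

# outer: ids from either frame; indicator=True adds a "_merge" column
# showing where each row came from.
outer = pd.merge(df1, df2, on="id", how="outer", indicator=True)
```

The `_merge` column is a convenient way to check which keys failed to match before choosing a join type.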
🧠 Pandas Function Application
Use .apply() on a Series (element-wise) or across the rows/columns of a DataFrame:
df['age_squared'] = df['age'].apply(lambda x: x**2)
Aggregate multiple columns:
df[['math', 'science']].agg(['mean', 'sum'])
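The sketch below puts .apply(), .map(), and .agg() next to each other. The scores and the pass threshold of 85 are invented for the example:

```python
import pandas as pd

# Hypothetical scores for illustration.
df = pd.DataFrame({"math": [80, 65, 90], "science": [70, 85, 95]})

# Series.apply calls the function once per element.
df["math_squared"] = df["math"].apply(lambda x: x**2)

# DataFrame.apply with axis=1 passes each row to the function as a Series.
df["best"] = df[["math", "science"]].apply(lambda row: row.max(), axis=1)

# Series.map transforms each value; 85 is an arbitrary threshold.
grade = df["best"].map(lambda s: "pass" if s >= 85 else "fail")

# agg with a list of function names returns one row per aggregation.
summary = df[["math", "science"]].agg(["mean", "sum"])
```

For simple arithmetic such as squaring, the vectorized form `df["math"] ** 2` gives the same result and is typically faster than `.apply()`.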
⚙️ Pandas Options & Customization
pd.set_option("display.max_rows", 10)
pd.set_option("display.float_format", "{:.2f}".format)
Customize precision, output size, and print formatting.
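Because pd.set_option() changes settings globally, a temporary scope is sometimes preferable. A minimal sketch using pd.option_context(), with a made-up `price` column:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"price": [1.23456, 2.5, 1000.0]})

before = pd.get_option("display.max_rows")

# option_context applies the settings only inside the with-block.
with pd.option_context(
    "display.float_format", "{:.2f}".format,
    "display.max_rows", 10,
):
    formatted = df.to_string()
    inside = pd.get_option("display.max_rows")

# Once the block exits, the previous values are restored.
after = pd.get_option("display.max_rows")
```

Settings changed with pd.set_option() can be returned to their defaults with pd.reset_option(), for example `pd.reset_option("display.max_rows")`.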
🧮 Pandas Binary Operations & Boolean Indexing
df[df['salary'] > 50000]
Pandas also supports element-wise arithmetic and logical operations between objects, aligning them on index and column labels first:
df1 + df2
df1 & df2
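Label alignment is the part that most often surprises people, so here is a short sketch with deliberately mismatched indices. All frames are hypothetical:

```python
import pandas as pd

# Hypothetical frames whose indices only partly overlap (labels 1 and 2).
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({"a": [10, 20, 30]}, index=[1, 2, 3])

# + aligns on labels; labels present in only one frame yield NaN.
total = df1 + df2

# The method form accepts fill_value to treat missing entries as 0.
total_filled = df1.add(df2, fill_value=0)

# & works element-wise on boolean frames.
m1 = pd.DataFrame({"ok": [True, False, True]})
m2 = pd.DataFrame({"ok": [True, True, False]})
both = m1 & m2
```

The method forms (`add`, `sub`, `mul`, `div`) are useful whenever the default NaN result for unmatched labels is not what you want.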
🧪 Pandas Boolean Masking
Create custom filters:
mask = (df['score'] > 80) & (df['gender'] == 'F')
df[mask]
Use masks to filter rows on any combination of conditions.
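A small sketch of combining, negating, and reusing a mask. The `score` and `gender` values are invented for illustration:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"score": [92, 75, 88, 60], "gender": ["F", "F", "M", "F"]}
)

# Each condition needs its own parentheses because & binds tighter than >.
mask = (df["score"] > 80) & (df["gender"] == "F")
high_f = df[mask]

# ~ negates a mask; | combines conditions with "or".
others = df[~mask]
either = df[(df["score"] > 80) | (df["gender"] == "M")]

# .loc takes a mask for rows plus a column selection.
scores_only = df.loc[mask, "score"]
```

Python's `and`/`or` keywords do not work on Series; use `&`, `|`, and `~` instead.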
➗ Pandas Binary Comparison Operations
df['math'] > df['science']
Comparison with scalars:
df['score'] >= 60
Each comparison returns a boolean Series that can be used for filtering or counting.
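As a sketch of the "filtering or counting" use, the example below sums and averages a boolean Series. The scores are made-up values:

```python
import pandas as pd

# Hypothetical scores for illustration.
df = pd.DataFrame(
    {"math": [80, 65, 90], "science": [70, 85, 90], "score": [75, 55, 90]}
)

# Column-to-column comparison, evaluated row by row.
better_at_math = df["math"] > df["science"]

# Comparison against a scalar.
passed = df["score"] >= 60

# True counts as 1, so sum() counts matches and mean() gives the share.
n_passed = int(passed.sum())
pass_rate = passed.mean()

# Each operator also has a method form, e.g. eq for ==.
same = df["math"].eq(df["science"])
```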
📌 Summary – Recap & Next Steps
Mastering data manipulation is essential to wrangle and prepare your data effectively. Pandas gives you the power to combine, filter, iterate, and apply functions to transform raw data into structured insights.
🔍 Key Takeaways:
- Use .sort_values() and .reindex() to organize data
- Combine datasets with concat(), merge(), and join()
- Apply transformations with .apply(), .agg(), and .map()
- Use boolean logic and binary operations for filtering and comparison
⚙️ Real-World Relevance:
Data manipulation is at the heart of analytics. These operations enable developers and data scientists to build pipelines, clean datasets, and make smarter decisions efficiently.
❓ FAQ – Pandas Data Manipulation & Transformation
❓ What’s the difference between concat() and merge()?
✅ concat() stacks DataFrames along an axis, while merge() performs database-style joins on keys.
❓ Can I apply a function to an entire column?
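✅ Yes. Use .apply() or .map() on a Series. For an element-wise function across a whole DataFrame, use .applymap(); pandas 2.1 and later deprecate it in favor of DataFrame.map().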
❓ How do I change the index of a DataFrame?
✅ Use .reindex() to conform the data to a new set or order of index labels, or .set_index() to use a column as the index.
❓ How do I combine data with overlapping indices?
✅ Use binary operations (+, -), which align on index labels, or combine_first() to fill missing values in one object with values from another.
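A minimal sketch of combine_first() on two Series with partly overlapping labels. The names `primary` and `backup` and their values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical Series; "a" and "b" overlap, "c" and "d" do not.
primary = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])
backup = pd.Series([10.0, 20.0, 40.0], index=["a", "b", "d"])

# Values from primary win; backup only fills gaps (NaN or missing labels).
combined = primary.combine_first(backup)
```

Here `combined` keeps 1.0 for "a", takes 20.0 for "b" from `backup`, and includes both "c" and "d".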
❓ How do I filter rows based on multiple conditions?
✅ Use boolean masking with &, |, and parentheses around each condition.