🧾 Pandas DataFrames – The Foundation of Tabular Data in Python
🧲 Introduction – What Is a Pandas DataFrame?
A Pandas DataFrame is a two-dimensional, tabular data structure that resembles a spreadsheet or SQL table. It is the core data structure of the pandas library and one of the most widely used tools in Python for data analysis, manipulation, and visualization.
🎯 In this guide, you’ll learn:
- How to create DataFrames from different data sources
- How to access and modify data inside DataFrames
- How to use key operations like filtering, slicing, and aggregating
- Real-world examples of DataFrame usage
🛠️ 1. Create a DataFrame from a Dictionary
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85.5, 90.3, 78.9]
}
df = pd.DataFrame(data)
print(df)
👉 Output:
      Name  Age  Score
0    Alice   25   85.5
1      Bob   30   90.3
2  Charlie   35   78.9
✅ Each key becomes a column label; each list supplies that column's values, one per row.
📂 2. Create a DataFrame from a List of Lists
data = [['Alice', 25], ['Bob', 30], ['Charlie', 35]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)
✅ Use this when your source data is structured like rows in a table.
📥 3. Create a DataFrame from CSV/Excel
df = pd.read_csv('data.csv') # From CSV
df = pd.read_excel('data.xlsx') # From Excel (needs an engine such as openpyxl installed)
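✅ Pandas also reads many other formats, including JSON, SQL, the clipboard, and HTML, through functions such as pd.read_json(), pd.read_sql(), pd.read_clipboard(), and pd.read_html().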
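The readers all follow the same pattern, which can be tried without any files on disk. The sketch below is a minimal illustration, assuming small made-up sample strings wrapped in io.StringIO in place of real files:

```python
import io
import pandas as pd

# In-memory text stands in for a file on disk (made-up sample data).
csv_text = "Name,Age,Score\nAlice,25,85.5\nBob,30,90.3\nCharlie,35,78.9\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Similar records as JSON, one object per row.
json_text = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'
df_json = pd.read_json(io.StringIO(json_text), orient="records")

print(df_csv.shape)   # (3, 3)
print(df_json.shape)  # (2, 2)
```

Swapping io.StringIO(...) for a file path gives the usual file-based call.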
🔍 4. Inspecting DataFrames
Method / Attribute | Description |
---|---|
df.head() | First 5 rows (pass n for a different count) |
df.tail() | Last 5 rows (pass n for a different count) |
df.shape | Tuple of (rows, columns) |
df.info() | Printed summary of columns, dtypes, non-null counts, and memory usage |
df.describe() | Statistical summary of numeric columns |
df.columns | Column labels |
df.index | Row labels (the index) |
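As a quick illustration, here is how these calls might look on the three-row sample from section 1 (rebuilt here so the snippet runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85.5, 90.3, 78.9],
})

print(df.head(2))        # first 2 rows instead of the default 5
print(df.shape)          # (3, 3)
print(list(df.columns))  # ['Name', 'Age', 'Score']
df.info()                # prints its summary directly and returns None
print(df.describe())     # statistics for the numeric columns Age and Score
```

Note that df.shape, df.columns, and df.index are attributes, so they take no parentheses.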
🎯 5. Accessing Data
Access Column(s)
print(df['Name']) # Single column
print(df[['Name', 'Score']]) # Multiple columns
Access Row(s)
print(df.loc[0]) # By label
print(df.iloc[1]) # By position
✅ Use .loc[] for label-based and .iloc[] for position-based access.
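With the default 0, 1, 2 index, labels and positions coincide, so the difference is easy to miss. A small sketch, assuming a custom index of 10, 20, 30 chosen only for illustration, makes it visible:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],  # illustrative labels that differ from positions
)

by_label = df.loc[10]     # the row whose label is 10
by_position = df.iloc[1]  # the second row, whatever its label

print(by_label['Name'], by_position['Name'])  # Alice Bob
# df.loc[1] would raise a KeyError here, because no row is labeled 1.
```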
🔄 6. Modifying Data
df['Passed'] = df['Score'] > 80 # Add new column
df.at[1, 'Age'] = 32 # Modify specific cell
✅ DataFrames are mutable: you can change both structure and content in place.
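Beyond single cells, .loc[] with a boolean mask can update every matching row at once. The following is a minimal sketch on the sample data; the floor value of 80.0 is an arbitrary choice for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85.5, 90.3, 78.9],
})

df['Passed'] = df['Score'] > 80           # new boolean column
df.at[1, 'Age'] = 32                      # single cell by row label and column
df.loc[df['Score'] < 80, 'Score'] = 80.0  # raise every score below 80 to 80.0

print(df)
```

The Passed column was computed before the scores were raised, so it still reflects the original values.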
✂️ 7. Filtering and Slicing
print(df[df['Age'] > 28]) # Filter
print(df.iloc[0:2]) # Slice by position (end position excluded)
print(df.loc[0:1, ['Name']]) # Slice by label (end label included) and column
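Two details are worth seeing in action: multiple filter conditions are combined with & or | (each wrapped in parentheses), and the two slicing styles treat the end point differently. A short sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85.5, 90.3, 78.9],
})

# Combine conditions with & (and) or | (or); parentheses are required.
older_high = df[(df['Age'] > 28) & (df['Score'] > 80)]
print(older_high)  # only Bob matches both conditions

by_position = df.iloc[0:2]  # positions 0 and 1; the end is excluded
by_label = df.loc[0:2]      # labels 0, 1, and 2; the end is included
print(len(by_position), len(by_label))  # 2 3
```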
🔁 8. Aggregation and Summary Stats
print(df['Age'].mean()) # Average age
print(df['Score'].sum()) # Total score
print(df[['Age', 'Score']].max()) # Max values
✅ Built-in aggregations include mean(), sum(), max(), min(), and count().
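Several of these aggregations can be computed in one call with agg(), which accepts a list of function names. A minimal sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85.5, 90.3, 78.9],
})

# One row per aggregation, one column per numeric field.
summary = df[['Age', 'Score']].agg(['mean', 'min', 'max', 'count'])
print(summary)
print(summary.loc['mean', 'Age'])  # 30.0
```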
📋 9. Common DataFrame Operations
Operation | Syntax Example |
---|---|
Rename columns | df.rename(columns={'Age':'Years'}) |
Drop column | df.drop('Score', axis=1) |
Drop row | df.drop(0, axis=0) |
Sort by column | df.sort_values(by='Score') |
Reset index | df.reset_index(drop=True) |
Set new index | df.set_index('Name') |
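By default, each of these operations returns a new DataFrame and leaves the original untouched, so the result has to be assigned to a name or chained. A short sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85.5, 90.3, 78.9],
})

renamed = df.rename(columns={'Age': 'Years'})
print('Age' in df.columns)         # True: the original is unchanged
print('Years' in renamed.columns)  # True

# Chain several operations; each step works on the previous result.
result = (
    df.drop('Score', axis=1)
      .sort_values(by='Age', ascending=False)
      .reset_index(drop=True)
)
print(result)
```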
📌 Summary – Recap & Next Steps
Pandas DataFrames offer a rich, intuitive interface for 2D structured data, letting you manipulate, filter, and analyze information efficiently and expressively.
🔍 Key Takeaways:
- DataFrames are 2D tables with labeled rows and columns
- Easily created from dictionaries, lists, or files
- Use .loc[] and .iloc[] for flexible data access
- Perform filtering, aggregation, and transformation in a few lines
⚙️ Real-world relevance: DataFrames are used in everything from business analytics and machine learning to ETL pipelines and reporting dashboards.
❓ FAQs – Pandas DataFrames
❓ What is the difference between Series and DataFrame?
✅ A Series is a one-dimensional labeled array (a single column); a DataFrame is a two-dimensional table whose columns are each a Series.
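The distinction shows up as soon as you select columns: single brackets give a Series, double brackets give a DataFrame. A quick sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

col = df['Name']    # one column label -> Series
sub = df[['Name']]  # list of labels   -> DataFrame with one column

print(type(col).__name__, col.shape)  # Series (3,)
print(type(sub).__name__, sub.shape)  # DataFrame (3, 1)
```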
❓ Can a DataFrame contain different data types?
✅ Yes. Each column can have a different data type.
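The dtypes attribute lists the type of each column. In this sketch on the sample data, Age is an integer column and Score a float column; the exact dtype reported for the text column Name depends on the pandas version:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85.5, 90.3, 78.9],
})

print(df.dtypes)  # one dtype per column

age_is_int = pd.api.types.is_integer_dtype(df['Age'])
score_is_float = pd.api.types.is_float_dtype(df['Score'])
print(age_is_int, score_is_float)  # True True
```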
❓ How do I change the column order in a DataFrame?
Use:
df = df[['Score', 'Name', 'Age']]
❓ How do I export a DataFrame to CSV?
Use:
df.to_csv('output.csv', index=False)
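To preview what would be written, to_csv() can be called without a path, in which case it returns the CSV text as a string. A minimal sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85.5, 90.3, 78.9],
})

# No path given: the CSV content comes back as a string.
csv_text = df.to_csv(index=False)
print(csv_text)

lines = csv_text.splitlines()
print(lines[0])  # Name,Age,Score
```

With index=False the row labels are left out, so the file contains only the header and the data columns.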
❓ Can I merge or join DataFrames?
✅ Yes. Use pd.merge(), df.join(), or pd.concat() to combine DataFrames.
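Roughly speaking, pd.merge() matches rows on column values, df.join() matches on the index, and pd.concat() stacks frames together. The sketch below uses small made-up frames (people, scores, more) to illustrate each:

```python
import pandas as pd

# Hypothetical sample frames for illustration.
people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
scores = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85.5, 90.3]})
more = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})

# Match rows on a shared column.
merged = pd.merge(people, scores, on='Name')

# Match rows on the index.
joined = people.set_index('Name').join(scores.set_index('Name'))

# Stack rows; ignore_index=True renumbers the result 0..n-1.
stacked = pd.concat([people, more], ignore_index=True)

print(merged)
print(joined)
print(stacked)
```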