R Data Frames – Tabular Data Handling in R Programming
Introduction – What Are Data Frames in R?
A data frame in R is a two-dimensional, tabular data structure—similar to a spreadsheet or SQL table—where each column can have a different data type (numeric, character, logical, etc.), but all columns must have the same length.
Data frames are central to data analysis in R. Most real-world data (from CSVs, databases, APIs) is loaded into R as a data frame.
In this guide, you’ll learn:
- How to create, access, and modify data frames
- How to filter columns, subset rows, and transform data
- Which built-in functions summarize and inspect a data frame
- How to handle missing values and combine multiple data frames
Creating a Data Frame
students <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Age = c(22, 25, 23),
Passed = c(TRUE, TRUE, FALSE)
)
Data types:
- Name: character (in R 4.0.0 and later; earlier versions convert strings to factors by default)
- Age: numeric
- Passed: logical
Use str(students) to inspect structure.
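As a minimal sketch of that inspection step, the snippet below rebuilds the students data frame and checks the type of each column. Passing stringsAsFactors = FALSE is an addition here, not part of the original example; it keeps Name a character column on R versions older than 4.0.0 as well.

```r
# Rebuild the example data frame (stringsAsFactors = FALSE is an assumption
# added so the result is the same on older R versions)
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(22, 25, 23),
  Passed = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Compact overview: dimensions, column names, types, and the first values
str(students)

# Class of each column as a named character vector
col_classes <- sapply(students, class)
print(col_classes)
```

str() is meant for reading at the console, while sapply(students, class) returns a value you can use in later checks.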
Accessing Data Frame Elements
By Column Name
students$Name # Returns column as vector
students[["Age"]] # Also returns column
By Index
students[1, ] # First row
students[, 2] # Second column
students[2, 1] # Row 2, Col 1
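Selecting a single column with [ , j] returns a plain vector rather than a data frame, which can surprise code that expects a table. The sketch below, which assumes the students data frame from earlier, shows a few ways to control the result type.

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(22, 25, 23),
  Passed = c(TRUE, TRUE, FALSE)
)

age_vec  <- students[, 2]                  # numeric vector
age_df   <- students[, 2, drop = FALSE]    # one-column data frame
age_df2  <- students["Age"]                # also a one-column data frame
two_cols <- students[, c("Name", "Age")]   # several columns by name

print(class(age_vec))
print(class(age_df))
print(two_cols)
```

Selecting by name rather than by position tends to be more robust when the column order changes.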
Subsetting with Conditions
students[students$Passed, ] # Only those who passed (Passed is already logical)
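Conditions can be combined with & (and) and | (or). The following sketch assumes the students data frame from earlier and applies the same filter twice, once with bracket indexing and once with subset(), which avoids repeating the data frame name.

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(22, 25, 23),
  Passed = c(TRUE, TRUE, FALSE)
)

# Students who passed AND are older than 22
passed_over_22 <- students[students$Passed & students$Age > 22, ]

# Same filter with subset(); column names are looked up inside the data frame
same_with_subset <- subset(students, Passed & Age > 22)

print(passed_over_22)
print(same_with_subset)
```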
Modifying Data Frames
Add Column
students$Score <- c(90, 85, 70)
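A new column can also be computed from existing ones, which is the simplest form of data transformation. In this sketch the AgeInMonths and Result columns and the 80-point cutoff are made up for illustration.

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(22, 25, 23),
  Passed = c(TRUE, TRUE, FALSE)
)
students$Score <- c(90, 85, 70)

# Arithmetic is vectorized, so it applies to every row at once
students$AgeInMonths <- students$Age * 12

# ifelse() picks a value per row based on a condition (cutoff is hypothetical)
students$Result <- ifelse(students$Score >= 80, "high", "low")

print(students)
```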
Remove Column
students$Score <- NULL
Add Row (with rbind())
new_row <- data.frame(Name = "David", Age = 24, Passed = TRUE)
students <- rbind(students, new_row)
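rbind() matches columns by name, so the new row needs the same column names as the existing data frame. The sketch below assumes the students data frame from earlier; bad_row and its Years column are hypothetical and exist only to trigger the error.

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(22, 25, 23),
  Passed = c(TRUE, TRUE, FALSE)
)

new_row <- data.frame(Name = "David", Age = 24, Passed = TRUE)
students <- rbind(students, new_row)
print(nrow(students))

# A row whose column names differ cannot be appended
bad_row <- data.frame(Name = "Eve", Years = 21, Passed = TRUE)
result <- tryCatch(
  rbind(students, bad_row),
  error = function(e) conditionMessage(e)  # capture the message instead of stopping
)
print(result)
```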
Combining Data Frames
Row Bind
df1 <- data.frame(A = 1:2, B = c("x", "y"))
df2 <- data.frame(A = 3:4, B = c("z", "w"))
rbind(df1, df2)
Column Bind
df3 <- data.frame(C = c(TRUE, FALSE))
cbind(df1, df3)
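To see what each bind does to the shape of the data, this sketch reuses df1, df2, and df3 from above and checks the resulting dimensions. The three-row df4 is a hypothetical addition used to show that cbind() fails when the row counts cannot be reconciled.

```r
df1 <- data.frame(A = 1:2, B = c("x", "y"))
df2 <- data.frame(A = 3:4, B = c("z", "w"))
df3 <- data.frame(C = c(TRUE, FALSE))

stacked <- rbind(df1, df2)   # rows of df2 appended below df1
widened <- cbind(df1, df3)   # columns of df3 placed beside df1

print(dim(stacked))  # rows, then columns
print(dim(widened))

# Hypothetical data frame with 3 rows; df1 has 2, so cbind() raises an error
df4 <- data.frame(C = c(TRUE, FALSE, TRUE))
cbind_result <- tryCatch(
  cbind(df1, df4),
  error = function(e) conditionMessage(e)
)
print(cbind_result)
```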
Useful Data Frame Functions
| Function | Description |
|---|---|
| str() | Structure of data frame |
| summary() | Summary statistics |
| nrow() / ncol() | Number of rows / columns |
| names() | Column names |
| rownames() | Row labels |
| head() / tail() | Preview top or bottom rows |
| subset() | Subset rows using condition |
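As a quick illustration of the functions in the table, the sketch below applies each of them to the students data frame from earlier.

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(22, 25, 23),
  Passed = c(TRUE, TRUE, FALSE)
)

n_rows <- nrow(students)        # number of rows
n_cols <- ncol(students)        # number of columns
col_names <- names(students)    # column names
row_labels <- rownames(students) # row labels (character)

first_two <- head(students, 2)  # top rows
last_one  <- tail(students, 1)  # bottom rows
older     <- subset(students, Age > 22)  # rows matching a condition

print(summary(students))        # per-column summary statistics
print(older)
```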
Handling Missing Values (NA)
df <- data.frame(x = c(1, NA, 3))
is.na(df) # Identify missing
na.omit(df) # Remove rows with NA
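Before dropping rows, it often helps to count how many values are missing and where. In this sketch the second column y is a hypothetical addition to the example above, included so that the missing values fall in different rows.

```r
# x is from the example above; y is an assumed extra column
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

# is.na() returns a logical matrix; colSums() counts TRUE values per column
na_per_column <- colSums(is.na(df))
print(na_per_column)

# TRUE for rows that contain no missing values at all
complete_rows <- complete.cases(df)
print(complete_rows)

# Keep only the complete rows
clean <- na.omit(df)
print(clean)

# Many summary functions can skip NA instead of returning NA
mean_x <- mean(df$x, na.rm = TRUE)
print(mean_x)
```

na.omit() drops a row if any column is missing, so a data frame with many sparse columns can shrink more than expected.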
Converting Between Data Structures
| Convert From | To | Function |
|---|---|---|
| Matrix | Data Frame | as.data.frame() |
| List | Data Frame | as.data.frame() |
| Data Frame | Matrix | as.matrix() |
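The conversions in the table can be sketched as follows. All object names and sample values here are made up for illustration. Note the last step: a matrix holds a single type, so converting a data frame with mixed columns yields a character matrix.

```r
# Matrix -> data frame: matrix columns become data frame columns
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
df_from_matrix <- as.data.frame(m)

# List -> data frame: each list element (of equal length) becomes a column
lst <- list(id = 1:3, label = c("x", "y", "z"))
df_from_list <- as.data.frame(lst)

# Data frame -> matrix: mixed column types are coerced to one common type
mixed <- data.frame(n = 1:2, s = c("u", "v"))
m_from_df <- as.matrix(mixed)

print(df_from_matrix)
print(df_from_list)
print(typeof(m_from_df))
```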
Summary – Recap & Next Steps
Data frames are the standard, most widely used structure for tabular data in R. Mastering their creation, manipulation, and filtering is key for data science workflows.
Key Takeaways:
- Create with data.frame() using named columns
- Access using $, [row, col], or logical conditions
- Add rows and columns with rbind() / cbind(); remove a column by assigning NULL
- Use summary(), str(), and head() for inspection
- Handle missing data with is.na() and na.omit()
Real-World Relevance:
Data frames appear in nearly every R project, from importing Excel/CSV data and transforming datasets to modeling results and exporting reports.
FAQs – R Data Frames
What is the difference between a data frame and a matrix in R?
A matrix holds only one data type; a data frame allows mixed types in different columns.
How can I filter rows in a data frame?
Use logical conditions:
df[df$Age > 25, ]
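One caveat with bracket filtering: if the column contains NA, the comparison yields NA for that row, and the result includes a row filled with NA. The sketch below uses a small hypothetical data frame to show the effect and two ways around it.

```r
# Hypothetical sample data with one missing age
df <- data.frame(Name = c("Ann", "Ben", "Cy"), Age = c(30, NA, 20))

# NA in the condition produces an all-NA row in the result
raw <- df[df$Age > 25, ]
print(raw)

# which() keeps only positions where the condition is TRUE
safe <- df[which(df$Age > 25), ]
print(safe)

# subset() also drops rows where the condition is NA
also_safe <- subset(df, Age > 25)
print(also_safe)
```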
How do I add a new column to a data frame?
Assign a vector directly; its length must match the number of rows (a single value is recycled to every row):
df$NewCol <- c(1, 2, 3)
How do I remove rows with missing values?
Use:
na.omit(df)
How can I preview the top 5 rows?
Use:
head(df, 5)