🏷️ R Factors – Handle Categorical Data with Levels in R (with Code Explanation)
🧲 Introduction – What Are Factors in R?
In R, a factor is a data structure used to store categorical data, such as gender, grades, or survey responses. A factor is stored as an integer vector of codes plus a set of labels (called levels), which keeps it compact when values repeat and lets statistical modeling functions treat the variable as categorical rather than as free text.
🎯 In this guide, you’ll learn:
- How to create and inspect factors
- How to modify and order levels
- How to use factors inside data frames
- Real-world use cases and detailed explanations of code
🧪 Creating a Factor in R
responses <- c("Yes", "No", "Yes", "Maybe", "No")
fact <- factor(responses)
print(fact)
🔍 Explanation:
- responses is a character vector with repeated categories.
- factor(responses) converts it to a factor where R identifies the unique categories: "Maybe", "No", and "Yes" (alphabetical order).
- Internally, these are stored as integers: 1 = "Maybe", 2 = "No", 3 = "Yes".
🧾 Output:
[1] Yes No Yes Maybe No
Levels: Maybe No Yes
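To check the level set and the integer codes described above, a minimal sketch using base R's levels(), nlevels(), and as.integer() on the same responses vector could look like this:

```r
responses <- c("Yes", "No", "Yes", "Maybe", "No")
fact <- factor(responses)

level_names <- levels(fact)      # the unique categories, sorted: "Maybe" "No" "Yes"
level_count <- nlevels(fact)     # number of levels: 3
codes       <- as.integer(fact)  # one integer code per element: 3 2 3 1 2

print(level_names)
print(level_count)
print(codes)
```

Each code is simply the position of that element's label in levels(fact).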
🔠 Creating Ordered Factors
grades <- c("C", "B", "A", "B", "C")
ordered_grades <- factor(grades, levels = c("C", "B", "A"), ordered = TRUE)
🔍 Explanation:
- We manually set levels = c("C", "B", "A") to define a ranking.
- ordered = TRUE tells R that "A" is the highest and "C" is the lowest.
- This enables logical comparisons like < or >.
ordered_grades[1] < ordered_grades[2] # TRUE
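Beyond comparing two elements, an ordered factor can also be compared against a level name, sorted by rank, or passed to max(). The following is a small sketch on the same grades vector; the variable names at_least_b, best, and ranked are only illustrative:

```r
grades <- c("C", "B", "A", "B", "C")
ordered_grades <- factor(grades, levels = c("C", "B", "A"), ordered = TRUE)

# Compare every element against a level name.
at_least_b <- ordered_grades >= "B"   # FALSE TRUE TRUE TRUE FALSE

# Highest grade present, according to the level order.
best <- max(ordered_grades)           # A

# Sort by rank (lowest to highest), not alphabetically.
ranked <- as.character(sort(ordered_grades))  # "C" "C" "B" "B" "A"

print(at_least_b)
print(best)
print(ranked)
```

These comparisons only work because ordered = TRUE was set; on an unordered factor, < and > return NA with a warning.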
🔁 Modifying Factor Levels
levels(fact) <- c("May", "Nope", "Yes")
print(fact)
🔍 Explanation:
- We’re renaming the levels from "Maybe", "No", "Yes" to "May", "Nope", and "Yes".
- New names are matched to the old levels by position, so the replacement vector must list them in the same order as the current levels.
🔄 Converting Between Types
as.character(fact) # Converts factor to character
as.numeric(fact) # Shows underlying integer codes
🔍 Explanation:
- as.character() retrieves the category labels.
- as.numeric() gives the internal encoding (e.g., “May” = 1, “Nope” = 2, etc.).
📊 Using Factors in Data Frames
df <- data.frame(Name = c("Tom", "Jane"), Gender = factor(c("M", "F")))
str(df)
🔍 Explanation:
- The Gender column is explicitly made a factor.
- str() shows the structure: Gender is stored with levels, not just text.
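Since R 4.0.0, data.frame() keeps strings as character columns by default, so factors have to be requested. The sketch below contrasts two ways of doing that; df_auto and df_manual are illustrative names, and stringsAsFactors is spelled out in both calls so the result does not depend on the R version:

```r
# stringsAsFactors = TRUE converts every character column, including Name.
df_auto <- data.frame(Name = c("Tom", "Jane"), Gender = c("M", "F"),
                      stringsAsFactors = TRUE)
print(sapply(df_auto, is.factor))   # Name: TRUE, Gender: TRUE

# Converting a single column afterwards keeps Name as plain text.
df_manual <- data.frame(Name = c("Tom", "Jane"), Gender = c("M", "F"),
                        stringsAsFactors = FALSE)
df_manual$Gender <- factor(df_manual$Gender)
print(levels(df_manual$Gender))     # "F" "M"
print(is.character(df_manual$Name)) # TRUE
```

Converting only the columns that are truly categorical is often the safer choice, since identifiers such as names rarely benefit from being factors.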
📈 Frequency Analysis with table() and summary()
table(fact) # Frequency count of each level
summary(fact) # Summarized frequency with level names
🔍 Explanation:
- table() shows how many times each category occurs.
- summary() reports the same counts for a factor, returned as a named vector with one entry per level.
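One detail worth seeing in code is that both functions count levels, not just observed values, so a level with no observations appears with a count of zero. A minimal sketch, using a made-up sizes factor:

```r
# "L" is a declared level but never occurs in the data.
sizes <- factor(c("S", "M", "S"), levels = c("S", "M", "L"))

print(table(sizes))              # S: 2, M: 1, L: 0
print(summary(sizes))            # same counts, as a named vector

# droplevels() removes levels that have no observations.
print(table(droplevels(sizes)))  # S: 2, M: 1
```

Keeping the empty level is useful when a report should show every possible category; droplevels() is there for when it should not.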
⚠️ Handling Invalid Level Assignments
f <- factor(c("A", "B"), levels = c("A", "B"))
f[3] <- "C" # Warning: invalid factor level, NA generated
🔍 Explanation:
- Since "C" is not in the defined levels, R assigns NA and warns.
- Always ensure new values match defined levels.
🧠 Why Use Factors?
- Statistical modeling (e.g., lm(), glm()) uses factors to treat categorical predictors properly.
- Grouping: factors help group data in tapply(), aggregate(), and ggplot2.
- Memory efficiency: storing integer codes can be more compact than storing repeated strings.
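The grouping and modeling points can be illustrated with base R alone. In this sketch the group and score vectors are invented for illustration, and model.matrix() is used only to show the indicator column a model such as lm() would build from the factor:

```r
# Hypothetical scores for two groups; the values are made up.
group <- factor(c("control", "treated", "control", "treated"))
score <- c(10, 14, 12, 18)

# Grouping: tapply() applies mean() to score within each level of group.
group_means <- tapply(score, group, mean)
print(group_means)   # control: 11, treated: 16

# Modeling: a factor predictor is expanded into indicator columns.
design_cols <- colnames(model.matrix(~ group))
print(design_cols)   # "(Intercept)" "grouptreated"
```

The first level ("control") acts as the baseline, which is one reason the order of levels matters in modeling.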
📌 Summary – Recap & Next Steps
Factors allow you to treat categorical data with defined structure and meaning in R. They’re ideal for modeling, classification, and grouping in data analysis.
🔍 Key Takeaways:
- Use factor() to create categorical variables with levels.
- Use ordered = TRUE for ordinal (ranked) categories.
- Rename levels with levels(); to reorder them, rebuild the factor with factor(x, levels = ...).
- Convert between types carefully with as.character() or as.numeric().
- Use table() and summary() for frequency analysis.
⚙️ Real-World Relevance:
Factors are essential for interpreting categorical data correctly in survey analysis, demographic labeling, classification modeling, and data visualization.
❓ FAQs – Factors in R
❓ What is the difference between factor and character in R?
✅ Character is plain text; Factor adds levels and is optimized for categories.
❓ Can I convert a factor to numeric safely?
✅ First convert to character:
as.numeric(as.character(factor_var))
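The reason for the extra step shows up when the labels look like numbers. A minimal sketch with a made-up factor_var:

```r
# A factor whose labels happen to look like numbers.
factor_var <- factor(c("10", "20", "10", "5"))

# Direct conversion returns the level codes, not the values.
codes <- as.numeric(factor_var)
print(codes)    # 1 2 1 3

# Going through character recovers the actual numbers.
values <- as.numeric(as.character(factor_var))
print(values)   # 10 20 10 5
```

The codes come from the sorted level labels ("10", "20", "5"), so they bear no relation to the numeric values.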
❓ Why does R sort levels alphabetically?
✅ By default, factor() sorts the unique values to build the levels, which for text means alphabetical order. Pass the levels argument to choose a different order.
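As a small sketch of overriding the default, passing unique() of the data as the levels keeps categories in order of first appearance (one common idiom, not the only option):

```r
responses <- c("Yes", "No", "Yes", "Maybe", "No")

# Default: levels are the sorted unique values.
default_levels <- levels(factor(responses))
print(default_levels)     # "Maybe" "No" "Yes"

# Explicit levels: order of first appearance in the data.
appearance_levels <- levels(factor(responses, levels = unique(responses)))
print(appearance_levels)  # "Yes" "No" "Maybe"
```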
❓ Can I use factors in ggplot2 for colored grouping?
✅ Yes! Factors are ideal for fill, color, and facet arguments in ggplot.
❓ How do I check if a variable is a factor?
✅ Use:
is.factor(x)