
🏷️ R Factors – Handle Categorical Data with Levels in R (with Code Explanation)


🧲 Introduction – What Are Factors in R?

In R, a factor is a data structure for categorical data such as gender, grades, or survey responses. A factor is stored as an integer vector plus a set of labels (called levels). This keeps storage compact and tells modeling functions to treat the variable as a set of categories rather than as free text.

🎯 In this guide, you’ll learn:

  • How to create and inspect factors
  • How to modify and order levels
  • How to use factors inside data frames
  • Real-world use cases and detailed explanations of code

🧪 Creating a Factor in R

responses <- c("Yes", "No", "Yes", "Maybe", "No")
fact <- factor(responses)
print(fact)

🔍 Explanation:

  • responses is a character vector with repeated categories.
  • factor(responses) converts it to a factor where R identifies the unique categories: "Maybe", "No", and "Yes" (alphabetical order).
  • Internally, these are stored as integers: 1 = "Maybe", 2 = "No", 3 = "Yes".

🧾 Output:

[1] Yes   No    Yes   Maybe No   
Levels: Maybe No Yes
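
To make the link between labels and integer codes concrete, the short sketch below inspects the same factor with base R functions. It only reuses the responses vector from the example above.

```r
responses <- c("Yes", "No", "Yes", "Maybe", "No")
fact <- factor(responses)

levels(fact)       # the labels, in sorted order: "Maybe" "No" "Yes"
nlevels(fact)      # number of distinct levels: 3
as.integer(fact)   # the stored codes: 3 2 3 1 2
```

Each code is the position of that element's label in levels(fact).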

🔠 Creating Ordered Factors

grades <- c("C", "B", "A", "B", "C")
ordered_grades <- factor(grades, levels = c("C", "B", "A"), ordered = TRUE)

🔍 Explanation:

  • We manually set levels = c("C", "B", "A") to define a ranking.
  • ordered = TRUE tells R that "A" is the highest and "C" is the lowest.
  • This enables relational comparisons such as < and >:

ordered_grades[1] < ordered_grades[2]  # TRUE, because "C" ranks below "B"
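
As a further illustration, here is a minimal sketch of other operations that respect the level order. It rebuilds the same ordered_grades factor so that it runs on its own.

```r
grades <- c("C", "B", "A", "B", "C")
ordered_grades <- factor(grades, levels = c("C", "B", "A"), ordered = TRUE)

# Compare every element against a label; the label is matched to the levels.
at_least_b <- ordered_grades >= "B"   # FALSE TRUE TRUE TRUE FALSE

# max() and sort() follow the level order, not the alphabet.
best   <- max(ordered_grades)         # A
sorted <- sort(ordered_grades)        # C C B B A
```

On an unordered factor, < and > are not meaningful: R returns NA with a warning.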

🔁 Modifying Factor Levels

levels(fact) <- c("May", "Nope", "Yes")
print(fact)

🔍 Explanation:

  • We're renaming the levels from "Maybe", "No", "Yes" to "May", "Nope", and "Yes".
  • New names are assigned by position, so list them in the same order as the existing levels. This renames the levels; it does not reorder them.
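
Renaming and reordering are easy to confuse, so the sketch below contrasts them. The variable names (original, renamed, reordered, relabeled) are illustrative and chosen so that fact itself is left untouched.

```r
original <- factor(c("Yes", "No", "Yes", "Maybe", "No"))

# Renaming: labels are replaced position by position.
renamed <- original
levels(renamed) <- c("May", "Nope", "Yes")
as.character(renamed)     # "Yes" "Nope" "Yes" "May" "Nope"

# Reordering: rebuild the factor with the level order you want.
reordered <- factor(original, levels = c("No", "Maybe", "Yes"))
as.character(reordered)   # values unchanged: "Yes" "No" "Yes" "Maybe" "No"

# Pitfall: assigning a permuted vector to levels() relabels the data.
relabeled <- original
levels(relabeled) <- c("No", "Maybe", "Yes")
as.character(relabeled)   # "Yes" "Maybe" "Yes" "No" "Maybe"
```

In the last case every original "No" has silently become "Maybe", which is why reordering should go through factor(x, levels = ...).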

🔄 Converting Between Types

as.character(fact)   # Converts factor to character
as.numeric(fact)     # Shows underlying integer codes

🔍 Explanation:

  • as.character() retrieves the category labels.
  • as.numeric() returns the internal integer codes (e.g., "May" = 1, "Nope" = 2), not the labels.

📊 Using Factors in Data Frames

df <- data.frame(Name = c("Tom", "Jane"), Gender = factor(c("M", "F")))
str(df)

🔍 Explanation:

  • The Gender column is explicitly made a factor.
  • str() shows the structure: Gender is listed as a Factor with 2 levels ("F", "M") rather than as plain text.
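
Columns often arrive as plain text and are converted afterwards. The sketch below shows one way to do that. The extra row and the name people are made up for illustration, and stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version.

```r
people <- data.frame(
  Name   = c("Tom", "Jane", "Ali"),
  Gender = c("M", "F", "M"),
  stringsAsFactors = FALSE   # keep text as character on every R version
)
is.character(people$Gender)  # TRUE: still plain text

# Convert the column in place, stating the levels explicitly.
people$Gender <- factor(people$Gender, levels = c("F", "M"))
is.factor(people$Gender)     # TRUE
table(people$Gender)         # F: 1, M: 2
```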

📈 Frequency Analysis with table() and summary()

table(fact)       # Frequency count of each level
summary(fact)     # Summarized frequency with level names

🔍 Explanation:

  • table() shows how many times each category occurs.
  • summary() reports the same counts for a factor, but returns them as a named integer vector rather than a table object.
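
The difference between the two results is easier to see by looking at what each call returns. This is a small sketch on a fresh factor; the name answers is illustrative.

```r
answers <- factor(c("Yes", "No", "Yes", "Maybe", "No"))

counts <- table(answers)
class(counts)        # "table"
counts["Yes"]        # count for one level: 2

s <- summary(answers)
class(s)             # "integer"
names(s)             # "Maybe" "No" "Yes"

shares <- prop.table(counts)   # proportions: 0.2 0.4 0.4
```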

⚠️ Handling Invalid Level Assignments

f <- factor(c("A", "B"), levels = c("A", "B"))
f[3] <- "C"    # Warning: invalid factor level, NA generated

🔍 Explanation:

  • Since "C" is not in the defined levels, R assigns NA and warns.
  • Before assigning a value that is not among the defined levels, add the level first or rebuild the factor.
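
One way to avoid the NA is to register the new level before assigning it. A minimal sketch, using the same f as above:

```r
f <- factor(c("A", "B"), levels = c("A", "B"))

levels(f) <- c(levels(f), "C")   # append "C" to the existing levels
f[3] <- "C"                      # now a valid assignment, no warning

as.character(f)   # "A" "B" "C"
anyNA(f)          # FALSE
```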

🧠 Why Use Factors?

  • Statistical modeling (e.g., lm(), glm()) uses factors to treat categorical predictors properly.
  • Grouping: factors help group data in tapply(), aggregate(), and ggplot2.
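  • Memory efficiency: each value is stored as an integer code and each label is stored once, which is usually more compact than a character vector, although R's internal string cache narrows the gap.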
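
The grouping and modeling points can be sketched with base R alone. The scores and group vectors below are made-up illustration data, not taken from the examples above.

```r
scores <- c(90, 72, 85, 60, 78)
group  <- factor(c("A", "B", "A", "C", "B"), levels = c("A", "B", "C"))

# Grouping: tapply() applies a function within each level.
means <- tapply(scores, group, mean)   # A: 87.5, B: 75, C: 60

# Modeling: a factor predictor is expanded into indicator columns,
# with the first level ("A") as the baseline.
design <- model.matrix(~ group)
colnames(design)   # "(Intercept)" "groupB" "groupC"
```

lm() and glm() build this kind of design matrix internally, which is why a categorical predictor should be a factor rather than a set of numeric codes.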

📌 Summary – Recap & Next Steps

Factors allow you to treat categorical data with defined structure and meaning in R. They’re ideal for modeling, classification, and grouping in data analysis.

🔍 Key Takeaways:

  • Use factor() to create categorical variables with levels.
  • Use ordered = TRUE for ordinal (ranked) categories.
  • Rename levels with levels(); reorder them with factor(x, levels = ...).
  • Convert between types carefully: as.character() returns the labels, while as.numeric() returns the integer codes.
  • Use table() and summary() for frequency analysis.

⚙️ Real-World Relevance:
In survey analysis, demographic labeling, classification modeling, and data visualization, factors are essential for interpreting categorical data correctly.


❓ FAQs – Factors in R

❓ What is the difference between factor and character in R?
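A character vector is plain text. A factor stores integer codes plus a fixed set of levels, so it records which categories are valid and is treated as categorical by modeling and grouping functions.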

❓ Can I convert a factor to numeric safely?
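✅ Yes, when the labels are numbers. Convert to character first, because as.numeric() on the factor itself returns the integer codes, not the labels: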

as.numeric(as.character(factor_var))
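
A small sketch of why the intermediate step matters. Here factor_var is an illustrative factor whose labels happen to be numbers.

```r
factor_var <- factor(c("10", "20", "10", "30"))

codes  <- as.numeric(factor_var)                # 1 2 1 3: the level codes
values <- as.numeric(as.character(factor_var))  # 10 20 10 30: the labels

# An equivalent form that converts each level only once:
values2 <- as.numeric(levels(factor_var))[factor_var]
```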

❓ Why does R sort levels alphabetically?
✅ By default, factor() builds the levels from the sorted unique values (alphabetical for text) unless you supply the levels argument yourself.

❓ Can I use factors in ggplot2 for colored grouping?
✅ Yes! Factors work well for the fill and color aesthetics and as faceting variables in ggplot2.

❓ How do I check if a variable is a factor?
✅ Use:

is.factor(x)
