
R Factors – Handle Categorical Data with Levels in R (with Code Explanation)


Introduction – What Are Factors in R?

In R, a factor is a data structure used to store categorical data, such as gender, grades, or survey responses. A factor is stored as an integer vector plus a set of labels (called levels). Each label is stored once and each value is a small integer code, which keeps factors compact and lets modeling functions treat them as categories rather than free text.

In this guide, you’ll learn:

  • How to create and inspect factors
  • How to modify and order levels
  • How to use factors inside data frames
  • Real-world use cases and detailed explanations of code

Creating a Factor in R

responses <- c("Yes", "No", "Yes", "Maybe", "No")
fact <- factor(responses)
print(fact)

Explanation:

  • responses is a character vector with repeated categories.
  • factor(responses) converts it to a factor where R identifies the unique categories: "Maybe", "No", and "Yes" (alphabetical order).
  • Internally, these are stored as integers: 1 = "Maybe", 2 = "No", 3 = "Yes".

Output:

[1] Yes   No    Yes   Maybe No   
Levels: Maybe No Yes
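
To inspect a factor beyond printing it, base R offers a few helpers. The short sketch below rebuilds the same fact object so it can run on its own, then looks at the labels, the number of levels, and the stored integer codes.

```r
responses <- c("Yes", "No", "Yes", "Maybe", "No")
fact <- factor(responses)

levels(fact)      # the distinct labels, in level order: "Maybe" "No" "Yes"
nlevels(fact)     # number of levels: 3
as.integer(fact)  # stored code for each element: 3 2 3 1 2
class(fact)       # "factor"
```

The codes line up with the level order: "Yes" is the third level, so every "Yes" is stored as 3.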

Creating Ordered Factors

grades <- c("C", "B", "A", "B", "C")
ordered_grades <- factor(grades, levels = c("C", "B", "A"), ordered = TRUE)

Explanation:

  • We manually set levels = c("C", "B", "A") to define a ranking.
  • ordered = TRUE tells R that "A" is the highest and "C" is the lowest.
  • This enables logical comparisons like < or >.

ordered_grades[1] < ordered_grades[2]  # TRUE, because "C" ranks below "B"
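
Ordering affects more than single comparisons. As a small sketch using the same grades data, the code below compares every element against a level label, finds the highest grade, and sorts by rank.

```r
grades <- c("C", "B", "A", "B", "C")
ordered_grades <- factor(grades, levels = c("C", "B", "A"), ordered = TRUE)

ordered_grades >= "B"        # FALSE TRUE TRUE TRUE FALSE
max(ordered_grades)          # A, the highest level present
sort(ordered_grades)         # C C B B A: sorted by level order, not alphabetically
is.ordered(ordered_grades)   # TRUE
```

Without ordered = TRUE, the comparison and max() calls are not meaningful for a factor, so R does not return a usable result for them.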

Modifying Factor Levels

levels(fact) <- c("May", "Nope", "Yes")
print(fact)

Explanation:

  • We’re renaming the levels from "Maybe", "No", "Yes" to "May", "Nope", and "Yes".
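  • New names are matched to the existing levels by position, so list them in the same order as levels(fact) returns them. Listing them in a different order relabels the data instead of reordering it.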
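
Renaming levels and reordering levels are different operations. The sketch below shows both side by side; the names renamed and reordered are only illustrative.

```r
fact <- factor(c("Yes", "No", "Yes", "Maybe", "No"))

# Renaming: new labels replace the existing levels position by position.
renamed <- fact
levels(renamed) <- c("May", "Nope", "Yes")
as.character(renamed)    # "Yes" "Nope" "Yes" "May" "Nope"

# Reordering: rebuild the factor with the levels in the order you want.
# The data values stay the same; only the level order changes.
reordered <- factor(fact, levels = c("Yes", "Maybe", "No"))
levels(reordered)        # "Yes" "Maybe" "No"
as.character(reordered)  # "Yes" "No" "Yes" "Maybe" "No"
```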

Converting Between Types

as.character(fact)   # Converts factor to character
as.numeric(fact)     # Shows underlying integer codes

Explanation:

  • as.character() returns the category labels as text.
  • as.numeric() returns the internal codes (e.g., "May" = 1, "Nope" = 2, "Yes" = 3), not the labels.

Using Factors in Data Frames

df <- data.frame(Name = c("Tom", "Jane"), Gender = factor(c("M", "F")))
str(df)

Explanation:

  • The Gender column is explicitly made a factor.
  • str() shows the structure: Gender is listed as a factor with 2 levels, not as plain text.
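
It can help to check which columns are factors and to convert a column afterwards. The sketch below reuses the same df; note that the comment about Name assumes R 4.0 or later, where data.frame() no longer converts strings to factors by default.

```r
df <- data.frame(Name = c("Tom", "Jane"), Gender = factor(c("M", "F")))

class(df$Name)      # "character" (assuming R >= 4.0, stringsAsFactors = FALSE)
levels(df$Gender)   # "F" "M"

# Convert an existing character column to a factor after the fact.
df$Name <- factor(df$Name)
sapply(df, is.factor)   # both columns are now factors
```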

Frequency Analysis with table() and summary()

table(fact)       # Frequency count of each level
summary(fact)     # Summarized frequency with level names

Explanation:

  • table() shows how many times each category occurs.
  • summary() returns the same counts for a factor, as a vector named by level rather than a table object.
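
Both functions count every defined level, including levels that never occur in the data. The sketch below uses a hypothetical answers factor with an unused "Maybe" level to show this, and droplevels() to remove it.

```r
answers <- factor(c("Yes", "No", "Yes"), levels = c("Maybe", "No", "Yes"))

table(answers)               # Maybe 0, No 1, Yes 2
summary(answers)             # the same counts, named by level
table(droplevels(answers))   # "Maybe" is dropped before counting: No 1, Yes 2
```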

Handling Invalid Level Assignments

f <- factor(c("A", "B"), levels = c("A", "B"))
f[3] <- "C"    # Warning: invalid factor level, NA generated

Explanation:

  • Since "C" is not in the defined levels, R assigns NA and warns.
  • Always ensure new values match defined levels.
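
If the new value is legitimate, one way to avoid the NA is to add the level before assigning. The sketch below shows two options; g is just an illustrative second variable.

```r
f <- factor(c("A", "B"), levels = c("A", "B"))

# Option 1: extend the level set, then assign.
levels(f) <- c(levels(f), "C")
f[3] <- "C"
as.character(f)   # "A" "B" "C"

# Option 2: declare every expected level when the factor is created.
g <- factor(c("A", "B"), levels = c("A", "B", "C"))
g[3] <- "C"
levels(g)         # "A" "B" "C"
```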

Why Use Factors?

  • Statistical modeling (e.g., lm(), glm()) uses factors to treat categorical predictors properly.
  • Grouping: factors define the groups in tapply() and aggregate(), and the discrete categories in ggplot2 plots.
  • Memory efficiency: each value is stored as an integer code, and each label is stored only once in the levels.
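
As a rough illustration of the first two points, the sketch below uses made-up scores and a two-level group factor. tapply() computes one mean per level, and model.matrix() shows the indicator column that modeling functions such as lm() build from a factor.

```r
# Hypothetical data for illustration only.
scores <- c(80, 90, 70, 60, 90)
group  <- factor(c("a", "b", "a", "b", "a"))

means <- tapply(scores, group, mean)
means                            # a = 80, b = 75

# A factor predictor is expanded into indicator (dummy) columns.
colnames(model.matrix(~ group))  # "(Intercept)" "groupb"
```

The first level ("a") acts as the reference category, which is why only groupb appears as a column.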

Summary – Recap & Next Steps

Factors allow you to treat categorical data with defined structure and meaning in R. They’re ideal for modeling, classification, and grouping in data analysis.

Key Takeaways:

  • Use factor() to create categorical variables with levels.
  • Use ordered = TRUE for ordinal (ranked) categories.
  • Rename levels with levels(); reorder them by rebuilding the factor with factor(x, levels = ...).
  • Convert between types carefully with as.character() or as.numeric().
  • Use table() and summary() for frequency analysis.

Real-World Relevance:
In survey analysis, demographic labeling, classification modeling, and data visualization, factors are essential for interpreting categorical data correctly.


FAQs – Factors in R

What is the difference between factor and character in R?
A character vector is plain text. A factor stores each value as an integer code that points to a fixed set of levels, so only those categories are allowed.

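Can I convert a factor to numeric safely?
Yes, if the labels are numbers stored as text. Convert to character first, because as.numeric() applied directly to a factor returns the internal codes rather than the labels: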

as.numeric(as.character(factor_var))
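
The difference is easy to see with a small example. Here factor_var is a hypothetical factor whose labels are numbers stored as text.

```r
factor_var <- factor(c("10", "20", "10", "5"))

levels(factor_var)                           # "10" "20" "5" (sorted as text)
as.numeric(factor_var)                       # 1 2 1 3: the codes, not the values
as.numeric(as.character(factor_var))         # 10 20 10 5
as.numeric(levels(factor_var))[factor_var]   # 10 20 10 5, converting each level once
```

The last form gives the same result and only converts each distinct level, which can matter for long vectors.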

Why does R sort levels alphabetically?
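When no levels argument is given, factor() uses the sorted unique values of its input as the levels, which for text means alphabetical order. Pass levels explicitly to set a different order.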

Can I use factors in ggplot2 for colored grouping?
Yes. Mapping a factor to the fill or color aesthetic gives one discrete color per level, and faceting functions such as facet_wrap() create one panel per level.

How do I check if a variable is a factor?
Use:

is.factor(x)
