
R Chi-Square Test – Independence and Goodness-of-Fit with Examples


Introduction – What is the Chi-Square Test in R?

The Chi-Square (χ²) Test in R is used to test relationships between categorical variables or assess goodness-of-fit between observed and expected frequencies. It’s one of the most common tests in hypothesis testing for contingency tables.

In this guide, you’ll learn how to:

  • Perform a Chi-Square test in R with chisq.test()
  • Test independence in a contingency table
  • Run a goodness-of-fit test
  • Interpret p-values and expected counts

1. Chi-Square Test of Independence

Used to test if two categorical variables are related.

Example Dataset

data <- matrix(c(20, 15, 30, 35), nrow = 2,
               dimnames = list(Gender = c("Male", "Female"),
                               Preference = c("Product A", "Product B")))
data

Output:

        Preference
Gender   Product A Product B
  Male        20        30
  Female      15        35

Run Chi-Square Test

chisq_test <- chisq.test(data)
print(chisq_test)

Output Breakdown:

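  • Chi-squared statistic (X-squared); for 2×2 tables, R applies Yates' continuity correction by default
  • Degrees of freedom (df)
  • p-value (probability of a statistic at least this large if the variables are independent)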
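The values printed by the test are also stored as named elements of the object that chisq.test() returns. A minimal sketch, reusing the example table from this section; the names res and res_uncorrected are illustrative and not part of the original example:

```r
# Rebuild the example contingency table from this section
data <- matrix(c(20, 15, 30, 35), nrow = 2,
               dimnames = list(Gender = c("Male", "Female"),
                               Preference = c("Product A", "Product B")))

# Default call (continuity-corrected for a 2x2 table)
res <- chisq.test(data)
res$statistic   # chi-squared statistic (X-squared)
res$parameter   # degrees of freedom
res$p.value     # p-value

# Same test with the continuity correction switched off
res_uncorrected <- chisq.test(data, correct = FALSE)
res_uncorrected$statistic
```

The corrected statistic is never larger than the uncorrected one, so the default is the more conservative choice for small 2×2 tables.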

2. Interpreting Results

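# Compare the p-value with the 5% significance level
if (chisq_test$p.value < 0.05) {
  print("Reject the null hypothesis: the variables appear to be associated")
} else {
  # A large p-value does not prove independence; it only means
  # the data give no significant evidence against it
  print("Fail to reject the null hypothesis: no significant evidence of association")
}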

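If p < 0.05, the association between the variables is statistically significant at the 5% level. If p ≥ 0.05, the data do not provide enough evidence of an association; this is not proof that the variables are independent.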


3. Check Expected Frequencies

chisq_test$expected

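Returns the counts expected in each cell if the variables were independent; compare them with the observed counts in chisq_test$observed. All expected values should generally be ≥ 5 for the chi-square approximation to be reliable.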
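To see where these numbers come from, each expected count is (row total × column total) / grand total. A short sketch under that standard definition, using the counts from the example table; expected_manual is a name introduced here for illustration:

```r
# Observed counts from the example (rows: Male, Female; columns: Product A, Product B)
observed <- matrix(c(20, 15, 30, 35), nrow = 2)

# Expected count per cell = row total * column total / grand total
expected_manual <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected_manual

# Rule-of-thumb check: every expected count should be at least 5
all(expected_manual >= 5)
```

The result should match the $expected element returned by chisq.test() for the same table.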


4. Chi-Square Goodness-of-Fit Test

Used to compare observed counts against a theoretical distribution.

Example: Rolling a Die

observed <- c(14, 10, 9, 11, 8, 8)  # Counts for faces 1 to 6 (60 rolls in total)
probs <- rep(1/6, 6)                # Fair die: each face has probability 1/6

chisq.test(x = observed, p = probs)

Explanation:

  • x: Observed frequencies
  • p: Probability of each outcome under the null hypothesis
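The same result can be reproduced by hand, which makes clear what the test computes. A minimal sketch using the die counts above; the names probs, stat, dof, p_value, and gof are chosen here for illustration:

```r
observed <- c(14, 10, 9, 11, 8, 8)   # die counts from the example
probs <- rep(1/6, 6)                 # null hypothesis: fair die

# Expected counts under the null: total rolls * probability of each face
expected <- sum(observed) * probs

# Pearson's statistic: sum of (observed - expected)^2 / expected
stat <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus one
dof <- length(observed) - 1

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(stat, df = dof, lower.tail = FALSE)

# Built-in test for comparison
gof <- chisq.test(x = observed, p = probs)
c(manual = stat, builtin = unname(gof$statistic))
```

Here every expected count is 10, so the statistic is (16 + 0 + 1 + 1 + 4 + 4) / 10 = 2.6 with 5 degrees of freedom.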

Assumptions of the Chi-Square Test

Assumption                          Details
Sample Size                         Expected frequency ≥ 5 per cell
Data Type                           Categorical (nominal or ordinal)
Independence                        Observations must be independent
Fixed Marginals (Goodness-of-fit)   Totals are pre-determined

Summary – Recap & Next Steps

The Chi-Square test is ideal for determining if distributions differ or categorical variables are related. R’s chisq.test() simplifies both independence and goodness-of-fit testing.

Key Takeaways:

  • Use chisq.test() for both types of tests
  • Interpret the p-value to assess significance
  • Check expected counts to meet assumptions
  • Perfect for survey, demographic, and categorical studies

Real-World Relevance:
Used in market research, clinical trials, demographics, A/B testing, and behavioral analysis.


FAQs – Chi-Square Test in R

What does a low p-value mean in a Chi-Square test?
It means the observed counts differ from the counts expected under the null hypothesis by more than chance alone would plausibly explain, so you reject the null hypothesis.

Can Chi-Square be used for numerical data?
Not directly. The test works on counts of categorical (qualitative) variables; numerical data must first be grouped into categories.

What if expected frequencies are < 5?
Use Fisher’s Exact Test or combine categories.
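As a rough illustration of that fallback, the sketch below checks the expected counts of a small table and then runs base R's fisher.test(). The counts, labels, and variable names are made up for this example and do not come from the guide:

```r
# Hypothetical small 2x2 table (illustrative counts only)
small <- matrix(c(3, 1, 1, 4), nrow = 2,
                dimnames = list(Group = c("A", "B"),
                                Outcome = c("Yes", "No")))

# chisq.test() warns when the approximation may be poor;
# suppress the warning here just to inspect the expected counts
expected <- suppressWarnings(chisq.test(small))$expected
any(expected < 5)   # TRUE means the rule of thumb is violated

# Fisher's Exact Test does not rely on the large-sample approximation
fisher_res <- fisher.test(small)
fisher_res$p.value
```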

How do I extract the test statistic from chisq.test()?

chisq_test$statistic

Can I run Chi-Square on a data frame?
Yes. Use table() to build a contingency table from two columns, then pass the result to chisq.test():

table(df$Gender, df$Preference)
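A fuller sketch of that workflow, assuming a data frame named df with one row per respondent and the columns Gender and Preference. The data frame is constructed here only for illustration, with counts chosen to match the earlier example table:

```r
# Hypothetical respondent-level data (one row per person)
df <- data.frame(
  Gender = rep(c("Male", "Female"), times = c(50, 50)),
  Preference = c(rep(c("Product A", "Product B"), times = c(20, 30)),
                 rep(c("Product A", "Product B"), times = c(15, 35)))
)

# Cross-tabulate the two categorical columns
tab <- table(df$Gender, df$Preference)
tab

# Run the test of independence on the resulting table
result <- chisq.test(tab)
result
```

Note that table() orders the categories alphabetically, so the rows appear as Female then Male; this does not affect the test result.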
