
🧪 R Chi-Square Test – Independence and Goodness-of-Fit with Examples


🧲 Introduction – What is the Chi-Square Test in R?

The Chi-Square (χ²) Test in R checks whether two categorical variables are associated (test of independence) or whether observed frequencies match a set of expected frequencies (goodness-of-fit). It is one of the most common hypothesis tests for count data, especially contingency tables.

🎯 In this guide, you’ll learn:

  • How to perform a Chi-Square test in R
  • Test independence in a contingency table
  • Perform a goodness-of-fit test
  • Interpret p-values and expected counts
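Both tests covered here use Pearson's statistic, which adds up the squared gaps between observed counts O and expected counts E, each scaled by E. In the notation below (our own labels, not R's), R_i and C_j are row and column totals, n is the grand total, r and c are the numbers of rows and columns, k is the number of categories, and p_i are the hypothesised category probabilities:

```latex
X^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E}

% Test of independence (r x c table, grand total n)
E_{ij} = \frac{R_i \, C_j}{n}, \qquad df = (r - 1)(c - 1)

% Goodness-of-fit (k categories, hypothesised probabilities p_i)
E_i = n \, p_i, \qquad df = k - 1
```

Large values of X² mean the observed counts sit far from what the null hypothesis predicts, and the p-value is the upper-tail area of the chi-squared distribution with the df shown. For 2×2 tables, chisq.test() subtracts a continuity correction by default.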

📊 1. Chi-Square Test of Independence

Used to test whether two categorical variables are related. The null hypothesis is that they are independent.

✅ Example Dataset

data <- matrix(c(20, 15, 30, 35), nrow = 2,
               dimnames = list(Gender = c("Male", "Female"),
                               Preference = c("Product A", "Product B")))
data

🧾 Output:

        Preference
Gender   Product A Product B
  Male          20        30
  Female        15        35
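The expected counts used by the test are built from the row and column totals, so it can help to look at them before testing. A minimal sketch using base R's addmargins(); the table is re-created so the snippet runs on its own.

```r
# Re-create the example contingency table (matrix() fills column by column)
data <- matrix(c(20, 15, 30, 35), nrow = 2,
               dimnames = list(Gender = c("Male", "Female"),
                               Preference = c("Product A", "Product B")))

# Append a "Sum" row and a "Sum" column holding the marginal totals
totals <- addmargins(as.table(data))
print(totals)

# Grand total: number of observations in the table
n <- sum(data)
print(n)
```

Here both genders have 50 respondents, while Product A has 35 and Product B has 65.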

✅ Run Chi-Square Test

chisq_test <- chisq.test(data)
print(chisq_test)

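🔍 Output Breakdown:

  • X-squared: the chi-squared test statistic
  • df: degrees of freedom, (rows − 1) × (columns − 1), which is 1 for a 2×2 table
  • p-value: the probability of a statistic at least this large if the two variables were independent

For a 2×2 table, chisq.test() applies Yates' continuity correction by default (correct = TRUE); pass correct = FALSE to get the uncorrected Pearson statistic.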
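To see where X-squared comes from, the statistic can be recomputed by hand and compared with chisq.test(), both with and without the continuity correction. This is a sketch for the example table only; the variable names are our own.

```r
# Example table from above (filled column by column)
data <- matrix(c(20, 15, 30, 35), nrow = 2,
               dimnames = list(Gender = c("Male", "Female"),
                               Preference = c("Product A", "Product B")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(data), colSums(data)) / sum(data)

# Uncorrected Pearson statistic: sum of (O - E)^2 / E over all cells
x2_plain <- sum((data - expected)^2 / expected)

# Yates-corrected statistic: shrink each |O - E| by 0.5 before squaring.
# (R caps the 0.5 at |O - E|; here every |O - E| is 2.5, so no capping occurs.)
x2_yates <- sum((abs(data - expected) - 0.5)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
dof <- (nrow(data) - 1) * (ncol(data) - 1)

res_yates <- chisq.test(data)                  # correct = TRUE by default
res_plain <- chisq.test(data, correct = FALSE) # no continuity correction

print(c(manual = x2_yates, chisq.test = unname(res_yates$statistic)))
print(c(manual = x2_plain, chisq.test = unname(res_plain$statistic)))
```

For this table the manual values match chisq.test(): about 0.703 with the correction and about 1.099 without it.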

📋 2. Interpreting Results

if (chisq_test$p.value < 0.05) {
  print("Reject the null hypothesis: evidence that the variables are associated")
} else {
  print("Fail to reject the null hypothesis: no significant evidence of association")
}

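🧠 If p < 0.05, the association is statistically significant at the 5% level. A larger p-value means the data show no significant evidence of association; it does not prove that the variables are independent.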
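The p-value reported by chisq.test() is the upper-tail area of the chi-squared distribution beyond the observed statistic. A minimal sketch that reproduces it with pchisq() and applies the same 5% decision rule (alpha = 0.05 is a conventional choice, not a requirement):

```r
data <- matrix(c(20, 15, 30, 35), nrow = 2,
               dimnames = list(Gender = c("Male", "Female"),
                               Preference = c("Product A", "Product B")))
chisq_test <- chisq.test(data)

# Upper-tail probability of the chi-squared distribution with the test's df
stat     <- unname(chisq_test$statistic)
dof      <- unname(chisq_test$parameter)
p_manual <- pchisq(stat, df = dof, lower.tail = FALSE)

# Decision at the chosen significance level
alpha    <- 0.05
decision <- if (p_manual < alpha) "reject H0" else "fail to reject H0"
cat(sprintf("X-squared = %.4f, df = %d, p = %.4f -> %s\n",
            stat, as.integer(dof), p_manual, decision))
```

For the example table the p-value is well above 0.05, so the test does not reject independence.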


✅ 3. Check Expected Frequencies

chisq_test$expected

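📌 chisq_test$expected holds the counts expected under independence (row total × column total ÷ n); compare them with the observed counts in chisq_test$observed. As a rule of thumb, every expected count should be at least 5, and chisq.test() warns "Chi-squared approximation may be incorrect" when any expected count falls below 5.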
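A short sketch that recomputes the expected counts, checks the "at least 5" rule of thumb, and prints the observed counts next to the Pearson residuals (O − E) / √E that chisq.test() also returns:

```r
data <- matrix(c(20, 15, 30, 35), nrow = 2,
               dimnames = list(Gender = c("Male", "Female"),
                               Preference = c("Product A", "Product B")))
chisq_test <- chisq.test(data)

# Expected counts under independence: (row total * column total) / n
expected_manual <- outer(rowSums(data), colSums(data)) / sum(data)
print(chisq_test$expected)

# Rule-of-thumb check: are all expected counts at least 5?
all_ok <- all(chisq_test$expected >= 5)
print(all_ok)

# Observed counts and Pearson residuals, cell by cell
print(chisq_test$observed)
print(round(chisq_test$residuals, 3))
```

Because both genders have 50 respondents, every expected count is 17.5 for Product A and 32.5 for Product B, so the rule of thumb is comfortably met.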


🎯 4. Chi-Square Goodness-of-Fit Test

Used to compare observed counts against a theoretical distribution.

✅ Example: Rolling a Die

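observed <- c(14, 10, 9, 11, 8, 8)   # counts for faces 1-6 over 60 rolls
expected <- rep(10, 6)               # fair-die expectation: 60 * 1/6 = 10 per face

# H0: every face has probability 1/6 (expected counts rescaled to probabilities)
chisq.test(x = observed, p = expected / sum(expected))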

🔍 Explanation:

  • x: Observed frequencies
  • p: Probability of each outcome under the null hypothesis (must sum to 1; if omitted, all outcomes are treated as equally likely)
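The goodness-of-fit statistic follows the same recipe, with expected counts n × p and df equal to the number of categories minus 1. A minimal sketch that recomputes it for the die example and compares it with chisq.test():

```r
observed <- c(14, 10, 9, 11, 8, 8)   # counts for faces 1-6 (60 rolls)
p_fair   <- rep(1/6, 6)              # null hypothesis: fair die

# Expected counts: n * p, i.e. 60 * 1/6 = 10 per face
expected <- sum(observed) * p_fair

# Goodness-of-fit statistic, df = (number of categories - 1), upper-tail p-value
x2  <- sum((observed - expected)^2 / expected)
dof <- length(observed) - 1
p   <- pchisq(x2, df = dof, lower.tail = FALSE)

gof <- chisq.test(x = observed, p = p_fair)
print(c(manual = x2, chisq.test = unname(gof$statistic)))
print(c(manual = p, chisq.test = gof$p.value))
```

Both give X-squared = 2.6 on 5 degrees of freedom. No continuity correction is involved here, because chisq.test() only applies it to 2×2 tables.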

🧠 Assumptions of the Chi-Square Test

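Assumption                      Details
Sample size                     Expected frequency ≥ 5 in each cell (rule of thumb)
Data type                       Counts of categorical (nominal or ordinal) variables
Independence                    Observations are independent; each falls in exactly one cell
Fixed total (goodness-of-fit)   Sample size and hypothesised probabilities are set in advance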
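When the sample-size assumption fails, chisq.test() can compute a Monte Carlo p-value instead of relying on the chi-squared approximation, via simulate.p.value = TRUE (B sets the number of simulated tables). The 2×2 table below uses hypothetical, made-up counts chosen only so that some expected counts fall below 5.

```r
# Hypothetical small table (made-up counts for illustration only)
small <- matrix(c(3, 1, 2, 6), nrow = 2,
                dimnames = list(Group = c("G1", "G2"),
                                Outcome = c("Yes", "No")))

# The asymptotic test warns because some expected counts are below 5
asym <- suppressWarnings(chisq.test(small))
print(asym$expected)

# Monte Carlo p-value from tables simulated with the same margins
set.seed(42)
sim <- chisq.test(small, simulate.p.value = TRUE, B = 10000)
print(sim$p.value)
```

The simulated p-value varies slightly with the random seed; increasing B reduces that variation.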

📌 Summary – Recap & Next Steps

The Chi-Square test checks whether observed counts depart from a hypothesised distribution (goodness-of-fit) or whether two categorical variables are associated (independence). R's chisq.test() handles both.

🔍 Key Takeaways:

  • Use chisq.test() for both types of tests
  • Interpret p-value to assess significance
  • Check expected counts to meet assumptions
  • Well suited to survey, demographic, and other categorical count data

⚙️ Real-World Relevance:
Used in market research, clinical trials, demographics, A/B testing, and behavioral analysis.


❓ FAQs – Chi-Square Test in R

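❓ What does a low p-value mean in a Chi-Square test?
✅ The data are unlikely under the null hypothesis, so you reject it. In a goodness-of-fit test, the observed distribution differs significantly from the expected one; in a test of independence, the variables appear to be associated.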

❓ Can Chi-Square be used for numerical data?
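❌ Not directly. The test works on counts of categorical (qualitative) variables. Continuous measurements must first be grouped into categories (for example with cut()) and then counted.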

❓ What if expected frequencies are < 5?
✅ Use Fisher's Exact Test (fisher.test()), request a simulated p-value with chisq.test(..., simulate.p.value = TRUE), or combine sparse categories.
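A minimal fisher.test() sketch on a hypothetical small 2×2 table (made-up counts, for illustration only). For 2×2 tables it also reports an estimated odds ratio.

```r
# Hypothetical small table (made-up counts for illustration only)
small <- matrix(c(3, 1, 2, 6), nrow = 2,
                dimnames = list(Group = c("G1", "G2"),
                                Outcome = c("Yes", "No")))

# Exact test: no large-sample approximation, so small expected counts are fine
ft <- fisher.test(small)
print(ft$p.value)
print(ft$estimate)   # conditional maximum-likelihood estimate of the odds ratio
```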

❓ How do I extract the test statistic from chisq.test()?

chisq_test$statistic
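chisq.test() returns an "htest" object, so the statistic, df and p-value are list components. A short sketch that lists them and strips the attached names to get plain numbers:

```r
data <- matrix(c(20, 15, 30, 35), nrow = 2,
               dimnames = list(Gender = c("Male", "Female"),
                               Preference = c("Product A", "Product B")))
chisq_test <- chisq.test(data)

# Every component of the returned object
print(names(chisq_test))

# Drop the attached names ("X-squared", "df") to get plain numbers
x2   <- unname(chisq_test$statistic)
dof  <- unname(chisq_test$parameter)
pval <- chisq_test$p.value
cat(sprintf("%s: X-squared = %.4f, df = %d, p = %.4f\n",
            chisq_test$method, x2, as.integer(dof), pval))

# Standardized residuals show which cells contribute most to the statistic
print(round(chisq_test$stdres, 2))
```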

❓ Can I run Chi-Square on a data frame?
✅ Yes. Use table() to cross-tabulate the two columns into a contingency table, then pass it to chisq.test():

chisq.test(table(df$Gender, df$Preference))
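A self-contained sketch with a hypothetical data frame df (one row per respondent, constructed only to reproduce the example counts). xtabs() is an alternative to table() that keeps the column names as dimension labels, and chisq.test() can also take the two columns directly and build the table itself.

```r
# Hypothetical raw data: one row per respondent, built to match the example counts
df <- data.frame(
  Gender     = rep(c("Male", "Female", "Male", "Female"),
                   times = c(20, 15, 30, 35)),
  Preference = rep(c("Product A", "Product A", "Product B", "Product B"),
                   times = c(20, 15, 30, 35))
)

# Cross-tabulate; xtabs() labels the dimensions "Gender" and "Preference"
tab <- xtabs(~ Gender + Preference, data = df)
print(tab)

res <- chisq.test(tab)

# Equivalent: pass the two columns and let chisq.test() build the table
res2 <- chisq.test(df$Gender, df$Preference)
print(c(from_table = unname(res$statistic), from_columns = unname(res2$statistic)))
```

xtabs() orders the levels alphabetically (Female before Male), but row order does not change the statistic, which matches the matrix example.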
