🧪 R Chi-Square Test – Independence and Goodness-of-Fit with Examples
🧲 Introduction – What is the Chi-Square Test in R?
The Chi-Square (χ²) Test in R is used to test relationships between categorical variables or assess goodness-of-fit between observed and expected frequencies. It’s one of the most common tests in hypothesis testing for contingency tables.
🎯 In this guide, you’ll learn:
- How to perform a Chi-Square test in R
- How to test independence in a contingency table
- How to perform a goodness-of-fit test
- How to interpret p-values and expected counts
📊 1. Chi-Square Test of Independence
Used to test if two categorical variables are related.
✅ Example Dataset
# matrix() fills column by column: Product A = (20, 15), Product B = (30, 35)
data <- matrix(c(20, 15, 30, 35), nrow = 2,
               dimnames = list(Gender = c("Male", "Female"),
                               Preference = c("Product A", "Product B")))
data
🧾 Output:
        Preference
Gender   Product A Product B
  Male          20        30
  Female        15        35
✅ Run Chi-Square Test
chisq_test <- chisq.test(data)
print(chisq_test)
🔍 Output Breakdown:
- Chi-squared statistic
- Degrees of freedom (df)
- p-value (probability of a statistic at least this large if the variables were independent)
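The three pieces listed above can also be read directly off the object that chisq.test() returns. The sketch below rebuilds the same 2×2 table so it runs on its own; the names `tab`, `res`, and `res_raw` are illustrative and not part of the original example. One detail worth knowing: for 2×2 tables R applies Yates' continuity correction by default, so `correct = FALSE` is needed to get the uncorrected Pearson statistic.

```r
# Rebuild the 2x2 table from the example (filled column by column)
tab <- matrix(c(20, 15, 30, 35), nrow = 2,
              dimnames = list(Gender = c("Male", "Female"),
                              Preference = c("Product A", "Product B")))

res     <- chisq.test(tab)                   # Yates' correction is on by default for 2x2
res_raw <- chisq.test(tab, correct = FALSE)  # plain Pearson statistic

res$statistic   # chi-squared statistic (named "X-squared")
res$parameter   # degrees of freedom: (2 - 1) * (2 - 1) = 1
res$p.value     # p-value

# The corrected statistic is smaller than the uncorrected one
c(corrected   = unname(res$statistic),
  uncorrected = unname(res_raw$statistic))
```

With these counts neither version comes close to the 0.05 threshold, so the choice of correction does not change the conclusion here.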
📋 2. Interpreting Results
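if (chisq_test$p.value < 0.05) {
  print("Reject Null Hypothesis: Variables are dependent")
} else {
  # A large p-value means a lack of evidence, not proof of independence
  print("Fail to Reject Null Hypothesis: No evidence of dependence")
}
🧠 If p < 0.05, there’s a statistically significant relationship between the variables at the 5% level. Otherwise, the data do not provide enough evidence of a relationship, which is not the same as proving independence.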
✅ 3. Check Expected Frequencies
chisq_test$expected
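📌 Returns the counts expected in each cell if the variables were independent, so you can compare them with the observed counts. All expected values should generally be ≥ 5 for the chi-square approximation to be valid.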
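To see where the expected counts come from, here is a minimal sketch that recomputes them by hand for the same table, using the standard formula (row total × column total / grand total), and then rebuilds the uncorrected Pearson statistic from them. The variable names (`tab`, `expected_manual`, `stat_manual`) are illustrative.

```r
tab <- matrix(c(20, 15, 30, 35), nrow = 2,
              dimnames = list(Gender = c("Male", "Female"),
                              Preference = c("Product A", "Product B")))

# Expected count per cell = row total * column total / grand total
expected_manual <- outer(rowSums(tab), colSums(tab)) / sum(tab)
expected_manual          # 17.5 for Product A, 32.5 for Product B, in both rows

# Rule-of-thumb check: are all expected counts at least 5?
all(expected_manual >= 5)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
stat_manual <- sum((tab - expected_manual)^2 / expected_manual)

# Compare with chisq.test() without the continuity correction
res <- chisq.test(tab, correct = FALSE)
isTRUE(all.equal(as.vector(expected_manual), as.vector(res$expected)))
isTRUE(all.equal(stat_manual, unname(res$statistic)))
```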
🎯 4. Chi-Square Goodness-of-Fit Test
Used to compare observed counts against a theoretical distribution.
✅ Example: Rolling a Die
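observed <- c(14, 10, 9, 11, 8, 8)  # Counts for faces 1 to 6 (60 rolls)
expected <- rep(10, 6)              # Fair die expectation: 60 / 6 = 10 per face
# p must be probabilities, so convert the expected counts to proportions
chisq.test(x = observed, p = expected / sum(expected))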
🔍 Explanation:
- x: Observed frequencies
- p: Probability of each outcome under the null hypothesis
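As a worked check of what the goodness-of-fit test computes, the sketch below reproduces the statistic and p-value by hand for the die example. The names `probs`, `stat`, `dof`, `p_value`, and `gof` are illustrative.

```r
observed <- c(14, 10, 9, 11, 8, 8)   # counts for faces 1 to 6
probs    <- rep(1/6, 6)              # null hypothesis: fair die

expected <- sum(observed) * probs    # 60 rolls * 1/6 = 10 per face

# Pearson statistic: (16 + 0 + 1 + 1 + 4 + 4) / 10 = 2.6
stat <- sum((observed - expected)^2 / expected)

dof     <- length(observed) - 1      # 6 categories - 1 = 5
p_value <- pchisq(stat, df = dof, lower.tail = FALSE)

# Same numbers from the built-in test
gof <- chisq.test(x = observed, p = probs)
c(manual = stat, builtin = unname(gof$statistic))
c(manual = p_value, builtin = gof$p.value)
```

The p-value is well above 0.05, so these 60 rolls give no evidence that the die is unfair.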
🧠 Assumptions of the Chi-Square Test
| Assumption | Details |
|---|---|
| Sample Size | Expected frequency ≥ 5 per cell |
| Data Type | Categorical (nominal or ordinal) |
| Independence | Observations must be independent |
| Fixed Marginals (Goodness-of-fit) | Totals are pre-determined |
📌 Summary – Recap & Next Steps
The Chi-Square test is ideal for determining if distributions differ or categorical variables are related. R’s chisq.test() simplifies both independence and goodness-of-fit testing.
🔍 Key Takeaways:
- Use chisq.test() for both types of tests
- Interpret p-value to assess significance
- Check expected counts to meet assumptions
- Perfect for survey, demographic, and categorical studies
⚙️ Real-World Relevance:
Used in market research, clinical trials, demographics, A/B testing, and behavioral analysis.
❓ FAQs – Chi-Square Test in R
❓ What does a low p-value mean in a Chi-Square test?
✅ It means the observed distribution significantly differs from the expected (reject the null hypothesis).
❓ Can Chi-Square be used for numerical data?
❌ Not directly. It applies to categorical (qualitative) variables; numerical data would first have to be grouped into categories.
❓ What if expected frequencies are < 5?
✅ Use Fisher’s Exact Test or combine categories.
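A minimal sketch of the Fisher's Exact Test option, using base R's fisher.test(). The table `small` and its labels are made-up illustration data, not taken from the examples in this guide.

```r
# Hypothetical small-sample table (illustrative counts only)
small <- matrix(c(3, 1, 1, 4), nrow = 2,
                dimnames = list(Group   = c("Treatment", "Control"),
                                Outcome = c("Improved", "Not improved")))

# Every expected count is below 5, so chisq.test() warns that the
# chi-squared approximation may be incorrect
small_expected <- suppressWarnings(chisq.test(small)$expected)
all(small_expected < 5)

# Fisher's exact test does not rely on the large-sample approximation
fisher_res <- fisher.test(small)
fisher_res$p.value
```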
❓ How do I extract the test statistic from chisq.test()?
chisq_test$statistic
❓ Can I run Chi-Square on a data frame?
✅ Yes. Use table() to create a contingency matrix first:
table(df$Gender, df$Preference)
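Putting that together, here is a sketch that goes from a data frame to a test result. The data frame `df` is constructed here purely for illustration; it reproduces the counts from the Gender/Preference example, with one row per respondent.

```r
# Illustrative data frame: one row per respondent (100 rows)
df <- data.frame(
  Gender     = rep(c("Male", "Female"), times = c(50, 50)),
  Preference = c(rep(c("Product A", "Product B"), times = c(20, 30)),   # males
                 rep(c("Product A", "Product B"), times = c(15, 35)))   # females
)

# Cross-tabulate; naming the arguments labels the table dimensions
tab <- table(Gender = df$Gender, Preference = df$Preference)
tab

# table() orders levels alphabetically (Female before Male),
# which does not affect the test result
res <- chisq.test(tab)
res
```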