R Chi-Square Test – Independence and Goodness-of-Fit with Examples
Introduction – What is the Chi-Square Test in R?
The Chi-Square (χ²) Test in R is used to test relationships between categorical variables or assess goodness-of-fit between observed and expected frequencies. It’s one of the most common tests in hypothesis testing for contingency tables.
In this guide, you’ll learn how to:
- Perform a Chi-Square test in R
- Test independence in a contingency table
- Perform a goodness-of-fit test
- Interpret p-values and expected counts
1. Chi-Square Test of Independence
Used to test if two categorical variables are related.
Example Dataset
data <- matrix(c(20, 15, 30, 35), nrow = 2,
               dimnames = list(Gender = c("Male", "Female"),
                               Preference = c("Product A", "Product B")))
data
Output:
Preference
Gender Product A Product B
Male 20 30
Female 15 35
Run Chi-Square Test
chisq_test <- chisq.test(data)
print(chisq_test)
Output Breakdown:
- Chi-squared statistic
- Degrees of freedom (df)
- p-value (probability of a statistic at least this large if the two variables were independent)
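For a 2×2 table such as this one, chisq.test() applies Yates' continuity correction by default (correct = TRUE), so the printed statistic is slightly smaller than the plain Pearson value. The sketch below rebuilds the same table under the assumed name tab and shows how the pieces listed above can be pulled out of the result object, with and without the correction.

```r
# Same 2x2 table as in the example above ("tab" is a name chosen for this sketch)
tab <- matrix(c(20, 15, 30, 35), nrow = 2,
              dimnames = list(Gender = c("Male", "Female"),
                              Preference = c("Product A", "Product B")))

corrected   <- chisq.test(tab)                   # default: Yates' correction on 2x2 tables
uncorrected <- chisq.test(tab, correct = FALSE)  # plain Pearson chi-squared statistic

# Individual components of the result object
corrected$statistic   # X-squared
corrected$parameter   # degrees of freedom
corrected$p.value     # p-value

# The uncorrected statistic is larger than the corrected one
uncorrected$statistic
```

Which version to report is a judgment call; the corrected test is more conservative for small 2×2 tables.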
2. Interpreting Results
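if (chisq_test$p.value < 0.05) {
  print("Reject Null Hypothesis: Variables are associated")
} else {
  print("Fail to Reject Null Hypothesis: No significant evidence of association")
}
If p < 0.05, there’s a statistically significant relationship between the variables at the 5% level. If p ≥ 0.05, the data do not provide enough evidence of a relationship; this does not prove that the variables are independent.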
3. Check Expected Frequencies
chisq_test$expected
Returns the counts expected in each cell if the variables were independent, which you can compare with the observed counts. All expected values should generally be ≥ 5 for the chi-square approximation to be valid.
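To make the expected counts less of a black box, here is a minimal sketch that recomputes them by hand as (row total × column total) / grand total and compares the result with what chisq.test() reports. The names tab, res, and expected_manual are chosen for this sketch only.

```r
# Same 2x2 table as in the example above
tab <- matrix(c(20, 15, 30, 35), nrow = 2,
              dimnames = list(Gender = c("Male", "Female"),
                              Preference = c("Product A", "Product B")))

res <- chisq.test(tab)

# Expected count for each cell: row total * column total / grand total
expected_manual <- outer(rowSums(tab), colSums(tab)) / sum(tab)
expected_manual

# Should agree with the values stored in the test result
all.equal(unname(expected_manual), unname(res$expected))

# Rule-of-thumb check: every expected count is at least 5
all(res$expected >= 5)
```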
4. Chi-Square Goodness-of-Fit Test
Used to compare observed counts against a theoretical distribution.
Example: Rolling a Die
observed <- c(14, 10, 9, 11, 8, 8)
expected <- rep(10, 6) # Expected count per face for a fair die (60 rolls / 6); for reference only, not passed to chisq.test()
chisq.test(x = observed, p = rep(1/6, 6))
Explanation:
- x: Observed frequencies
- p: Probability of each outcome under the null hypothesis
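The goodness-of-fit statistic is the sum of (observed − expected)² / expected over all categories, with (number of categories − 1) degrees of freedom. A short sketch, using the die counts from the example and variable names chosen here for illustration, reproduces the calculation by hand and checks it against chisq.test():

```r
observed <- c(14, 10, 9, 11, 8, 8)
probs    <- rep(1/6, 6)                 # fair die under the null hypothesis

expected <- sum(observed) * probs       # 60 rolls * 1/6 = 10 per face

# Pearson chi-squared statistic and its degrees of freedom
stat <- sum((observed - expected)^2 / expected)
dof  <- length(observed) - 1

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(stat, df = dof, lower.tail = FALSE)

gof <- chisq.test(x = observed, p = probs)

# Both comparisons should return TRUE
all.equal(stat, unname(gof$statistic))
all.equal(p_value, gof$p.value)
```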
Assumptions of the Chi-Square Test
| Assumption | Details |
|---|---|
| Sample Size | Expected frequency ≥ 5 per cell |
| Data Type | Categorical (nominal or ordinal) |
| Independence | Observations must be independent |
| Specified Null Distribution (Goodness-of-fit) | Hypothesized probabilities are set in advance and sum to 1 |
Summary – Recap & Next Steps
The Chi-Square test is ideal for determining if distributions differ or categorical variables are related. R’s chisq.test() simplifies both independence and goodness-of-fit testing.
Key Takeaways:
- Use chisq.test() for both types of tests
- Interpret p-value to assess significance
- Check expected counts to meet assumptions
- Perfect for survey, demographic, and categorical studies
Real-World Relevance:
Used in market research, clinical trials, demographics, A/B testing, and behavioral analysis.
FAQs – Chi-Square Test in R
What does a low p-value mean in Chi-Square test?
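It means the observed counts differ from the counts expected under the null hypothesis by more than chance alone would plausibly explain, so you reject the null hypothesis (independence, or the hypothesized distribution in a goodness-of-fit test).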
Can Chi-Square be used for numerical data?
Not directly. It’s only for categorical (qualitative) variables, so numerical data must first be grouped into categories (for example with cut()).
What if expected frequencies are < 5?
Use Fisher’s Exact Test or combine categories.
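As an illustration, the sketch below uses a small 2×2 table with made-up counts (not taken from this guide) whose expected frequencies fall below 5, and runs fisher.test() from base R's stats package on it instead of chisq.test().

```r
# Hypothetical small-sample table; the counts are invented for illustration
small <- matrix(c(3, 1, 2, 6), nrow = 2,
                dimnames = list(Group = c("A", "B"),
                                Outcome = c("Yes", "No")))

# chisq.test() warns here because the approximation may be poor;
# suppressWarnings() is used only to inspect the expected counts
expected_small <- suppressWarnings(chisq.test(small))$expected
any(expected_small < 5)   # TRUE: the rule of thumb is violated

# Fisher's exact test does not rely on the large-sample approximation
ft <- fisher.test(small)
ft$p.value
```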
How do I extract the test statistic from chisq.test()?
chisq_test$statistic
Can I run Chi-Square on a data frame?
Yes. Use table() to build a contingency table first, then pass it to chisq.test():
table(df$Gender, df$Preference)
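A fuller sketch of that workflow is shown here. The data frame df is hypothetical: it is built so that its counts match the Gender/Preference table from the independence example, with one row per respondent.

```r
# Hypothetical data frame with one row per respondent,
# constructed to reproduce the counts used earlier in this guide
df <- data.frame(
  Gender     = rep(c("Male", "Female"), times = c(50, 50)),
  Preference = c(rep(c("Product A", "Product B"), times = c(20, 30)),
                 rep(c("Product A", "Product B"), times = c(15, 35)))
)

# Cross-tabulate the two columns, then test the resulting table
tab <- table(df$Gender, df$Preference)
tab
from_table <- chisq.test(tab)

# chisq.test() also accepts the two vectors directly and tabulates them itself
from_vectors <- chisq.test(df$Gender, df$Preference)

from_table
```

Note that table() orders the categories alphabetically, so the rows appear as Female, Male rather than in the order used in the matrix example; the test result is unaffected.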
