🧪 R Chi-Square Test – Independence and Goodness-of-Fit with Examples

🧲 Introduction – What is the Chi-Square Test in R?

The Chi-Square (χ²) Test in R is used to test relationships between categorical variables or assess goodness-of-fit between observed and expected frequencies. It’s one of the most common tests in hypothesis testing for contingency tables.

🎯 In this guide, you’ll learn:

How to perform a Chi-Square test in R with chisq.test()
How to test independence in a contingency table
How to run a goodness-of-fit test
How to interpret p-values and expected counts

📊 1. Chi-Square Test of Independence

Used to test whether two categorical variables are associated. The null hypothesis is that they are independent.

✅ Example Dataset

data <- matrix(c(20, 15, 30, 35), nrow = 2,
               dimnames = list(Gender = c("Male", "Female"),
                               Preference = c("Product A", "Product B")))
data

🧾 Output:

        Preference
Gender   Product A Product B
  Male        20        30
  Female      15        35

✅ Run Chi-Square Test

chisq_test <- chisq.test(data)
print(chisq_test)

🔍 Output Breakdown:

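Chi-squared statistic (X-squared)
Degrees of freedom (df)
p-value (probability of a statistic at least this large if the variables were independent)

📌 For a 2×2 table such as this one, chisq.test() applies Yates’ continuity correction by default. Pass correct = FALSE to turn it off.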
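
As a rough sketch, the quantities listed above can also be pulled out of the result object one at a time. The block rebuilds the example table so that it runs on its own; the names `tab`, `res`, and `res_uncorrected`, and the second call without the continuity correction, are illustrative additions rather than part of the original example.

```r
# Rebuild the example table so this block is self-contained
tab <- matrix(c(20, 15, 30, 35), nrow = 2,
              dimnames = list(Gender = c("Male", "Female"),
                              Preference = c("Product A", "Product B")))

# 2x2 table: Yates' continuity correction is applied by default
res <- chisq.test(tab)

res$statistic   # chi-squared statistic (named "X-squared")
res$parameter   # degrees of freedom
res$p.value     # p-value

# Same test without the continuity correction, for comparison
res_uncorrected <- chisq.test(tab, correct = FALSE)
res_uncorrected$statistic
```

The corrected statistic is smaller than the uncorrected one, so the default is the more conservative of the two for 2×2 tables.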

📋 2. Interpreting Results

alpha <- 0.05  # significance level
if (chisq_test$p.value < alpha) {
  print("Reject the null hypothesis: evidence that the variables are associated")
} else {
  print("Fail to reject the null hypothesis: no significant evidence of association")
}

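🧠 If p < 0.05, the association between the variables is statistically significant at the 5% level. A larger p-value does not prove independence; it only means the data do not provide enough evidence against it.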

✅ 3. Check Expected Frequencies

chisq_test$expected

📌 Returns the counts expected under independence, which you can compare with the observed counts. As a rule of thumb, all expected values should be ≥ 5 for the chi-square approximation to be reliable.
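
To make the expected counts less of a black box, here is a minimal sketch that recomputes them by hand as (row total × column total) / grand total and compares the result with what chisq.test() reports. The names `tab`, `res`, and `expected_manual` are illustrative, and the table is rebuilt so the block is self-contained.

```r
# Rebuild the example table so this block is self-contained
tab <- matrix(c(20, 15, 30, 35), nrow = 2,
              dimnames = list(Gender = c("Male", "Female"),
                              Preference = c("Product A", "Product B")))

# Expected count for each cell = (row total * column total) / grand total
expected_manual <- outer(rowSums(tab), colSums(tab)) / sum(tab)
expected_manual

# Compare with the expected counts stored in the test result
res <- chisq.test(tab)
all.equal(expected_manual, res$expected, check.attributes = FALSE)

# Rule-of-thumb check: is every expected count at least 5?
all(res$expected >= 5)
```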

🎯 4. Chi-Square Goodness-of-Fit Test

Used to compare observed counts against a theoretical distribution.

✅ Example: Rolling a Die

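observed <- c(14, 10, 9, 11, 8, 8)  # counts for faces 1 to 6 over 60 rolls
expected <- rep(10, 6)              # fair die: 60 rolls / 6 faces

# p takes probabilities, so convert the expected counts to proportions
chisq.test(x = observed, p = expected / sum(expected))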

🔍 Explanation:

x: Observed frequencies
p: Probability of each outcome under the null hypothesis
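
For readers who want to see what the call does with x and p, the following sketch recomputes the goodness-of-fit statistic by hand as the sum of (observed − expected)² / expected, with degrees of freedom equal to the number of categories minus one. The names `p_null`, `stat_manual`, `df_manual`, `p_manual`, and `gof` are illustrative.

```r
observed <- c(14, 10, 9, 11, 8, 8)   # die-roll counts from the example
p_null   <- rep(1/6, 6)              # fair-die probabilities under the null

# Expected counts under the null hypothesis
expected_counts <- sum(observed) * p_null

# Chi-squared statistic: sum of (O - E)^2 / E
stat_manual <- sum((observed - expected_counts)^2 / expected_counts)
df_manual   <- length(observed) - 1

# Upper-tail probability of the chi-squared distribution
p_manual <- pchisq(stat_manual, df = df_manual, lower.tail = FALSE)

# Compare with chisq.test()
gof <- chisq.test(x = observed, p = p_null)
c(manual = stat_manual, chisq_test = unname(gof$statistic))
c(manual = p_manual, chisq_test = gof$p.value)
```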

🧠 Assumptions of the Chi-Square Test

Assumption	Details
Sample Size	Expected frequency ≥ 5 per cell
Data Type	Categorical (nominal or ordinal)
Independence	Observations must be independent
Null Distribution (Goodness-of-fit)	Expected probabilities are specified in advance and sum to 1

📌 Summary – Recap & Next Steps

The Chi-Square test is ideal for determining if distributions differ or categorical variables are related. R’s chisq.test() simplifies both independence and goodness-of-fit testing.

🔍 Key Takeaways:

Use chisq.test() for both types of tests
Interpret p-value to assess significance
Check expected counts to meet assumptions
Perfect for survey, demographic, and categorical studies

⚙️ Real-World Relevance:
Used in market research, clinical trials, demographics, A/B testing, and behavioral analysis.

❓ FAQs – Chi-Square Test in R

❓ What does a low p-value mean in Chi-Square test?
✅ It means counts as far from the expected values as those observed would be unlikely if the null hypothesis were true, so you reject the null hypothesis at your chosen significance level.

❓ Can Chi-Square be used for numerical data?
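❌ Not directly. The test works on counts of categorical (qualitative) variables, so a numeric variable must first be grouped into categories, for example with cut().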
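
Because the test works on counts per category, a numeric column would have to be grouped into categories before it can be tested. A minimal sketch using cut() follows; the ages, answers, break points, and labels are all made up for illustration.

```r
# Hypothetical data: age in years and a yes/no survey answer (60 respondents)
age <- rep(c(22, 30, 40, 48, 55, 65), each = 10)
answer <- c(rep("Yes", 14), rep("No", 6),    # ages 22 and 30
            rep("Yes", 10), rep("No", 10),   # ages 40 and 48
            rep("Yes", 7),  rep("No", 13))   # ages 55 and 65

# Group the numeric variable into categories; intervals are (0,35], (35,50], (50,Inf]
age_group <- cut(age, breaks = c(0, 35, 50, Inf),
                 labels = c("35 or under", "36 to 50", "Over 50"))

# Cross-tabulate the two categorical variables and run the test
counts <- table(age_group, answer)
counts
binned_res <- chisq.test(counts)
binned_res
```

The choice of break points affects the result, so it is usually best to fix them before looking at the data.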

❓ What if expected frequencies are < 5?
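✅ Use Fisher’s Exact Test (fisher.test()), combine sparse categories, or request a simulated p-value with chisq.test(data, simulate.p.value = TRUE).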
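
A short sketch of how this might look in practice, using a deliberately sparse table that is invented for illustration (`tab_small` and its labels are hypothetical). It checks the expected counts first, then runs fisher.test() and, as another option, chisq.test() with a Monte Carlo p-value.

```r
# Hypothetical sparse 2x2 table (12 observations in total)
tab_small <- matrix(c(3, 1, 2, 6), nrow = 2,
                    dimnames = list(Group = c("Treatment", "Control"),
                                    Outcome = c("Improved", "Not improved")))

# chisq.test() warns when the approximation may be poor; inspect the expected counts
expected_small <- suppressWarnings(chisq.test(tab_small))$expected
any(expected_small < 5)

# Option 1: Fisher's exact test
fisher_res <- fisher.test(tab_small)
fisher_res$p.value

# Option 2: chi-square test with a simulated (Monte Carlo) p-value
set.seed(42)  # make the simulation reproducible
sim_res <- chisq.test(tab_small, simulate.p.value = TRUE, B = 2000)
sim_res$p.value
```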

❓ How do I extract the test statistic from chisq.test()?

chisq_test$statistic

❓ Can I run Chi-Square on a data frame?
✅ Yes. Use table() to build a contingency table from two columns first, then pass it to chisq.test():

chisq.test(table(df$Gender, df$Preference))
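
For a version that runs end to end, the sketch below builds a small data frame and tests it. The data frame `df` here is hypothetical: one row per respondent, constructed so that its counts match the example table from section 1.

```r
# Hypothetical data frame: one row per respondent
df <- data.frame(
  Gender = rep(c("Male", "Female"), each = 50),
  Preference = c(rep("Product A", 20), rep("Product B", 30),   # males
                 rep("Product A", 15), rep("Product B", 35))   # females
)

# Cross-tabulate the two columns, then run the test on the table
tab <- table(df$Gender, df$Preference)
tab
df_res <- chisq.test(tab)
df_res

# xtabs() is an alternative that keeps the column names as dimension labels
xtabs(~ Gender + Preference, data = df)
```

Note that table() orders the categories alphabetically, so "Female" appears before "Male" here. The test result does not depend on row order.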
