🔢 R Logistic & Poisson Regression – Model Binary and Count Data Easily
🧲 Introduction – Generalized Linear Models in R
Not every outcome is suited to linear regression. When you’re predicting:
- Binary outcomes (yes/no, 0/1): use Logistic Regression
- Count outcomes (number of events): use Poisson Regression
Both are generalized linear models (GLMs), and both can be fitted in R with the glm() function.
🎯 In this guide, you’ll learn:
- How to perform Logistic and Poisson regression in R
- Use glm() with the proper family (binomial, poisson)
- Interpret coefficients, odds ratios, and model fit
- Apply models to real datasets and make predictions
🔐 1. Logistic Regression in R (Binary Classification)
✅ Example: Predict a binary outcome
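We’ll use a small hand-entered dataset (no random numbers are drawn, so no seed is needed). Note that purchase is perfectly separated by age here (0 up to age 35, 1 from age 40), so glm() will warn that fitted probabilities numerically 0 or 1 occurred and the estimates will be unstable; data with overlapping classes does not have this problem: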
df <- data.frame(
age = c(25, 30, 35, 40, 45, 50, 55),
income = c(40, 45, 50, 60, 65, 70, 80),
purchase = c(0, 0, 0, 1, 1, 1, 1)
)
model_log <- glm(purchase ~ age + income, family = binomial, data = df)
summary(model_log)
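The toy data above is perfectly separated by age, which is worth checking before trusting a logistic fit. A minimal sketch that records any warnings raised by glm() and confirms the separation (the `msgs` collector and the age cutoff of 40 are helpers introduced here for illustration):

```r
# Same toy data as in the example above
df <- data.frame(
  age = c(25, 30, 35, 40, 45, 50, 55),
  income = c(40, 45, 50, 60, 65, 70, 80),
  purchase = c(0, 0, 0, 1, 1, 1, 1)
)

# Collect warnings instead of letting them scroll past
msgs <- character(0)
model_log <- withCallingHandlers(
  glm(purchase ~ age + income, family = binomial, data = df),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(msgs)

# A single cutoff on age reproduces the outcome exactly: perfect separation
separated <- all(df$purchase == as.integer(df$age >= 40))
print(separated)
```

If `separated` is TRUE, the coefficient estimates and standard errors from this fit should not be interpreted literally.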
🔍 Key Outputs:
- Estimate: Coefficients on the log-odds scale
- z value & Pr(>|z|): Wald test statistic and p-value for each coefficient
- Null deviance vs Residual deviance: How much the predictors improve the fit over an intercept-only model
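These quantities can also be pulled out of a fitted model programmatically. The sketch below uses simulated data so that the two classes overlap and the fit is stable; the sample size, variable names, and true coefficients are assumptions chosen for illustration only.

```r
# Simulated data (hypothetical): purchase probability rises with age
set.seed(42)
n <- 200
age <- runif(n, 20, 60)
purchase <- rbinom(n, size = 1, prob = plogis(-4 + 0.1 * age))
sim <- data.frame(age = age, purchase = purchase)

fit <- glm(purchase ~ age, family = binomial, data = sim)

# Coefficient table with columns Estimate, Std. Error, z value, Pr(>|z|)
coef_table <- coef(summary(fit))
print(coef_table)

# Drop in deviance relative to the intercept-only model,
# compared against a chi-squared distribution
dev_drop <- fit$null.deviance - fit$deviance
df_drop <- fit$df.null - fit$df.residual
p_value <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)
print(c(deviance_drop = dev_drop, df = df_drop, p = p_value))
```

A large deviance drop for few degrees of freedom suggests the predictors carry real information about the outcome.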
🔄 Convert Log-Odds to Odds Ratio
exp(coef(model_log))
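🧾 Output (illustrative values; because the toy data above is perfectly separated, the numbers R actually prints will be far more extreme):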
(Intercept) age income
0.0012 1.42 1.12
🔍 Interpretation:
- A one-unit increase in age multiplies the odds of purchase by 1.42, holding income constant
- Odds ratios above 1 for age and income mean that higher values of either raise the odds of purchase
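An odds ratio is easier to judge alongside its uncertainty. The following sketch adds 95% Wald confidence intervals using confint.default() from the stats package. It runs on simulated data with overlapping classes; the coefficients used to generate the data are assumptions for illustration.

```r
# Simulated data (hypothetical): odds of purchase rise with age and income
set.seed(1)
n <- 300
age <- runif(n, 20, 60)
income <- runif(n, 30, 90)
purchase <- rbinom(n, 1, plogis(-6 + 0.08 * age + 0.04 * income))
sim <- data.frame(age, income, purchase)

fit <- glm(purchase ~ age + income, family = binomial, data = sim)

# Exponentiate the estimates and their Wald interval limits
# (confint.default computes estimate +/- z * standard error on the log-odds scale)
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(or_table, 3))
```

An interval that excludes 1 indicates that the predictor's effect on the odds is distinguishable from no effect at the 5% level.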
📈 Predict Probabilities
new <- data.frame(age = 38, income = 58)
predict(model_log, newdata = new, type = "response") # Gives probability
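To see what type = "response" does, it can help to compare it with type = "link", which returns the log-odds. This sketch uses simulated data (an assumption, chosen so the fit is well behaved) and shows that plogis() maps log-odds to probabilities. The 0.5 cutoff for class labels is a common convention, not a requirement.

```r
# Simulated data (hypothetical)
set.seed(7)
n <- 200
sim <- data.frame(age = runif(n, 20, 60))
sim$purchase <- rbinom(n, 1, plogis(-4 + 0.1 * sim$age))

fit <- glm(purchase ~ age, family = binomial, data = sim)

new <- data.frame(age = c(25, 40, 55))
log_odds <- predict(fit, newdata = new, type = "link")      # linear predictor
prob <- predict(fit, newdata = new, type = "response")      # probability

# plogis() is the inverse logit, so the two scales agree
same <- all.equal(unname(plogis(log_odds)), unname(prob))
print(same)

# Convert probabilities to class labels with a chosen cutoff
predicted_class <- as.integer(prob > 0.5)
print(data.frame(new, prob = round(prob, 3), predicted_class))
```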
📊 2. Poisson Regression in R (Count Modeling)
✅ Example: Model Count Data
df2 <- data.frame(
hours = c(1, 2, 3, 4, 5, 6, 7),
events = c(1, 2, 3, 6, 9, 10, 13)
)
model_pois <- glm(events ~ hours, family = poisson(link = "log"), data = df2)
summary(model_pois)
🔍 Explanation:
- Predicts count data
- link = "log": The default link for Poisson; the log of the expected count is modeled as a linear function of the predictors
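The log link can be checked directly on the fitted model: the linear predictor is the log of the expected count. A short sketch using the same count data as the example (the names `eta`, `mu`, and `by_hand` are introduced here for illustration):

```r
# Same count data as in the example above
df2 <- data.frame(
  hours = c(1, 2, 3, 4, 5, 6, 7),
  events = c(1, 2, 3, 6, 9, 10, 13)
)
model_pois <- glm(events ~ hours, family = poisson(link = "log"), data = df2)

eta <- predict(model_pois, type = "link")       # intercept + slope * hours
mu <- predict(model_pois, type = "response")    # expected count

# The expected count is exp() of the linear predictor
link_check <- all.equal(log(mu), eta)
print(link_check)

# Rebuild the linear predictor by hand from the coefficients
b <- coef(model_pois)
by_hand <- unname(b[1] + b[2] * df2$hours)
hand_check <- all.equal(by_hand, unname(eta))
print(hand_check)
```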
🔄 Exponentiate Coefficients (Rate Ratios)
exp(coef(model_pois))
🧾 Output (rounded):
(Intercept) hours
1.14 1.44
🔍 Interpretation:
- Each additional hour multiplies the expected event count by about 1.44
- The expected count grows exponentially with hours, because the model is linear on the log scale
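The rate-ratio interpretation can be verified numerically: the ratio of predicted counts one hour apart equals the exponentiated slope, whichever pair of adjacent hours is chosen. A minimal sketch on the same data (hours 3 and 4 are an arbitrary choice):

```r
# Same count data as in the example above
df2 <- data.frame(
  hours = c(1, 2, 3, 4, 5, 6, 7),
  events = c(1, 2, 3, 6, 9, 10, 13)
)
model_pois <- glm(events ~ hours, family = poisson, data = df2)

rate_ratio <- unname(exp(coef(model_pois)["hours"]))

# Predicted counts at two adjacent values of hours
pred <- predict(model_pois, newdata = data.frame(hours = c(3, 4)), type = "response")
ratio <- unname(pred[2] / pred[1])

ratio_check <- all.equal(ratio, rate_ratio)
print(c(rate_ratio = rate_ratio, ratio_of_predictions = ratio))
print(ratio_check)
```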
📈 Predict Counts
new2 <- data.frame(hours = 5)
predict(model_pois, newdata = new2, type = "response")
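A point prediction says nothing about its uncertainty. One common approach, sketched here under the usual large-sample normal approximation, is to build a Wald interval on the link (log) scale with se.fit = TRUE and then exponentiate the limits. With only seven observations the interval is rough at best.

```r
# Same count data as in the example above
df2 <- data.frame(
  hours = c(1, 2, 3, 4, 5, 6, 7),
  events = c(1, 2, 3, 6, 9, 10, 13)
)
model_pois <- glm(events ~ hours, family = poisson, data = df2)
new2 <- data.frame(hours = 5)

# Prediction and standard error on the log scale
pred <- predict(model_pois, newdata = new2, type = "link", se.fit = TRUE)
z <- qnorm(0.975)  # multiplier for an approximate 95% interval

# Back-transform to the count scale
interval <- exp(c(
  fit = unname(pred$fit),
  lower = unname(pred$fit - z * pred$se.fit),
  upper = unname(pred$fit + z * pred$se.fit)
))
print(round(interval, 2))
```

Building the interval on the log scale keeps the lower limit above zero, which a symmetric interval on the count scale would not guarantee.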
📐 Model Comparison: Logistic vs Poisson
| Feature | Logistic Regression | Poisson Regression |
|---|---|---|
| Response Variable | Binary (0 or 1) | Count (0, 1, 2, …) |
| Family (GLM) | binomial | poisson |
| Output Interpretation | Odds ratio | Event rate / rate ratio |
| Use Cases | Classification | Frequency modeling |
📌 Summary – Recap & Next Steps
Both Logistic and Poisson Regression are essential for modeling non-continuous data. R’s glm() makes it easy to define the appropriate model based on the outcome type.
🔍 Key Takeaways:
- Use glm(..., family = binomial) for binary outcomes
- Use glm(..., family = poisson) for count data
- Use exp(coef(...)) to interpret coefficients on the natural scale
- Use predict(..., type = "response") for readable predictions
⚙️ Real-World Relevance:
These models are widely used in healthcare (disease diagnosis), marketing (click prediction), insurance (claim count), and survey analysis.
❓ FAQs – Logistic & Poisson Regression in R
❓ What is the main difference between lm() and glm()?
✅ lm() fits ordinary least squares and assumes a continuous response with normally distributed errors; glm() generalizes this to other response distributions such as binomial and Poisson.
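One way to see the relationship is that glm() with family = gaussian reproduces the lm() coefficients. A small sketch on simulated data (the data-generating values are assumptions for illustration):

```r
# Simulated continuous response (hypothetical)
set.seed(10)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)
d <- data.frame(x, y)

fit_lm <- lm(y ~ x, data = d)
fit_glm <- glm(y ~ x, family = gaussian, data = d)

# Same coefficients from both fits
coef_check <- all.equal(coef(fit_lm), coef(fit_glm))
print(rbind(lm = coef(fit_lm), glm = coef(fit_glm)))
print(coef_check)
```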
❓ When should I use logistic regression?
✅ When your dependent variable is binary (e.g., success/failure, 0/1).
❓ How to interpret coefficients in logistic regression?
✅ Coefficients are in log-odds. Use exp() to convert to odds ratios.
❓ How to detect overdispersion in Poisson regression?
✅ Compare the residual deviance with the residual degrees of freedom. If the deviance is much larger (a ratio well above 1), consider a quasi-Poisson or negative binomial model.
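A closely related check is the Pearson dispersion statistic, which should be near 1 for a well-specified Poisson model. The sketch below simulates deliberately overdispersed counts (drawn from a negative binomial; all parameter values are assumptions) and then refits with family = quasipoisson, which keeps the same coefficients but scales the standard errors.

```r
# Simulated overdispersed counts (hypothetical): variance exceeds the mean
set.seed(99)
n <- 200
x <- runif(n, 0, 2)
y <- rnbinom(n, size = 1, mu = exp(0.5 + 0.8 * x))
d <- data.frame(x, y)

fit_pois <- glm(y ~ x, family = poisson, data = d)

# Pearson chi-squared divided by residual degrees of freedom
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
print(dispersion)

# Quasi-Poisson: same point estimates, standard errors inflated by sqrt(dispersion)
fit_quasi <- glm(y ~ x, family = quasipoisson, data = d)
quasi_dispersion <- summary(fit_quasi)$dispersion
print(quasi_dispersion)
```

For a negative binomial fit, MASS::glm.nb() is the usual choice; it is not shown here to keep the example within base R.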
❓ Can I use multiple predictors in logistic or Poisson regression?
✅ Yes. Just extend the formula:
glm(y ~ x1 + x2 + x3, family = binomial, data = ...)
