🔢 R Logistic & Poisson Regression – Model Binary and Count Data Easily
🧲 Introduction – Generalized Linear Models in R
Not all data fits into linear regression. When you’re predicting:
- Binary outcomes (yes/no, 0/1): use Logistic Regression
- Count outcomes (number of events): use Poisson Regression
Both belong to the family of Generalized Linear Models (GLMs) and can be fit in R with the glm() function.
🎯 In this guide, you’ll learn:
- How to perform Logistic and Poisson regression in R
- Use glm() with the proper family (binomial, poisson)
- Interpret coefficients, odds ratios, and model fit
- Apply models to small example datasets and make predictions
🔐 1. Logistic Regression in R (Binary Classification)
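✅ Example: Predict a binary outcome
We’ll use a small hand-entered dataset (nothing random is drawn, so no seed is needed):
# Toy data: age, income, and whether a purchase was made (0 = no, 1 = yes)
df <- data.frame(
  age = c(25, 30, 35, 40, 45, 50, 55),
  income = c(40, 45, 50, 60, 65, 70, 80),
  purchase = c(0, 0, 0, 1, 1, 1, 1)
)
# Caution: purchase is 0 for every age below 40 and 1 from 40 up (perfect separation),
# so glm() typically warns about fitted probabilities of 0 or 1 and the estimates are unstable
model_log <- glm(purchase ~ age + income, family = binomial, data = df)
summary(model_log)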
🔍 Key Outputs:
- Estimate: coefficients on the log-odds scale
- z value & Pr(>|z|): Wald test statistic and p-value for each coefficient
- Null deviance vs Residual deviance: how much the predictors improve fit over an intercept-only model
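The pieces of the summary listed above can also be pulled out in code. Here is a minimal sketch, assuming a simulated dataset (`sim`, with a made-up link between age and purchase) instead of the seven-row toy data, so the fit is well behaved:

```r
# Simulated data (an assumption for illustration): purchase probability rises with age
set.seed(123)
n <- 200
sim <- data.frame(age = round(runif(n, min = 20, max = 60)))
sim$purchase <- rbinom(n, size = 1, prob = plogis(-4 + 0.1 * sim$age))

fit <- glm(purchase ~ age, family = binomial, data = sim)

# Coefficient table: Estimate (log-odds), Std. Error, z value, Pr(>|z|)
coef_table <- coef(summary(fit))
print(coef_table)

# Null deviance (intercept only) vs residual deviance (with age)
dev <- c(null = fit$null.deviance, residual = deviance(fit))
print(dev)

# Likelihood-ratio (chi-square) test of the drop in deviance
print(anova(fit, test = "Chisq"))
```

A large drop from null to residual deviance relative to the degrees of freedom used suggests the predictor is doing real work.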
🔄 Convert Log-Odds to Odds Ratio
exp(coef(model_log))
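🧾 Output (illustrative only; in this toy data purchase is perfectly separated by age, so the odds ratios R actually reports will be unstable and will not match these numbers):
(Intercept) age income
0.0012 1.42 1.12
🔍 Interpretation:
- Reading the illustrative values: a one-unit increase in age multiplies the odds of purchase by 1.42, holding income constant
- Odds ratios above 1 for age and income mean that higher values go with higher odds of purchase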
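To see what "multiplies the odds" means in practice, you can compare the odds at two ages one year apart. This is a sketch on simulated data (`sim` and its coefficients are assumptions for illustration, not the toy data above):

```r
# Simulated data (an assumption for illustration)
set.seed(123)
n <- 200
sim <- data.frame(age = round(runif(n, min = 20, max = 60)))
sim$purchase <- rbinom(n, size = 1, prob = plogis(-4 + 0.1 * sim$age))
fit <- glm(purchase ~ age, family = binomial, data = sim)

# Odds ratio for a one-year increase in age
odds_ratio_age <- exp(coef(fit)[["age"]])

# Predicted probabilities at ages 40 and 41, converted to odds
p <- predict(fit, newdata = data.frame(age = c(40, 41)), type = "response")
odds <- p / (1 - p)

# The ratio of the two odds reproduces the odds ratio
ratio <- unname(odds[2] / odds[1])
print(c(odds_ratio = odds_ratio_age, ratio_of_odds = ratio))
```

The same ratio holds for any pair of ages one year apart, which is why a single odds ratio summarizes the effect.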
📈 Predict Probabilities
new <- data.frame(age = 38, income = 58)
predict(model_log, newdata = new, type = "response") # Gives probability
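Predicted probabilities are often turned into yes/no classes with a cutoff. A minimal sketch, assuming simulated data (`sim`) and a 0.5 threshold, both chosen here for illustration only:

```r
# Simulated data (an assumption for illustration)
set.seed(123)
n <- 200
sim <- data.frame(age = round(runif(n, min = 20, max = 60)))
sim$purchase <- rbinom(n, size = 1, prob = plogis(-4 + 0.1 * sim$age))
fit <- glm(purchase ~ age, family = binomial, data = sim)

# Fitted probabilities for the training rows
prob <- predict(fit, type = "response")

# Classify as 1 when the probability reaches the (assumed) 0.5 cutoff
pred_class <- as.integer(prob >= 0.5)

# Confusion table and simple accuracy
confusion <- table(actual = sim$purchase, predicted = pred_class)
print(confusion)
accuracy <- mean(pred_class == sim$purchase)
print(accuracy)
```

The right cutoff depends on the cost of false positives versus false negatives, so 0.5 is only a starting point.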
📊 2. Poisson Regression in R (Count Modeling)
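✅ Example Count Data
# Hand-entered toy data (nothing random is drawn, so no seed is needed)
df2 <- data.frame(
  hours = c(1, 2, 3, 4, 5, 6, 7),
  events = c(1, 2, 3, 6, 9, 10, 13)
)
# Poisson GLM: log(expected events) is modeled as a linear function of hours
model_pois <- glm(events ~ hours, family = poisson(link = "log"), data = df2)
summary(model_pois)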
🔍 Explanation:
- Predicts count data
- link = "log": the default for Poisson; gives a log-linear model
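The log link means the model is linear on the log scale and you exponentiate to get expected counts. A short sketch using the same toy data (re-created here so the block runs on its own):

```r
# Same toy data and model as above
df2 <- data.frame(hours = 1:7, events = c(1, 2, 3, 6, 9, 10, 13))
model_pois <- glm(events ~ hours, family = poisson(link = "log"), data = df2)

eta <- predict(model_pois, type = "link")      # log of the expected count
mu  <- predict(model_pois, type = "response")  # expected count

# The linear predictor rebuilt by hand from the coefficients
b <- coef(model_pois)
eta_manual <- b[["(Intercept)"]] + b[["hours"]] * df2$hours

# exp(link) and response agree row by row
print(cbind(hours = df2$hours, eta = eta, exp_eta = exp(eta), mu = mu))
```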
🔄 Exponentiate Coefficients (Rate Ratios)
exp(coef(model_pois))
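🧾 Output (illustrative; the values from fitting df2 will differ, so rerun the code for the exact numbers):
(Intercept) hours
0.56 1.41
🔍 Interpretation:
- Each additional hour multiplies the expected event count by the exponentiated hours coefficient (roughly 1.4 here)
- The expected count grows exponentially with hours, which is linear on the log scale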
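You can check the rate-ratio reading directly and attach an uncertainty range to it. This sketch uses Wald intervals from `confint.default()`, which is one choice among several and only a rough guide with seven data points:

```r
# Same toy data and model as above
df2 <- data.frame(hours = 1:7, events = c(1, 2, 3, 6, 9, 10, 13))
model_pois <- glm(events ~ hours, family = poisson(link = "log"), data = df2)

# Rate ratio for one extra hour
rate_ratio <- exp(coef(model_pois)[["hours"]])

# Expected counts at 5 and 6 hours; their ratio reproduces the rate ratio
mu <- predict(model_pois, newdata = data.frame(hours = c(5, 6)), type = "response")
ratio <- unname(mu[2] / mu[1])
print(c(rate_ratio = rate_ratio, ratio_of_counts = ratio))

# Wald 95% intervals, computed on the log scale and then exponentiated
ci <- exp(confint.default(model_pois))
print(ci)
```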
📈 Predict Counts
new2 <- data.frame(hours = 5)
predict(model_pois, newdata = new2, type = "response")
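A point prediction is more useful with a range around it. One common approach, sketched here under a normal approximation, is to build the interval on the link scale with `se.fit = TRUE` and exponentiate. Note this is an interval for the expected count, not for an individual future observation:

```r
# Same toy data and model as above
df2 <- data.frame(hours = 1:7, events = c(1, 2, 3, 6, 9, 10, 13))
model_pois <- glm(events ~ hours, family = poisson(link = "log"), data = df2)
new2 <- data.frame(hours = 5)

# Prediction and standard error on the log (link) scale
pred_link <- predict(model_pois, newdata = new2, type = "link", se.fit = TRUE)
fit_val <- unname(pred_link$fit)
se_val  <- unname(pred_link$se.fit)

# Approximate 95% interval, back-transformed to the count scale
z <- qnorm(0.975)
interval <- c(
  estimate = exp(fit_val),
  lower    = exp(fit_val - z * se_val),
  upper    = exp(fit_val + z * se_val)
)
print(interval)
```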
📐 Model Comparison: Logistic vs Poisson
Feature | Logistic Regression | Poisson Regression |
---|---|---|
Response Variable | Binary (0 or 1) | Count (0, 1, 2, …) |
Family (GLM) | binomial | poisson |
Output Interpretation | Odds ratio | Event rate / rate ratio |
Use Cases | Classification | Frequency modeling |
📌 Summary – Recap & Next Steps
Both Logistic and Poisson Regression are essential for modeling non-continuous data. R’s glm() makes it easy to define the appropriate model based on the outcome type.
🔍 Key Takeaways:
- Use glm(..., family = binomial) for binary outcomes
- Use glm(..., family = poisson) for count data
- Use exp(coef(...)) to interpret coefficients as odds ratios or rate ratios
- Use predict(..., type = "response") for predictions on the probability or count scale
⚙️ Real-World Relevance:
These models are widely used in healthcare (disease diagnosis), marketing (click prediction), insurance (claim count), and survey analysis.
❓ FAQs – Logistic & Poisson Regression in R
❓ What is the main difference between lm() and glm()?
✅ lm() fits linear models that assume normally distributed errors for a continuous response; glm() generalizes this to other distributions such as binomial and Poisson.
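One way to see the relationship: a glm() with the gaussian family gives the same coefficients as lm(). A quick sketch on R's built-in mtcars data (chosen here only for illustration):

```r
# Ordinary linear model
fit_lm <- lm(mpg ~ wt, data = mtcars)

# The same model expressed as a GLM with normal errors and identity link
fit_glm <- glm(mpg ~ wt, family = gaussian, data = mtcars)

# Coefficients side by side
print(rbind(lm = coef(fit_lm), glm = coef(fit_glm)))
```

Swapping `family = gaussian` for `binomial` or `poisson` is all that changes when the response is binary or a count.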
❓ When should I use logistic regression?
✅ When your dependent variable is binary (e.g., success/failure, 0/1).
❓ How to interpret coefficients in logistic regression?
✅ Coefficients are in log-odds. Use exp() to convert to odds ratios.
❓ How do I detect overdispersion in Poisson regression?
✅ If the residual deviance is much larger than the residual degrees of freedom (a ratio well above 1), the data are likely overdispersed; consider quasi-Poisson or negative binomial models.
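The check can be written as a ratio. Here is a minimal sketch on the toy count data from this guide, with a quasi-Poisson refit for comparison (with only seven rows the ratio is a rough indicator at best):

```r
# Same toy data and model as in the Poisson section
df2 <- data.frame(hours = 1:7, events = c(1, 2, 3, 6, 9, 10, 13))
model_pois <- glm(events ~ hours, family = poisson, data = df2)

# Dispersion estimates: values well above 1 suggest overdispersion
dispersion_dev <- deviance(model_pois) / df.residual(model_pois)
dispersion_pearson <- sum(residuals(model_pois, type = "pearson")^2) /
  df.residual(model_pois)
print(c(deviance = dispersion_dev, pearson = dispersion_pearson))

# Quasi-Poisson keeps the same coefficients but rescales the standard errors
model_quasi <- glm(events ~ hours, family = quasipoisson, data = df2)
print(summary(model_quasi)$dispersion)
```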
❓ Can I use multiple predictors in logistic or Poisson regression?
✅ Yes. Just extend the formula:
glm(y ~ x1 + x2 + x3, family = binomial, data = ...)
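The same pattern works for Poisson models. As a runnable sketch, here is a two-predictor fit on R's built-in warpbreaks data (picked only for illustration; it is not one of this guide's datasets):

```r
# Count of warp breaks modeled by wool type and tension level (both factors)
fit_multi <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# One coefficient per non-reference factor level, plus the intercept
print(names(coef(fit_multi)))

# Rate ratios relative to the reference levels (wool A, tension L)
print(exp(coef(fit_multi)))
```

Each rate ratio is read holding the other predictor fixed.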