📈 R Linear & Multiple Regression – Predict and Analyze Relationships with Code Examples
🧲 Introduction – Modeling Relationships in R
Regression analysis is a cornerstone of statistical modeling. It helps you understand and predict the relationship between variables.
- Use linear regression for one predictor variable
- Use multiple regression for two or more predictors
R provides the lm() function, making it easy to fit, summarize, and visualize regression models.
🎯 In this guide, you’ll learn:
- How to perform linear and multiple regression in R
- How to interpret regression output: coefficients, R-squared, residuals
- How to plot regression lines with plot() and ggplot2
- How to use real datasets like mtcars for demonstrations
🔹 1. Simple Linear Regression in R
✅ Example: Predict MPG based on Weight
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)
summary(model)
🔍 Output Highlights:
Coefficients:
(Intercept) wt
37.285 -5.344
Multiple R-squared: 0.753
🔍 Interpretation:
- Intercept (37.285): Expected MPG when weight = 0 (an extrapolation beyond the data, so interpret with caution)
- Slope (-5.344): Each unit (1,000 lb) increase in wt decreases MPG by ~5.3
- R-squared (0.753): Model explains 75.3% of the variation in MPG
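As a sanity check, the fitted coefficients shown above can be used to compute a prediction by hand; a minimal sketch using coef():

b <- coef(model)            # (Intercept) = 37.285, wt = -5.344
mpg_hat <- b[1] + b[2] * 3  # 37.285 - 5.344 * 3, about 21.25 MPG for a 3,000 lb car
mpg_hat                     # should match predict(model, data.frame(wt = 3))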
📊 2. Visualize Linear Regression Line
plot(mtcars$wt, mtcars$mpg, main = "MPG vs Weight")
abline(model, col = "blue", lwd = 2)
✅ With ggplot2:
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point() +
geom_smooth(method = "lm", col = "red") +
labs(title = "Linear Regression: MPG ~ Weight")
🔹 3. Multiple Linear Regression
✅ Example: Predict MPG using multiple predictors
multi_model <- lm(mpg ~ wt + hp + cyl, data = mtcars)
summary(multi_model)
🔍 Output Highlights:
Coefficients:
(Intercept) wt hp cyl
38.752 -3.167 -0.018 -1.227
Multiple R-squared: 0.843
🔍 Interpretation:
- More variables → potentially better fit
- wt, hp, and cyl all have negative coefficients, so each is associated with lower MPG
- R² = 0.843: Model explains 84.3% of the variance in MPG
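Note that R² never decreases when predictors are added, so the adjusted R-squared is the fairer way to compare the two models; a short sketch of extracting both from summary():

s1 <- summary(model)        # simple model: mpg ~ wt
s2 <- summary(multi_model)  # multiple model: mpg ~ wt + hp + cyl
c(simple = s1$r.squared, multiple = s2$r.squared)
c(simple = s1$adj.r.squared, multiple = s2$adj.r.squared)
# Adjusted R-squared penalizes predictors that add little explanatory power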
🧠 4. Compare Models (ANOVA)
anova(model, multi_model)
🔍 Use case:
- Compare nested models (e.g., simple vs multiple regression)
- See if adding variables significantly improves the model
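To act on the ANOVA result in code, the p-value for the added predictors can be pulled from the returned table (anova.lm stores it in the Pr(>F) column, with the first row NA); a minimal sketch:

comparison <- anova(model, multi_model)
p_val <- comparison$`Pr(>F)`[2]  # p-value for adding hp and cyl
if (p_val < 0.05) {
  message("The additional predictors significantly improve the fit")
}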
🧮 5. Predict Values Using Model
newdata <- data.frame(wt = 3)
predict(model, newdata)
✅ For multiple regression:
predict(multi_model, data.frame(wt = 3, hp = 150, cyl = 6))
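Point predictions alone hide uncertainty; predict() can also return confidence and prediction intervals via its interval argument. A short sketch:

new_car <- data.frame(wt = 3)
predict(model, new_car, interval = "confidence")  # uncertainty in the mean response
predict(model, new_car, interval = "prediction")  # uncertainty for a single new car (wider)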
📋 6. Check Residuals and Diagnostics
par(mfrow = c(2, 2))
plot(model)
🔍 Produces:
- Residuals vs Fitted
- Normal Q-Q
- Scale-Location
- Residuals vs Leverage
These help you validate assumptions of linear regression.
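Beyond the four diagnostic plots, residuals can also be checked numerically; a sketch using base R's residuals() and shapiro.test() for the normality assumption:

res <- residuals(model)
summary(res)       # residuals should be roughly centered at zero
shapiro.test(res)  # formal normality test (p > 0.05 suggests no clear violation)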
📐 Regression Model Summary Table
| Term | Description |
|---|---|
| Intercept | Value of Y when all predictors = 0 |
| Coefficient | Impact of 1-unit change in X on Y |
| R-squared | % of variance explained by model |
| p-value | Significance of each predictor |
| Residuals | Differences between predicted and actual Y |
📌 Summary – Recap & Next Steps
Regression in R is a powerful and easy-to-use tool for modeling relationships. Whether it’s a single predictor or multiple, lm() gives you everything needed to fit, evaluate, and visualize regression models.
🔍 Key Takeaways:
- Use lm() for linear and multiple regression
- Interpret slope, intercept, and R² carefully
- Visualize with abline() or geom_smooth()
- Validate with residual plots and ANOVA
- Predict outcomes using predict()
⚙️ Real-World Relevance:
Used in finance, marketing, engineering, machine learning, econometrics, and healthcare analytics for prediction, causality, and insight generation.
❓ FAQs – Linear & Multiple Regression in R
❓ How do I perform linear regression in R?
✅ Use:
lm(response ~ predictor, data = dataset)
❓ What does R-squared mean?
✅ It measures the proportion of variance in the response variable explained by the predictors.
❓ How to predict new values using regression?
✅ Use predict() with a new data frame:
predict(model, newdata)
❓ How to check if a model is statistically significant?
✅ Look at the p-values of coefficients and overall F-statistic in summary().
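summary() prints the overall F-statistic but does not store its p-value directly; it can be computed from the stored statistic with pf(). A minimal sketch:

f <- summary(model)$fstatistic  # named vector: value, numdf, dendf
p_overall <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
p_overall                       # p-value for the overall model F-test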
❓ How to plot regression line in R?
✅ Use:
abline(model) # base R
geom_smooth(method = "lm") # ggplot2