R Linear & Multiple Regression – Predict and Analyze Relationships with Code Examples
Introduction – Modeling Relationships in R
Regression analysis is a cornerstone of statistical modeling. It helps you understand and predict the relationship between variables.
- Use simple linear regression for one predictor variable
- Use multiple regression for two or more predictors
R provides the lm() function, making it easy to fit, summarize, and visualize regression models.
In this guide, you’ll learn:
- How to perform linear and multiple regression in R
- Interpret regression output: coefficients, R-squared, residuals
- Plot regression lines with plot() and ggplot2
- Use real datasets like mtcars for demonstrations
1. Simple Linear Regression in R
Example: Predict MPG based on Weight
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)
summary(model)
Output Highlights:
Coefficients:
(Intercept) wt
37.285 -5.344
Multiple R-squared: 0.753
Interpretation:
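- Intercept (37.285): Expected MPG when weight = 0, an extrapolation outside the observed data that has no practical meaning on its own
- Slope (-5.344): Each one-unit increase in wt (1,000 lbs in mtcars) is associated with a decrease of about 5.3 MPG
- R-squared (0.753): The model explains 75.3% of the variation in MPG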
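To see where these numbers come from, the slope and intercept of a one-predictor least-squares fit can be reproduced by hand. This is a minimal sketch; the names slope_manual and intercept_manual are introduced here for illustration only.

```r
# Refit the simple model from the example above
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# For one predictor, the least-squares slope is cov(x, y) / var(x)
slope_manual <- cov(mtcars$wt, mtcars$mpg) / var(mtcars$wt)

# The fitted line passes through the point (mean(x), mean(y))
intercept_manual <- mean(mtcars$mpg) - slope_manual * mean(mtcars$wt)

# Compare with the coefficients reported by lm()
coef(model)
c(intercept_manual, slope_manual)

# The slope is the change in predicted MPG per one-unit (1,000 lb) change in wt
pred <- predict(model, data.frame(wt = c(3, 4)))
diff(pred)  # equals the wt coefficient
```

The manual values should match coef(model) up to floating-point error, which confirms that the slope is simply the predicted change in MPG between two cars whose weights differ by one unit.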
2. Visualize Linear Regression Line
plot(mtcars$wt, mtcars$mpg, main = "MPG vs Weight")
abline(model, col = "blue", lwd = 2)
With ggplot2:
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point() +
geom_smooth(method = "lm", col = "red") +
labs(title = "Linear Regression: MPG ~ Weight")
3. Multiple Linear Regression
Example: Predict MPG using multiple predictors
multi_model <- lm(mpg ~ wt + hp + cyl, data = mtcars)
summary(multi_model)
Output Highlights:
Coefficients:
(Intercept) wt hp cyl
38.752 -3.167 -0.018 -0.942
Multiple R-squared: 0.843
Interpretation:
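- More variables → potentially better fit, but R² never decreases when predictors are added, so check adjusted R² as well
- wt, hp, and cyl all have negative coefficients: each is associated with lower MPG when the other predictors are held constant
- R² = 0.843: The model explains 84.3% of the variance in MPG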
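Because plain R² cannot go down when predictors are added, it helps to look at adjusted R² next to it. The sketch below pulls both values from summary() for the two models fitted above; the vector names r2 and adj_r2 are illustrative.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)
multi_model <- lm(mpg ~ wt + hp + cyl, data = mtcars)

# Plain R-squared: can only stay the same or rise as predictors are added
r2 <- c(simple = summary(model)$r.squared,
        multiple = summary(multi_model)$r.squared)

# Adjusted R-squared: penalizes each additional predictor
adj_r2 <- c(simple = summary(model)$adj.r.squared,
            multiple = summary(multi_model)$adj.r.squared)

# Side-by-side comparison of both measures
round(rbind(r2, adj_r2), 3)
```

If adjusted R² rises along with R², the extra predictors are probably adding real explanatory value rather than just inflating the fit.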
4. Compare Models (ANOVA)
anova(model, multi_model)
Use case:
- Compare nested models (e.g., simple vs multiple regression)
- See if adding variables significantly improves the model
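The comparison result can also be read programmatically. The following sketch extracts the F-statistic and p-value from the table returned by anova(); the 0.05 threshold is a conventional choice assumed here, not something fixed by R.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)
multi_model <- lm(mpg ~ wt + hp + cyl, data = mtcars)

# F-test: do hp and cyl jointly improve on the wt-only model?
cmp <- anova(model, multi_model)
print(cmp)

# Row 2 holds the test of the larger model against the smaller one
f_stat  <- cmp[["F"]][2]
p_value <- cmp[["Pr(>F)"]][2]

# alpha = 0.05 is a conventional threshold, assumed here for illustration
alpha <- 0.05
if (p_value < alpha) {
  message("The added predictors significantly improve the fit")
} else {
  message("No significant improvement from the added predictors")
}
```

This test is only valid for nested models, meaning the smaller model's predictors must all appear in the larger one.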
5. Predict Values Using Model
newdata <- data.frame(wt = 3)
predict(model, newdata)
For multiple regression:
predict(multi_model, data.frame(wt = 3, hp = 150, cyl = 6))
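A point prediction says nothing about its uncertainty. As a sketch, predict() for lm models accepts an interval argument that returns lower and upper bounds alongside the fitted value; the names ci and pi_new are illustrative.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)
newdata <- data.frame(wt = 3)

# Confidence interval: uncertainty about the mean MPG of all cars with wt = 3
ci <- predict(model, newdata, interval = "confidence", level = 0.95)

# Prediction interval: uncertainty about the MPG of a single new car with wt = 3
pi_new <- predict(model, newdata, interval = "prediction", level = 0.95)

# Each result is a matrix with columns fit, lwr, upr
ci
pi_new
```

The prediction interval is always wider than the confidence interval because it also accounts for the scatter of individual observations around the regression line.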
6. Check Residuals and Diagnostics
par(mfrow = c(2, 2))
plot(model)
Produces:
- Residuals vs Fitted
- Normal Q-Q
- Scale-Location
- Residuals vs Leverage
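These help you check the assumptions of linear regression: linearity (Residuals vs Fitted), normally distributed residuals (Normal Q-Q), constant variance (Scale-Location), and the absence of overly influential points (Residuals vs Leverage).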
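The plots can be backed up with a few numeric checks. This is a minimal sketch using base R functions; the 4 / n cutoff for Cook's distance is a common rule of thumb assumed here, not a formal test.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Residuals of a least-squares fit with an intercept sum to (numerically) zero
res <- residuals(model)
sum(res)

# Shapiro-Wilk test: a small p-value suggests non-normal residuals
shapiro.test(res)

# Cook's distance measures how much each observation influences the fit
cooks <- cooks.distance(model)

# 4 / n is a common rule-of-thumb cutoff, not a formal test
cutoff <- 4 / nrow(mtcars)
names(cooks)[cooks > cutoff]
```

Observations flagged by the cutoff are worth inspecting, but they are not necessarily errors and should not be dropped automatically.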
Regression Model Summary Table
| Term | Description |
|---|---|
| Intercept | Value of Y when all predictors = 0 |
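| Coefficient | Expected change in Y for a 1-unit change in X, holding other predictors constant |
| R-squared | Proportion of variance in Y explained by the model |
| p-value | Probability of seeing an estimate this extreme if the true coefficient were 0 |
| Residuals | Differences between observed and predicted Y |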
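Each term in the table can be pulled out of a fitted model directly. The sketch below uses the simple mpg ~ wt model; the names s, coef_table, and res are illustrative.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)
s <- summary(model)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)
coef_table["(Intercept)", "Estimate"]  # intercept
coef_table["wt", "Estimate"]           # coefficient (slope) for wt
coef_table["wt", "Pr(>|t|)"]           # p-value for wt

# Proportion of variance explained
s$r.squared

# Residuals are observed minus fitted values
res <- residuals(model)
head(res)
```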
Summary – Recap & Next Steps
Regression in R is a powerful and easy-to-use tool for modeling relationships. Whether it’s a single predictor or multiple, lm() gives you everything needed to fit, evaluate, and visualize regression models.
Key Takeaways:
- Use lm() for linear and multiple regression
- Interpret slope, intercept, and R² carefully
- Visualize with abline() or geom_smooth()
- Validate with residual plots and ANOVA
- Predict outcomes using predict()
Real-World Relevance:
Regression is used in finance, marketing, engineering, machine learning, econometrics, and healthcare analytics for prediction, causal analysis, and insight generation.
FAQs – Linear & Multiple Regression in R
How do I perform linear regression in R?
Use:
lm(response ~ predictor, data = dataset)
What does R-squared mean?
It measures the proportion of variance in the response variable explained by the predictors.
How to predict new values using regression?
Use predict() with a new data frame:
predict(model, newdata)
How to check if a model is statistically significant?
Look at the p-values of coefficients and overall F-statistic in summary().
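As a sketch of how to get these values in code: the coefficient p-values are stored in the summary object, while the overall p-value is printed but not stored, so it has to be recomputed from the F-statistic. The names s, f, and overall_p are illustrative.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)
s <- summary(model)

# p-values of individual coefficients
coef(s)[, "Pr(>|t|)"]

# summary() prints the overall p-value but does not store it;
# recompute it from the stored F-statistic and its degrees of freedom
f <- s$fstatistic  # named vector: value, numdf, dendf
overall_p <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
unname(overall_p)
```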
How to plot regression line in R?
Use:
abline(model) # base R
geom_smooth(method = "lm") # ggplot2
