R Decision Tree & Random Forest – Machine Learning Models Explained
Introduction – Tree-Based Modeling in R
Decision Trees and Random Forests are two of the most widely used machine learning algorithms in R for classification and regression tasks. These models are intuitive, visual, and perform well with structured/tabular data.
- Decision Tree: A flowchart-like structure for decision-making
- Random Forest: An ensemble of decision trees to improve accuracy and reduce overfitting
In this guide, you’ll learn how to:
- Build and visualize decision trees in R
- Train a Random Forest model and evaluate its performance
- Understand splitting rules, feature importance, and predictions
1. Decision Tree in R
Load Required Packages
```r
library(rpart)
library(rpart.plot)
```
Example: Classify Iris Species
```r
data(iris)
tree_model <- rpart(Species ~ ., data = iris, method = "class")
rpart.plot(tree_model, type = 3, extra = 104, fallen.leaves = TRUE)
```
Explanation:
- `rpart()` builds the decision tree
- `method = "class"` requests classification (rather than regression)
- `rpart.plot()` visualizes the tree with nodes, classes, and probabilities
Predict and Evaluate
```r
pred <- predict(tree_model, iris, type = "class")
table(Predicted = pred, Actual = iris$Species)
```
Outputs a confusion matrix comparing actual vs predicted labels.
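To turn that confusion matrix into a single accuracy number, a minimal sketch (correct predictions sit on the diagonal of the matrix):

```r
library(rpart)

data(iris)
tree_model <- rpart(Species ~ ., data = iris, method = "class")

pred <- predict(tree_model, iris, type = "class")
cm <- table(Predicted = pred, Actual = iris$Species)

# Diagonal cells of the confusion matrix are the correct predictions
accuracy <- sum(diag(cm)) / sum(cm)
accuracy
```

Note this is training accuracy, which is optimistic; evaluating on a held-out split gives a more honest estimate.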
2. Random Forest in R
Load Random Forest Library
```r
library(randomForest)
```
Train Model on Iris Data
```r
set.seed(123)
rf_model <- randomForest(Species ~ ., data = iris, ntree = 100)
print(rf_model)
```
Explanation:
- `randomForest()` builds the ensemble
- `ntree = 100` grows 100 trees in the forest
- `print(rf_model)` reports the out-of-bag (OOB) error estimate and a confusion matrix; feature importance is retrieved separately with `importance()`
Feature Importance Plot
```r
importance(rf_model)
varImpPlot(rf_model)
```
Shows which features contribute most to the predictions (for iris, typically `Petal.Width` and `Petal.Length`).
3. Predict with Random Forest
```r
new_data <- iris[1:5, -5]  # Drop the Species (target) column
predict(rf_model, new_data)
```
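`predict()` can also return class probabilities (the fraction of trees voting for each species) instead of hard labels. A self-contained sketch:

```r
library(randomForest)

data(iris)
set.seed(123)
rf_model <- randomForest(Species ~ ., data = iris, ntree = 100)

new_data <- iris[1:5, -5]
probs <- predict(rf_model, new_data, type = "prob")   # per-class vote fractions
labels <- predict(rf_model, new_data)                 # hard labels (default)
probs
```

The probability view is useful when you need a confidence threshold rather than a flat yes/no decision.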
4. Compare Decision Tree vs Random Forest
| Feature | Decision Tree | Random Forest |
|---|---|---|
| Model Type | Single Tree | Ensemble of Trees |
| Overfitting Risk | High | Low |
| Accuracy | Moderate | High |
| Interpretability | Very High (visual) | Moderate |
| Feature Importance | Basic | More Reliable |
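To make the comparison concrete, a minimal sketch fitting both models on the same held-out split (the 70/30 split and the seed are arbitrary choices for illustration):

```r
library(rpart)
library(randomForest)

data(iris)
set.seed(42)
train_idx <- sample(nrow(iris), 0.7 * nrow(iris))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

tree_fit <- rpart(Species ~ ., data = train, method = "class")
rf_fit   <- randomForest(Species ~ ., data = train, ntree = 100)

tree_acc <- mean(predict(tree_fit, test, type = "class") == test$Species)
rf_acc   <- mean(predict(rf_fit, test) == test$Species)

c(tree = tree_acc, forest = rf_acc)
```

On a dataset as easy as iris the gap is often small; the forest's advantage shows more clearly on noisier, higher-dimensional data.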
Summary – Recap & Next Steps
Decision Trees and Random Forests in R are easy to implement and powerful for both classification and regression problems. Trees are interpretable, and forests provide robust performance.
Key Takeaways:
- Use `rpart()` and `rpart.plot()` for decision trees
- Use `randomForest()` for ensemble learning
- Visualize trees and variable importance
- Evaluate models with confusion matrices and accuracy
Real-World Relevance:
Used in credit scoring, medical diagnosis, customer segmentation, churn prediction, and fraud detection.
FAQs – Decision Trees & Random Forests in R
When should I use a Decision Tree over a Random Forest?
Use a Decision Tree for interpretability and quick insight, and Random Forest for accuracy and robustness.
How can I prevent overfitting in decision trees?
Use pruning: set the `cp` complexity parameter via `rpart.control()` and cut the tree back with `prune()`, or switch to a Random Forest, which reduces overfitting by averaging many trees.
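A minimal pruning sketch: grow a deliberately large tree with a tiny `cp`, then prune back at the `cp` value with the lowest cross-validated error (`xerror` in the table `printcp()` reports):

```r
library(rpart)

data(iris)
set.seed(1)
# Grow an intentionally oversized tree (small cp, permissive minsplit)
full_tree <- rpart(Species ~ ., data = iris, method = "class",
                   control = rpart.control(cp = 0.001, minsplit = 2))

printcp(full_tree)  # cross-validation error for each cp value

# Pick the cp that minimizes cross-validated error, then prune
best_cp <- full_tree$cptable[which.min(full_tree$cptable[, "xerror"]), "CP"]
pruned  <- prune(full_tree, cp = best_cp)
```

The seed matters because `rpart` uses cross-validation internally, so `xerror` varies slightly between runs.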
What does ntree mean in Random Forest?
The number of decision trees to grow. More trees generally improve stability, but with diminishing returns beyond a few hundred.
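One way to see those diminishing returns: the fitted object stores the cumulative OOB error rate after each tree, so plotting it shows where extra trees stop helping.

```r
library(randomForest)

data(iris)
set.seed(123)
rf_model <- randomForest(Species ~ ., data = iris, ntree = 500)

# err.rate has one row per tree; the "OOB" column is the overall error
plot(rf_model$err.rate[, "OOB"], type = "l",
     xlab = "Number of trees", ylab = "OOB error rate")
```

The curve typically flattens well before `ntree` is exhausted, which is a reasonable stopping signal.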
How to tune Random Forest parameters?
Use `tuneRF()` to search for a good `mtry`, or `caret::train()` with `method = "rf"` for cross-validated tuning. Note that `tuneRF()` tunes only `mtry`; `ntree` is usually set by hand.
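A minimal `tuneRF()` sketch; the `stepFactor` and `improve` values here are arbitrary illustration choices, and with only four predictors in iris the search range is small:

```r
library(randomForest)

data(iris)
set.seed(123)
# tuneRF searches over mtry only: starting from the default, it multiplies
# and divides by stepFactor while the OOB error keeps improving by `improve`
tuned <- tuneRF(x = iris[, -5], y = iris$Species,
                stepFactor = 1.5, improve = 0.01,
                ntreeTry = 100, trace = TRUE)
tuned  # matrix of mtry values and their OOB errors
</imports>```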
Can Random Forest handle missing values?
Not directly: by default, `randomForest()` stops with an error if predictors contain `NA`. The package does ship imputation helpers, `na.roughfix()` (median/mode fill) and `rfImpute()` (proximity-based), or you can drop incomplete rows with `na.action = na.omit`.
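A sketch of the two imputation helpers from the `randomForest` package, using a copy of iris with some values blanked out purely for illustration:

```r
library(randomForest)

data(iris)
set.seed(123)

# Introduce artificial missing values for illustration
iris_na <- iris
iris_na[sample(nrow(iris), 10), "Sepal.Width"] <- NA

# Option 1: quick median (numeric) / mode (factor) imputation
filled <- na.roughfix(iris_na)
rf1 <- randomForest(Species ~ ., data = filled, ntree = 100)

# Option 2: proximity-based imputation (iterative, fits forests internally)
imputed <- rfImpute(Species ~ ., data = iris_na)
rf2 <- randomForest(Species ~ ., data = imputed, ntree = 100)
```

`na.roughfix()` is fast and crude; `rfImpute()` is slower but uses the forest's proximity matrix to produce more tailored imputations.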