🌳 R Decision Tree & Random Forest – Machine Learning Models Explained
🧲 Introduction – Tree-Based Modeling in R
Decision Trees and Random Forests are two of the most widely used machine learning algorithms in R for classification and regression tasks. These models are intuitive, visual, and perform well with structured/tabular data.
- Decision Tree: A flowchart-like structure for decision-making
- Random Forest: An ensemble of decision trees to improve accuracy and reduce overfitting
🎯 In this guide, you’ll learn:
- How to build and visualize decision trees in R
- How to train a Random Forest model and evaluate its performance
- How splitting rules, feature importance, and predictions work
🌳 1. Decision Tree in R
✅ Load Required Packages
```r
library(rpart)
library(rpart.plot)
```
✅ Example: Classify Iris Species
```r
data(iris)
tree_model <- rpart(Species ~ ., data = iris, method = "class")
rpart.plot(tree_model, type = 3, extra = 104, fallen.leaves = TRUE)
```
🔍 Explanation:
- `rpart()` builds a decision tree
- `method = "class"` specifies a classification tree
- `rpart.plot()` visualizes the tree with nodes, classes, and probabilities
✅ Predict and Evaluate
```r
pred <- predict(tree_model, iris, type = "class")
table(Predicted = pred, Actual = iris$Species)
```
📌 Outputs a confusion matrix comparing actual vs predicted labels.
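As a quick sketch, overall accuracy can be computed from the diagonal of that matrix (this assumes the `tree_model` and `pred` objects from the snippets above; note that evaluating on the training data will overstate accuracy):

```r
# Confusion matrix: correct predictions sit on the diagonal
cm <- table(Predicted = pred, Actual = iris$Species)

# Accuracy = correct predictions / total predictions
accuracy <- sum(diag(cm)) / sum(cm)
accuracy
```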
🌲 2. Random Forest in R
✅ Load Random Forest Library
```r
library(randomForest)
```
✅ Train Model on Iris Data
```r
set.seed(123)
rf_model <- randomForest(Species ~ ., data = iris, ntree = 100)
print(rf_model)
```
🔍 Explanation:
- `randomForest()` builds the ensemble
- `ntree = 100` grows 100 trees in the forest
- The printed summary reports the out-of-bag (OOB) error rate and a confusion matrix; feature importance is available via `importance()`
✅ Feature Importance Plot
```r
importance(rf_model)
varImpPlot(rf_model)
```
📊 Shows which features contribute most to prediction (e.g., Petal.Width, Petal.Length)
📈 3. Predict with Random Forest
```r
new_data <- iris[1:5, -5] # Remove target column
predict(rf_model, new_data)
```
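Beyond hard class labels, the model can also return class probabilities, which reflect the share of trees voting for each class (this assumes the `rf_model` and `new_data` objects from the snippet above):

```r
# One row per observation, one column per class, values sum to 1
predict(rf_model, new_data, type = "prob")
```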
🔍 4. Compare Decision Tree vs Random Forest
| Feature | Decision Tree | Random Forest |
|---|---|---|
| Model Type | Single Tree | Ensemble of Trees |
| Overfitting Risk | High | Low |
| Accuracy | Moderate | High |
| Interpretability | Very High (visual) | Moderate |
| Feature Importance | Basic | More Reliable |
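One way to see the accuracy difference from the table above in practice is a simple hold-out comparison. This is a sketch: the 70/30 split, the seed, and `ntree = 100` are arbitrary choices, and on a small, clean dataset like iris the gap may be modest.

```r
library(rpart)
library(randomForest)

set.seed(42)
data(iris)

# 70/30 train/test split
idx   <- sample(nrow(iris), 0.7 * nrow(iris))
train <- iris[idx, ]
test  <- iris[-idx, ]

# Fit both models on the same training data
tree_fit <- rpart(Species ~ ., data = train, method = "class")
rf_fit   <- randomForest(Species ~ ., data = train, ntree = 100)

# Compare held-out accuracy
tree_acc <- mean(predict(tree_fit, test, type = "class") == test$Species)
rf_acc   <- mean(predict(rf_fit, test) == test$Species)
c(tree = tree_acc, forest = rf_acc)
```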
📌 Summary – Recap & Next Steps
Decision Trees and Random Forests in R are easy to implement and powerful for both classification and regression problems. Trees are interpretable, and forests provide robust performance.
🔍 Key Takeaways:
- Use `rpart()` and `rpart.plot()` for decision trees
- Use `randomForest()` for ensemble learning
- Visualize trees and variable importance
- Evaluate models with confusion matrices and accuracy
⚙️ Real-World Relevance:
Used in credit scoring, medical diagnosis, customer segmentation, churn prediction, and fraud detection.
❓ FAQs – Decision Trees & Random Forests in R
❓ When should I use a Decision Tree over a Random Forest?
✅ Use a Decision Tree for interpretability and quick insight, and Random Forest for accuracy and robustness.
❓ How can I prevent overfitting in decision trees?
✅ Use pruning (the `cp` parameter via `rpart.control()` and `prune()`), limit tree depth, or switch to a Random Forest, whose averaging reduces overfitting.
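A minimal pruning sketch: grow a deliberately deep tree, inspect the cross-validated error for each complexity parameter value, then prune back to the best one (the `cp = 0.001` and `minsplit = 2` settings here are only to force a deep tree for illustration):

```r
library(rpart)
data(iris)
set.seed(123)

# Grow a deliberately deep tree
full_tree <- rpart(Species ~ ., data = iris, method = "class",
                   control = rpart.control(cp = 0.001, minsplit = 2))
printcp(full_tree)  # cross-validated error (xerror) for each cp value

# Prune back to the cp with the lowest cross-validated error
best_cp <- full_tree$cptable[which.min(full_tree$cptable[, "xerror"]), "CP"]
pruned  <- prune(full_tree, cp = best_cp)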
❓ What does ntree mean in Random Forest?
✅ The number of decision trees to grow. More trees stabilize the error estimate, but the improvement plateaus after a few hundred while training time keeps increasing.
❓ How to tune Random Forest parameters?
✅ Use `tuneRF()` (which searches over `mtry`, the number of features tried at each split) or `caret::train()` for automated tuning of parameters such as `mtry`.
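A sketch of `tuneRF()` on the iris data (the `stepFactor` and `improve` values here are arbitrary starting points):

```r
library(randomForest)
data(iris)
set.seed(123)

# Search over mtry, starting from the classification default of sqrt(p)
tuned <- tuneRF(x = iris[, -5], y = iris$Species,
                ntreeTry = 100, stepFactor = 1.5, improve = 0.01)
tuned  # OOB error for each mtry value tried
```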
❓ Can Random Forest handle missing values?
✅ Not directly — `randomForest()` stops with an error if predictors contain NAs. Impute first with `na.roughfix()` or `rfImpute()`, or drop incomplete rows with `na.action = na.omit`.
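As a sketch, the randomForest package ships two imputation helpers; the NAs injected below are purely for illustration:

```r
library(randomForest)
data(iris)
set.seed(123)

# Inject a few missing values for demonstration
iris_na <- iris
iris_na[sample(nrow(iris_na), 5), "Sepal.Width"] <- NA

# Option 1: quick median (numeric) / mode (factor) fill
filled <- na.roughfix(iris_na)

# Option 2: proximity-based imputation (the response must be complete)
imputed <- rfImpute(Species ~ ., data = iris_na)

rf_model <- randomForest(Species ~ ., data = filled, ntree = 100)
```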