
🌳 R Decision Tree & Random Forest – Machine Learning Models Explained


🧲 Introduction – Tree-Based Modeling in R

Decision Trees and Random Forests are two of the most widely used machine learning algorithms in R for classification and regression tasks. Both are intuitive, easy to visualize, and perform well on structured (tabular) data.

  • Decision Tree: A flowchart-like structure for decision-making
  • Random Forest: An ensemble of decision trees to improve accuracy and reduce overfitting

🎯 In this guide, you’ll learn how to:

  • Build and visualize decision trees in R
  • Train a Random Forest model and evaluate its performance
  • Interpret splitting rules, feature importance, and predictions

🌳 1. Decision Tree in R

✅ Load Required Packages

library(rpart)
library(rpart.plot)

✅ Example: Classify Iris Species

data(iris)
tree_model <- rpart(Species ~ ., data = iris, method = "class")
rpart.plot(tree_model, type = 3, extra = 104, fallen.leaves = TRUE)

🔍 Explanation:

  • rpart() builds a decision tree
  • method = "class" for classification
  • rpart.plot() visualizes the tree with nodes, classes, and probabilities
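
If you prefer the splitting rules as text rather than a plot, a minimal sketch using the tree_model fitted above:

print(tree_model)    # one line per node: split condition, n, loss, predicted class
printcp(tree_model)  # complexity table: cp, number of splits, cross-validated error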

✅ Predict and Evaluate

pred <- predict(tree_model, iris, type = "class")
table(Predicted = pred, Actual = iris$Species)

📌 Outputs a confusion matrix comparing actual vs predicted labels.
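
To reduce that table to a single accuracy figure, a small sketch; note this is resubstitution accuracy on the training data, so it overstates real-world performance:

conf_mat <- table(Predicted = pred, Actual = iris$Species)
sum(diag(conf_mat)) / sum(conf_mat)  # proportion of correctly classified rows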


🌲 2. Random Forest in R

✅ Load Random Forest Library

library(randomForest)

✅ Train Model on Iris Data

set.seed(123)
rf_model <- randomForest(Species ~ ., data = iris, ntree = 100)
print(rf_model)

🔍 Explanation:

  • randomForest() builds the ensemble
  • ntree = 100 grows 100 trees in the forest
  • print(rf_model) reports the out-of-bag (OOB) error estimate and a confusion matrix; feature importance is stored in the fitted object (see the sketch below for reading the OOB error directly)
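
If you want the OOB error as a plain number, a sketch reading it from the fitted object (err.rate stores the cumulative error after each tree is added):

rf_model$err.rate[rf_model$ntree, "OOB"]  # OOB error after the final (100th) tree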

✅ Feature Importance Plot

importance(rf_model)
varImpPlot(rf_model)

📊 Shows which features contribute most to prediction (e.g., Petal.Width, Petal.Length)
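
To rank features numerically, a small sketch; by default randomForest() records only MeanDecreaseGini, and you would need importance = TRUE at training time to also get permutation-based MeanDecreaseAccuracy:

imp <- importance(rf_model)
imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), , drop = FALSE]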


📈 3. Predict with Random Forest

new_data <- iris[1:5, -5]  # Remove target column
predict(rf_model, new_data)
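
If you need class probabilities rather than hard labels, predict() also accepts type = "prob", which returns each class’s share of the tree votes:

predict(rf_model, new_data, type = "prob")  # one row per observation, one column per species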

🔍 4. Compare Decision Tree vs Random Forest

Feature              Decision Tree         Random Forest
Model Type           Single tree           Ensemble of trees
Overfitting Risk     High                  Low
Accuracy             Moderate              High
Interpretability     Very high (visual)    Moderate
Feature Importance   Basic                 More reliable
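
To see the accuracy gap on unseen data, a minimal sketch using an arbitrary 70/30 split (the seed and split ratio are illustrative choices, not recommendations):

set.seed(42)
idx   <- sample(nrow(iris), 0.7 * nrow(iris))
train <- iris[idx, ]
test  <- iris[-idx, ]

tree_fit <- rpart(Species ~ ., data = train, method = "class")
rf_fit   <- randomForest(Species ~ ., data = train, ntree = 100)

mean(predict(tree_fit, test, type = "class") == test$Species)  # tree accuracy
mean(predict(rf_fit, test) == test$Species)                    # forest accuracy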

📌 Summary – Recap & Next Steps

Decision Trees and Random Forests in R are easy to implement and powerful for both classification and regression problems. Trees are interpretable, and forests provide robust performance.

🔍 Key Takeaways:

  • Use rpart() and rpart.plot() for decision trees
  • Use randomForest() for ensemble learning
  • Visualize trees and variable importance
  • Evaluate model with confusion matrices and accuracy

⚙️ Real-World Relevance:
Used in credit scoring, medical diagnosis, customer segmentation, churn prediction, and fraud detection.


❓ FAQs – Decision Trees & Random Forests in R

❓ When should I use a Decision Tree over a Random Forest?
✅ Use a Decision Tree for interpretability and quick insight, and Random Forest for accuracy and robustness.

❓ How can I prevent overfitting in decision trees?
✅ Prune the tree (via the cp parameter in rpart.control() or the prune() function), or switch to a Random Forest, which averages many trees to reduce overfitting.
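
A minimal pruning sketch, picking the cp value with the lowest cross-validated error from the model’s own cptable:

best_cp <- tree_model$cptable[which.min(tree_model$cptable[, "xerror"]), "CP"]
pruned  <- prune(tree_model, cp = best_cp)
rpart.plot(pruned)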

❓ What does ntree mean in Random Forest?
✅ The number of trees grown in the forest. More trees stabilize the error estimate, with diminishing returns beyond a few hundred.
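
You can watch the OOB error level off as trees are added:

plot(rf_model)  # OOB error (black) and per-class error versus number of trees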

❓ How to tune Random Forest parameters?
✅ Use tuneRF() to search over mtry, or caret::train() for cross-validated tuning of parameters such as mtry.
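
A small tuneRF() sketch searching over mtry (the argument values here are illustrative, not tuned recommendations):

set.seed(123)
tuneRF(iris[, -5], iris$Species,
       ntreeTry   = 100,   # trees grown per candidate mtry
       stepFactor = 1.5,   # factor by which mtry is changed each step
       improve    = 0.01)  # minimum OOB improvement required to continue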

❓ Can Random Forest handle missing values?
✅ Not directly: randomForest() stops with an error if predictors contain NA. Impute first, e.g. with rfImpute() or na.roughfix() from the same package.
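
A minimal sketch of the impute-then-train workflow (the injected NA is purely illustrative):

iris_na <- iris
iris_na[1, "Sepal.Length"] <- NA                   # pretend one value is missing
iris_imp <- rfImpute(Species ~ ., data = iris_na)  # proximity-based imputation
rf_na <- randomForest(Species ~ ., data = iris_imp, ntree = 100)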

