Statistical Analysis with R

R Decision Tree & Random Forest – Machine Learning Models Explained


Introduction – Tree-Based Modeling in R

Decision Trees and Random Forests are two of the most widely used machine learning algorithms in R for classification and regression tasks. These models are intuitive, visual, and perform well with structured/tabular data.

  • Decision Tree: A flowchart-like structure for decision-making
  • Random Forest: An ensemble of decision trees to improve accuracy and reduce overfitting

In this guide, you’ll learn:

  • How to build and visualize decision trees in R
  • How to train a Random Forest model and evaluate its performance
  • How splitting rules, feature importance, and predictions work

1. Decision Tree in R

Load Required Packages

library(rpart)
library(rpart.plot)

Example: Classify Iris Species

data(iris)
tree_model <- rpart(Species ~ ., data = iris, method = "class")
rpart.plot(tree_model, type = 3, extra = 104, fallen.leaves = TRUE)

Explanation:

  • rpart() builds a decision tree
  • method = "class" for classification
  • rpart.plot() visualizes the tree with nodes, classes, and probabilities
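If plotting isn't convenient, the fitted tree can also be inspected as text: printing an rpart object lists each node's split rule, sample count, and predicted class, and printcp() shows how cross-validated error changes with tree complexity. A minimal sketch:

```r
library(rpart)

data(iris)
tree_model <- rpart(Species ~ ., data = iris, method = "class")

# Text view of the tree: node number, split rule, n, loss, predicted class
print(tree_model)

# Complexity table: cp value vs. cross-validated error at each split
printcp(tree_model)
```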

Predict and Evaluate

pred <- predict(tree_model, iris, type = "class")
table(Predicted = pred, Actual = iris$Species)

Outputs a confusion matrix comparing actual vs predicted labels.
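Note that predicting on the same data the tree was trained on gives an optimistic accuracy estimate. A fairer check is to hold out part of the data; the 70/30 split and seed below are illustrative choices, not fixed conventions:

```r
library(rpart)

data(iris)
set.seed(42)                                   # seed chosen for reproducibility
idx   <- sample(nrow(iris), 0.7 * nrow(iris))  # 70/30 train/test split
train <- iris[idx, ]
test  <- iris[-idx, ]

model     <- rpart(Species ~ ., data = train, method = "class")
test_pred <- predict(model, test, type = "class")

# Held-out confusion matrix and accuracy
cm <- table(Predicted = test_pred, Actual = test$Species)
sum(diag(cm)) / sum(cm)
```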


2. Random Forest in R

Load Random Forest Library

library(randomForest)

Train Model on Iris Data

set.seed(123)
rf_model <- randomForest(Species ~ ., data = iris, ntree = 100)
print(rf_model)

Explanation:

  • randomForest() builds the ensemble
  • ntree = 100 uses 100 trees in the forest
  • Printing the fit shows the out-of-bag (OOB) error estimate and a confusion matrix; feature importance is stored in the model object
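The numbers shown by print(rf_model) can also be pulled out programmatically. The err.rate matrix holds one row per tree, with an "OOB" column for the running out-of-bag error, and rf_model$confusion is built from out-of-bag predictions only:

```r
library(randomForest)

data(iris)
set.seed(123)
rf_model <- randomForest(Species ~ ., data = iris, ntree = 100)

# OOB error of the full 100-tree forest (last row of the error trajectory)
oob_error <- rf_model$err.rate[rf_model$ntree, "OOB"]
oob_error

# Confusion matrix computed from out-of-bag predictions only
rf_model$confusion
```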

Feature Importance Plot

importance(rf_model)
varImpPlot(rf_model)

Shows which features contribute most to the predictions (for iris, Petal.Width and Petal.Length dominate).


3. Predict with Random Forest

new_data <- iris[1:5, -5]  # Remove target column
predict(rf_model, new_data)
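Besides hard class labels, the forest can report the fraction of trees voting for each class, which serves as a rough class probability. With type = "prob", predict() returns one row per observation and one column per class, each row summing to 1:

```r
library(randomForest)

data(iris)
set.seed(123)
rf_model <- randomForest(Species ~ ., data = iris, ntree = 100)

new_data <- iris[1:5, -5]  # Remove target column
# Vote shares per class; each row sums to 1
predict(rf_model, new_data, type = "prob")
```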

4. Compare Decision Tree vs Random Forest

| Feature            | Decision Tree      | Random Forest     |
|--------------------|--------------------|-------------------|
| Model type         | Single tree        | Ensemble of trees |
| Overfitting risk   | High               | Low               |
| Accuracy           | Moderate           | High              |
| Interpretability   | Very high (visual) | Moderate          |
| Feature importance | Basic              | More reliable     |
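The accuracy row of this comparison can be checked directly by fitting both models on the same train/test split. Exact numbers depend on the seed and split (both arbitrary choices here); on held-out iris data the forest typically matches or beats the single tree:

```r
library(rpart)
library(randomForest)

data(iris)
set.seed(1)
idx   <- sample(nrow(iris), 100)   # 100 train rows, 50 test rows
train <- iris[idx, ]
test  <- iris[-idx, ]

tree_fit <- rpart(Species ~ ., data = train, method = "class")
rf_fit   <- randomForest(Species ~ ., data = train, ntree = 100)

tree_acc <- mean(predict(tree_fit, test, type = "class") == test$Species)
rf_acc   <- mean(predict(rf_fit, test) == test$Species)
c(decision_tree = tree_acc, random_forest = rf_acc)
```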

Summary – Recap & Next Steps

Decision Trees and Random Forests in R are easy to implement and powerful for both classification and regression problems. Trees are interpretable, and forests provide robust performance.

Key Takeaways:

  • Use rpart() and rpart.plot() for decision trees
  • Use randomForest() for ensemble learning
  • Visualize trees and variable importance
  • Evaluate model with confusion matrices and accuracy

Real-World Relevance:
Used in credit scoring, medical diagnosis, customer segmentation, churn prediction, and fraud detection.


FAQs – Decision Trees & Random Forests in R

When should I use a Decision Tree over a Random Forest?
Use a Decision Tree for interpretability and quick insight, and Random Forest for accuracy and robustness.

How can I prevent overfitting in decision trees?
Prune with the complexity parameter cp (set via rpart.control() when growing the tree, or apply prune() afterwards), or switch to a Random Forest, whose averaging over many trees strongly reduces overfitting.
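A common pruning recipe: grow a deliberately deep tree with cp = 0, then cut it back at the cp value that minimizes the cross-validated error (the xerror column of the cp table). A sketch:

```r
library(rpart)

data(iris)
set.seed(7)
# Grow a deliberately deep tree, then prune back to the complexity
# value that minimizes cross-validated error (xerror)
full_tree <- rpart(Species ~ ., data = iris, method = "class",
                   control = rpart.control(cp = 0, minsplit = 2))
best_cp <- full_tree$cptable[which.min(full_tree$cptable[, "xerror"]), "CP"]
pruned  <- prune(full_tree, cp = best_cp)

nrow(full_tree$frame)  # nodes before pruning
nrow(pruned$frame)     # nodes after pruning (usually fewer)
```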

What does ntree mean in Random Forest?
The number of trees grown in the ensemble. More trees stabilize the OOB error estimate and usually improve accuracy, but the gains flatten after a few hundred trees while training time keeps growing.

How to tune Random Forest parameters?
Use tuneRF() to search over mtry (the number of features tried at each split) against OOB error, or caret::train() for cross-validated tuning; ntree is usually set directly rather than tuned.
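A minimal tuneRF() sketch, with stepFactor and improve values chosen for illustration: it starts from a default mtry, multiplies or divides by stepFactor, and keeps going while the OOB error improves by at least the improve fraction:

```r
library(randomForest)

data(iris)
set.seed(123)
# Search over mtry using OOB error; stepFactor/improve are illustrative
tuned <- tuneRF(x = iris[, -5], y = iris$Species,
                ntreeTry = 100, stepFactor = 1.5, improve = 0.01,
                trace = FALSE, plot = FALSE)
tuned  # matrix of mtry values and their OOB error rates
```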

Can Random Forest handle missing values?
Not out of the box: randomForest() stops on missing predictor values by default. Impute first, e.g. with na.roughfix() (median/mode fill) or rfImpute() (proximity-based imputation), both shipped with the randomForest package.
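Both imputation routes can be sketched on a copy of iris with a few values deliberately blanked out (the NA positions below are arbitrary):

```r
library(randomForest)

data(iris)
set.seed(123)
iris_na <- iris
iris_na[c(3, 30, 75), "Sepal.Width"] <- NA   # introduce a few NAs

# Option 1: quick median (numeric) / mode (factor) fill, then train
rf_rough <- randomForest(Species ~ ., data = na.roughfix(iris_na))

# Option 2: proximity-based imputation with rfImpute(), then train
iris_imp <- rfImpute(Species ~ ., data = iris_na)
rf_imp   <- randomForest(Species ~ ., data = iris_imp)
```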

