Data Visualization & Graphics in R

R Boxplots & Histograms – Visualize Distributions and Outliers in R


Introduction – Understand Data Spread with R Visualizations

When analyzing numeric data, two key visualizations stand out:

  • Boxplots: Reveal medians, quartiles, and outliers
  • Histograms: Show frequency distributions across value ranges

R supports both through base R graphics and the ggplot2 package, so you can see how your data is spread and whether it shows skewness, clusters, or anomalies.

In this guide, you’ll learn:

  • How to create and interpret boxplots and histograms
  • How to use base R and ggplot2 for customized plotting
  • How to detect distribution shape, spread, and outliers visually

1. Boxplots in Base R

Single Variable Boxplot

boxplot(mtcars$mpg, main = "Boxplot of MPG", ylab = "Miles Per Gallon")

Explanation:

  • Displays a summary of the distribution: median (center line), IQR (box), whiskers (the most extreme values within 1.5×IQR of the box), and outliers (points beyond the whiskers)
  • Helps quickly identify data spread and extreme values
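To read the numbers behind the plot instead of estimating them from the picture, base R's boxplot.stats() returns the values that boxplot() draws. A minimal sketch; the variable names s and five are only illustrative:

```r
# Compute the statistics that boxplot() draws, without plotting anything
s <- boxplot.stats(mtcars$mpg)

# s$stats holds five values in this order:
# lower whisker, lower hinge, median, upper hinge, upper whisker
five <- setNames(s$stats,
                 c("lower_whisker", "lower_hinge", "median",
                   "upper_hinge", "upper_whisker"))
print(five)

# s$out holds the values drawn as individual outlier points (may be empty)
print(s$out)
```

The hinges are close to the first and third quartiles, but they are not always identical to what quantile() reports.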

Grouped Boxplot

boxplot(mpg ~ cyl, data = mtcars,
        main = "MPG by Cylinder Count", xlab = "Cylinders", ylab = "MPG", col = "lightblue")

Explanation:

  • mpg ~ cyl: Formula interface to group by cylinder
  • Shows how MPG varies across cylinder categories
  • col = "lightblue": Fills every box with the same color for readability
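The grouped plot can be cross-checked numerically. The sketch below computes the median of each cylinder group (the center line of each box) and the group sizes with tapply() and table(); the variable names are illustrative:

```r
# Median MPG for each cylinder group (the center line of each box)
group_medians <- tapply(mtcars$mpg, mtcars$cyl, median)
print(group_medians)

# Number of cars in each group, useful context when comparing boxes
group_sizes <- table(mtcars$cyl)
print(group_sizes)
```

Small groups produce less stable boxes, so it helps to look at the counts before reading much into a difference between groups.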

2. Boxplots in ggplot2

library(ggplot2)
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(fill = "tomato") +
  labs(title = "MPG by Cylinder Count", x = "Cylinders", y = "MPG")

Explanation:

  • geom_boxplot() draws the boxplot
  • factor(cyl): Ensures x-axis is categorical
  • fill = "tomato" sets the box color; labs() sets the title and axis labels
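With small groups it can help to draw the raw observations on top of the boxes. One possible variation, assuming ggplot2 is installed; the object name p_box and the jitter settings are illustrative choices, not part of the original example:

```r
library(ggplot2)

# Boxplot with every observation drawn on top.
# outlier.shape = NA hides the boxplot's own outlier points so they are
# not drawn twice once geom_jitter() adds all observations.
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(fill = "tomato", outlier.shape = NA) +
  # Jitter horizontally only (height = 0) so the MPG values stay exact
  geom_jitter(width = 0.15, height = 0, alpha = 0.6) +
  labs(title = "MPG by Cylinder Count", x = "Cylinders", y = "MPG")

print(p_box)
```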

3. Histograms in Base R

Basic Histogram

hist(mtcars$mpg, main = "Histogram of MPG", xlab = "Miles Per Gallon", col = "lightgray", breaks = 10)

Explanation:

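  • breaks = 10 asks for about 10 bins; R treats a single number as a suggestion and picks rounded break points, so the actual count can differ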
  • Shows frequency of MPG values in each bin
  • Helps detect skewness, modality, and range
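To see which bins R actually chose, hist() can compute the histogram without drawing it. A short sketch using plot = FALSE; the name h is illustrative:

```r
# Compute the histogram without drawing it, to inspect the bins
h <- hist(mtcars$mpg, breaks = 10, plot = FALSE)

h$breaks          # bin edges R actually chose
length(h$counts)  # number of bins, which need not equal 10
h$counts          # number of observations in each bin
```

If an exact set of bins matters, pass a vector of break points instead of a single number.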

Customize Histogram Breaks

hist(mtcars$mpg, breaks = seq(10, 35, by = 2),
     col = "skyblue", border = "white", main = "Custom MPG Histogram")

4. Histograms in ggplot2

ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "black") +
  labs(title = "Distribution of MPG", x = "Miles Per Gallon", y = "Frequency")

Explanation:

  • geom_histogram(): Histogram layer
  • binwidth = 2: Controls bar width (smaller binwidth = more bars)
  • Clearly shows distribution shape
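The effect of binwidth can be checked by counting the bars ggplot2 produces. The sketch below assumes ggplot2 is installed and uses a hypothetical helper, mpg_hist(), written only for this comparison:

```r
library(ggplot2)

# Hypothetical helper: builds the MPG histogram for a given binwidth
mpg_hist <- function(bw) {
  ggplot(mtcars, aes(x = mpg)) +
    geom_histogram(binwidth = bw, fill = "steelblue", color = "black")
}

# layer_data() returns one row per bar after ggplot2 has binned the data
bars_narrow <- nrow(layer_data(mpg_hist(1)))
bars_wide   <- nrow(layer_data(mpg_hist(5)))

c(binwidth_1 = bars_narrow, binwidth_5 = bars_wide)
```

Very narrow bins show noise and very wide bins hide structure, so it is worth trying a few values before settling on one.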

Boxplot vs Histogram: Quick Comparison

Feature            Boxplot               Histogram
Summarizes median  Yes                   No
Shows outliers     Yes                   No
Shows skewness     Visually              Clearly
Best for           Comparing groups      Understanding shape
Preferred when     Looking for outliers  Exploring distribution

Save Your Plot to File

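png("box_hist.png", width = 800, height = 400)  # Open a PNG device (size in pixels)
par(mfrow = c(1, 2))                            # Plot side by side
boxplot(mtcars$mpg, main = "Boxplot of MPG")
hist(mtcars$mpg, main = "Histogram of MPG")
dev.off()                                       # Close the device to write the file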
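The png()/dev.off() pattern above applies to base graphics. For ggplot2 plots, ggsave() is the usual alternative. A minimal sketch, assuming ggplot2 is installed; the object name p_hist and the output path in the temporary directory are illustrative choices:

```r
library(ggplot2)

p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "black") +
  labs(title = "Distribution of MPG", x = "Miles Per Gallon", y = "Frequency")

# Assumed output location: a file in the session's temporary directory
out_file <- file.path(tempdir(), "mpg_hist.png")

# width and height are in inches by default; dpi sets the resolution
ggsave(out_file, plot = p_hist, width = 6, height = 4, dpi = 150)
```

ggsave() picks the file type from the extension, so changing the name to end in .pdf would write a PDF instead.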

Summary – Recap & Next Steps

Boxplots and histograms are essential to understanding data distribution and variability. R provides intuitive ways to use both for single-variable summaries or grouped comparisons.

Key Takeaways:

  • Use boxplot() or geom_boxplot() to detect outliers and spread
  • Use hist() or geom_histogram() to explore frequency and skewness
  • Customize colors, bins, and labels for clarity
  • Boxplots = summary + comparison; Histograms = distribution shape

Real-World Relevance:
Both plots are used in exploratory data analysis (EDA), data quality checks, reporting, outlier detection, and distribution fitting in research, finance, healthcare, and more.


FAQs – Boxplots and Histograms in R

How do I identify outliers using a boxplot?
Outliers appear as points outside the whiskers, that is, more than 1.5×IQR below the bottom or above the top of the box.
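The same rule can be applied in code to list the flagged values. The sketch below uses a small made-up vector (not from any dataset) and base R's boxplot.stats():

```r
# Made-up values for illustration; 45 is deliberately extreme
x <- c(12, 14, 15, 15, 16, 18, 19, 45)

s <- boxplot.stats(x)           # default coef = 1.5
hinges <- s$stats[c(2, 4)]      # lower and upper hinge (edges of the box)

# Fences: 1.5 times the box height below and above the box
fences <- hinges + c(-1.5, 1.5) * diff(hinges)

fences   # values outside this range are flagged
s$out    # the flagged values
```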

How can I change the number of bins in a histogram?
Use breaks in base R or binwidth in ggplot2:

geom_histogram(binwidth = 1)

Can I plot multiple boxplots side-by-side?
Yes, use formulas like y ~ group in boxplot() or map x/y in ggplot2.

Can histograms display categorical data?
No. Use bar charts for categorical data; histograms are for continuous data.

How can I overlay a normal curve on a histogram?
In base R, draw the histogram with freq = FALSE so the y-axis shows density, then add the curve with curve() and dnorm() using add = TRUE.
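A minimal sketch of that approach for the MPG data. The names mpg, mu, and sigma are illustrative, and the normal curve uses the sample mean and standard deviation as its parameters:

```r
mpg <- mtcars$mpg

# Compute the parameters first: inside curve(), the symbol x refers to
# the grid of plotting positions, not to the data
mu <- mean(mpg)
sigma <- sd(mpg)

# freq = FALSE puts density on the y-axis so the curve is on the same scale
hist(mpg, freq = FALSE, col = "lightgray",
     main = "MPG with Normal Curve", xlab = "Miles Per Gallon")

# Overlay the normal density on the existing plot
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = "red", lwd = 2)
```

The curve is only a visual reference: a histogram that roughly follows it suggests approximate normality but does not prove it.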

