R Boxplots & Histograms – Visualize Distributions and Outliers in R
Introduction – Understand Data Spread with R Visualizations
When analyzing numeric data, two key visualizations stand out:
- Boxplots: Reveal medians, quartiles, and outliers
- Histograms: Show frequency distributions across value ranges
R supports both through base R and the ggplot2 library, so you can see how your data is spread and whether it contains skewness, clusters, or anomalies.
In this guide, you’ll learn:
- How to create and interpret boxplots and histograms
- How to use base R and ggplot2 for customized plotting
- How to detect distribution shape, spread, and outliers visually
1. Boxplots in Base R
Single Variable Boxplot
boxplot(mtcars$mpg, main = "Boxplot of MPG", ylab = "Miles Per Gallon")
Explanation:
- Displays a summary of distribution: median (center line), IQR (box), and outliers (points beyond whiskers)
- Helps quickly identify data spread and extreme values
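The numbers behind the plot can also be inspected directly. The following is a minimal sketch, assuming base R's boxplot.stats() (a helper not used in the original text), which returns the five values the box and whiskers are drawn from, plus any points flagged as outliers:

```r
# Compute the statistics that boxplot() draws, without plotting anything
s <- boxplot.stats(mtcars$mpg)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
s$stats

# Values lying beyond the whiskers (an empty vector means none were flagged)
s$out
```

Checking s$out is a quick way to confirm whether a point on the plot is really being treated as an outlier.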
Grouped Boxplot
boxplot(mpg ~ cyl, data = mtcars,
        main = "MPG by Cylinder Count", xlab = "Cylinders", ylab = "MPG", col = "lightblue")
Explanation:
- mpg ~ cyl: Formula interface to group by cylinder
- Shows how MPG varies across cylinder categories
- Color added for better distinction
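The per-group statistics can be read off without drawing anything. This sketch assumes the plot = FALSE argument of boxplot(), which returns the computed values as a list; the row labels are illustrative names added here, not something R provides:

```r
# Compute grouped boxplot statistics without drawing the plot
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# One column per cylinder group, one row per boxplot statistic
stats <- b$stats
dimnames(stats) <- list(
  c("lower_whisker", "lower_hinge", "median", "upper_hinge", "upper_whisker"),
  b$names
)
stats

# Number of observations in each group
b$n
```

Looking at b$n alongside the statistics helps when groups are small, since a box built from a handful of cars is less reliable than it looks.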
2. Boxplots in ggplot2
library(ggplot2)
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(fill = "tomato") +
  labs(title = "MPG by Cylinder Count", x = "Cylinders", y = "MPG")
Explanation:
- geom_boxplot() draws the boxplot
- factor(cyl): Ensures x-axis is categorical
- Visual and presentation-friendly format
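With a small dataset such as mtcars, it can help to draw the raw observations on top of the boxes. The sketch below is one possible variation rather than part of the original example: it assumes geom_jitter() for the points and outlier.shape = NA so that outliers are not drawn twice.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  # Hide the boxplot's own outlier markers; the jittered points show them instead
  geom_boxplot(fill = "tomato", outlier.shape = NA) +
  # Spread points horizontally only, so the MPG values stay exact
  geom_jitter(width = 0.15, height = 0, alpha = 0.6) +
  labs(title = "MPG by Cylinder Count", x = "Cylinders", y = "MPG")

print(p)
```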
3. Histograms in Base R
Basic Histogram
hist(mtcars$mpg, main = "Histogram of MPG", xlab = "Miles Per Gallon", col = "lightgray", breaks = 10)
Explanation:
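- breaks = 10 requests roughly 10 bins; R treats a single number as a suggestion and picks round breakpoints, so the actual count can differ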
- Shows frequency of MPG values in each bin
- Helps detect skewness, modality, and range
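To see which bins R actually used, the histogram can be computed without plotting. This is a small sketch assuming the plot = FALSE argument of hist(); note that a single number passed to breaks is only a suggestion, so the number of bins need not be exactly 10.

```r
# Compute the histogram without drawing it
h <- hist(mtcars$mpg, breaks = 10, plot = FALSE)

# Bin edges chosen by R
h$breaks

# Number of observations falling in each bin
h$counts

# Actual number of bins
length(h$counts)
```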
Customize Histogram Breaks
hist(mtcars$mpg, breaks = seq(10, 35, by = 2),
     col = "skyblue", border = "white", main = "Custom MPG Histogram")
4. Histograms in ggplot2
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "black") +
  labs(title = "Distribution of MPG", x = "Miles Per Gallon", y = "Frequency")
Explanation:
- geom_histogram(): Histogram layer
- binwidth = 2: Controls bar width (smaller binwidth = more bars)
- Clearly shows distribution shape
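Bin size can be controlled in two ways in ggplot2. The sketch below contrasts binwidth with the bins argument of geom_histogram(), which sets the number of bins instead of their width; the value 8 and the object names are arbitrary choices for illustration.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = mpg))

# Fixed bin width: each bar spans 2 MPG
p_width <- base_plot +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "black") +
  labs(title = "binwidth = 2", x = "Miles Per Gallon", y = "Frequency")

# Fixed bin count: ggplot2 derives the width from the data range
p_bins <- base_plot +
  geom_histogram(bins = 8, fill = "steelblue", color = "black") +
  labs(title = "bins = 8", x = "Miles Per Gallon", y = "Frequency")

print(p_width)
print(p_bins)
```

Trying a few settings is usually worthwhile, because the apparent shape of a distribution can change noticeably with the bin size.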
Boxplot vs Histogram: Quick Comparison
| Feature | Boxplot | Histogram |
|---|---|---|
| Summarizes median | Yes | No |
| Shows outliers | Yes (as individual points) | Not explicitly (isolated bars at most) |
| Shows skewness | Roughly (median position, whisker lengths) | Clearly |
| Best for | Comparing groups | Understanding shape |
| Preferred when | Looking for outliers | Exploring distribution |
Save Your Plot to File
png("box_hist.png", width = 800, height = 400)
par(mfrow = c(1, 2)) # Plot side by side
boxplot(mtcars$mpg)
hist(mtcars$mpg)
dev.off()
Summary – Recap & Next Steps
Boxplots and histograms are essential to understanding data distribution and variability. R provides intuitive ways to use both for single-variable summaries or grouped comparisons.
Key Takeaways:
- Use boxplot() or geom_boxplot() to detect outliers and spread
- Use hist() or geom_histogram() to explore frequency and skewness
- Customize colors, bins, and labels for clarity
- Boxplots = summary + comparison; Histograms = distribution shape
Real-World Relevance:
Used in exploratory data analysis (EDA), data quality checks, reporting, outlier detection, and distribution fitting in research, finance, healthcare, and more.
FAQs – Boxplots and Histograms in R
How do I identify outliers using a boxplot?
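Outliers appear as individual points beyond the whiskers. By default, the whiskers extend to the most extreme values within 1.5×IQR of the box edges, so anything farther out is drawn as a point.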
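The same rule can be applied by hand. The sketch below uses mtcars$hp (a different column from the MPG examples, chosen only for illustration) and quantile() for the quartiles. boxplot() uses hinges, which can differ slightly from these quartiles, so borderline points may occasionally be classified differently.

```r
x <- mtcars$hp

# First and third quartiles, and the interquartile range between them
q <- quantile(x, probs = c(0.25, 0.75))
iqr <- q[2] - q[1]

# Fences at 1.5 x IQR beyond the quartiles
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Values outside the fences are the candidate outliers
outliers <- x[x < lower_fence | x > upper_fence]
outliers
```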
How can I change the number of bins in a histogram?
Use breaks in base R. In ggplot2, use bins to set the number of bins or binwidth to set their width:
geom_histogram(binwidth = 1)
Can I plot multiple boxplots side-by-side?
Yes, use a formula like y ~ group in boxplot(), or map the grouping variable to x and the numeric variable to y in ggplot2.
Can histograms display categorical data?
No. Use bar charts for categorical data; histograms are for continuous data.
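As a minimal sketch of the bar chart alternative, the example below counts cars per cylinder category with table() and draws the counts with base R's barplot(); the title, labels, and color are illustrative choices.

```r
# Count how many cars fall into each cylinder category
cyl_counts <- table(mtcars$cyl)
cyl_counts

# One bar per category, with height equal to the count
barplot(cyl_counts,
        main = "Cars by Cylinder Count",
        xlab = "Cylinders", ylab = "Count", col = "lightblue")
```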
How can I overlay a normal curve on a histogram?
In base R, call hist() with freq = FALSE so the y-axis shows density instead of counts, then add curve() with dnorm() and add = TRUE.
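A minimal sketch of that overlay, assuming the normal curve should use the sample mean and standard deviation of MPG. The mean and standard deviation are computed up front because curve() uses x as its own plotting variable.

```r
mpg <- mtcars$mpg

# Parameters of the normal curve, estimated from the data
m <- mean(mpg)
s <- sd(mpg)

# freq = FALSE puts density on the y-axis, matching the scale of dnorm()
hist(mpg, freq = FALSE,
     main = "MPG with Normal Curve", xlab = "Miles Per Gallon", col = "lightgray")

# Draw the normal density on top of the existing histogram
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red", lwd = 2)
```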
