Statistical Analysis with R

📊 R – Statistics Intro and Working with Data Sets (with Code Explanation)


🧲 Introduction – Start Statistical Analysis with R

R is a powerful environment built specifically for statistical computing and data analysis. Whether you’re working on regression, hypothesis testing, or data visualization, it all starts with understanding your dataset and basic descriptive statistics.

🎯 In this guide, you’ll learn:

  • How to load and inspect datasets in R
  • How to use basic statistical functions (mean(), median(), summary())
  • How to explore built-in datasets and load external data
  • How to prepare your data for further statistical modeling

🗃️ 1. Loading a Built-In Dataset

R ships with many built-in datasets, such as mtcars, iris, and airquality. You can list them with:

data()           # Lists the datasets in all currently attached packages
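If you only want the datasets from one package, a minimal sketch like the following may help. It assumes the standard datasets package that ships with R; the variable name ds is only illustrative.

```r
# data() returns an object whose $results matrix has one row per dataset
ds <- data(package = "datasets")$results

# Show the name and short description of the first few datasets
head(ds[, c("Item", "Title")])

# Check whether a particular dataset is available
"mtcars" %in% ds[, "Item"]
```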

✅ Load and View mtcars

data(mtcars)
head(mtcars)

🔍 Explanation:

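  • data(mtcars) copies the dataset into your workspace; this step is optional, because the datasets package is attached by default and mtcars is available as soon as R starts
  • head() shows the first six rows by default
  • mtcars records fuel consumption and 10 aspects of design and performance for 32 cars, taken from the 1974 Motor Trend magazine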
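Beyond head(), a few other base R functions give a quick first look at a data frame. The sketch below uses only mtcars and standard functions; the choice of three rows is arbitrary.

```r
dim(mtcars)             # number of rows and columns
names(mtcars)           # column (variable) names
head(mtcars, n = 3)     # first three rows instead of the default six
tail(mtcars, n = 3)     # last three rows
head(rownames(mtcars))  # car models are stored as row names, not as a column
```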

📋 2. Structure and Summary of Dataset

✅ Check Structure and Summary

str(mtcars)         # Structure (types, variables)
summary(mtcars)     # Statistical summary (min, max, mean, quartiles)

🔍 Output (partial):

      mpg             cyl             disp      
 Min.   :10.40   Min.   :4.000   Min.   : 71.1  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8  
 Median :19.20   Median :6.000   Median :196.3  
 Mean   :20.09   Mean   :6.188   Mean   :230.7  
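summary() also works on a single column, and its result can be indexed by name. The following is a small sketch; the names mpg_summary and col_means, and the choice of the mpg, hp, and wt columns, are assumptions made for illustration.

```r
# Summary of one column; the result is a named numeric vector
mpg_summary <- summary(mtcars$mpg)
mpg_summary
mpg_summary[["Median"]]   # pull out a single statistic by name

# Apply one statistic to several columns at once
col_means <- sapply(mtcars[, c("mpg", "hp", "wt")], mean)
round(col_means, 2)
```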

📈 3. Basic Descriptive Statistics

✅ Calculate Mean, Median, Mode

mean(mtcars$mpg)      # Average MPG
median(mtcars$mpg)    # Middle value

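🔍 Base R's mode() returns an object's storage mode (for example "numeric"), not the most frequent value, so define a helper:

# Return the most frequent value in x (the first one encountered if there is a tie)
get_mode <- function(x) {
  ux <- unique(x)                          # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]    # count each value, pick the largest count
}
get_mode(mtcars$cyl)    # 8 (14 of the 32 cars have 8 cylinders)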
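A helper built on which.max() returns only one value, even when several values are tied for most frequent. If you need every tied value, one possible variant is sketched below; get_all_modes is a hypothetical name, not a base R function.

```r
# Return every value that shares the highest frequency in x
get_all_modes <- function(x) {
  ux <- unique(x)                       # distinct values
  counts <- tabulate(match(x, ux))      # frequency of each distinct value
  ux[counts == max(counts)]             # keep all values with the top count
}

get_all_modes(c(1, 2, 2, 3, 3))   # 2 and 3 are tied
get_all_modes(mtcars$cyl)         # a single mode
```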

✅ Standard Deviation & Variance

sd(mtcars$mpg)         # Standard deviation
var(mtcars$mpg)        # Variance
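To see what these functions compute, the sketch below rebuilds them by hand. Note that var() and sd() use the sample formulas, which divide by n - 1 rather than n; the names manual_var and manual_sd are illustrative.

```r
x <- mtcars$mpg
n <- length(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation is the square root of the variance
manual_sd <- sqrt(manual_var)

all.equal(manual_var, var(x))   # compare with the built-in result
all.equal(manual_sd, sd(x))
```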

📊 4. Visualizing Basic Stats

hist(mtcars$mpg, col = "lightgreen", main = "Histogram of MPG")
boxplot(mtcars$mpg, main = "Boxplot of MPG", col = "lightblue")

🔍 Why use this?

  • A histogram shows the shape and skewness of the distribution
  • A boxplot summarizes the median, quartiles, and potential outliers
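The numbers behind these plots can also be inspected directly, which is handy when you want to report them rather than read them off a chart. This is a minimal sketch using boxplot.stats() and hist(plot = FALSE); the variable names are illustrative.

```r
# The five numbers a boxplot draws, plus any points beyond the whiskers
mpg_stats <- boxplot.stats(mtcars$mpg)
mpg_stats$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
mpg_stats$out     # values flagged as outliers (empty if there are none)

# hist() can return its bin counts without drawing anything
mpg_hist <- hist(mtcars$mpg, plot = FALSE)
data.frame(bin_start = head(mpg_hist$breaks, -1), count = mpg_hist$counts)
```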

📥 5. Load External Data Set

✅ Load from CSV File

data <- read.csv("your_data.csv")
head(data)
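If you do not have a CSV file at hand, the following self-contained sketch writes a tiny temporary file and reads it back. The file contents, the column names, and the variable names csv_path and scores are made up for illustration.

```r
# Create a small throwaway CSV so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,Ana,72",
             "2,Ben,NA",
             "3,Cy,48"), csv_path)

# Read it back; stringsAsFactors = FALSE keeps text columns as character
scores <- read.csv(csv_path, stringsAsFactors = FALSE)

str(scores)     # check that each column has the expected type
head(scores)

unlink(csv_path)  # remove the temporary file
```

By default read.csv() treats the string NA as a missing value, so the second score is read as NA rather than as text.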

✅ Check for Missing Data

sum(is.na(data))        # Total missing values
colSums(is.na(data))    # Missing values per column
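Once you know where values are missing, you usually need to decide how to handle them. The sketch below uses the built-in airquality dataset, which contains missing values, and shows two common options; whether dropping rows is appropriate depends on your analysis.

```r
colSums(is.na(airquality))            # missing values per column

mean(airquality$Ozone)                # NA, because some values are missing
mean(airquality$Ozone, na.rm = TRUE)  # mean of the non-missing values only

# Keep only rows with no missing value in any column
aq_complete <- airquality[complete.cases(airquality), ]
nrow(airquality)    # rows before
nrow(aq_complete)   # rows after
```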

⚙️ 6. Clean and Prepare Data

✅ Rename Columns

names(data) <- c("ID", "Name", "Score")    # Assumes data has exactly three columns, in this order
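Replacing every name at once only works when you know the exact number and order of columns. To rename a single column by its current name instead, something like the following sketch can be used; cars_small and the new name horsepower are illustrative.

```r
# Work on a small copy so the built-in dataset stays untouched
cars_small <- head(mtcars[, c("mpg", "cyl", "hp")])

# Rename only the column currently called "hp"
names(cars_small)[names(cars_small) == "hp"] <- "horsepower"

names(cars_small)
```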

✅ Filter or Subset

subset(data, Score > 50)    # Returns the rows with Score > 50; assign the result to keep it
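The same kind of filter can be written with bracket indexing, and subset() can also pick columns. The sketch below uses mtcars so that it runs as-is; the threshold of 25 mpg and the variable names are arbitrary choices.

```r
# Rows with mpg above 25, keeping three columns
efficient <- subset(mtcars, mpg > 25, select = c(mpg, cyl, wt))

# Equivalent bracket-indexing form: rows by condition, columns by name
efficient_idx <- mtcars[mtcars$mpg > 25, c("mpg", "cyl", "wt")]

all.equal(efficient, efficient_idx)

# Combine conditions with & (and) or | (or)
subset(mtcars, mpg > 25 & cyl == 4)
```

One difference to keep in mind: when the condition evaluates to NA for some rows, subset() drops those rows, while bracket indexing returns them as rows filled with NA.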

🧠 Statistical Terms Explained

Term      | Meaning
Mean      | Average of the values
Median    | Middle value when sorted
Mode      | Most frequent value
Variance  | Average squared deviation from the mean (var() divides by n - 1)
SD        | Standard deviation, the square root of the variance
IQR       | Interquartile range (Q3 - Q1)
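As a quick illustration of the IQR definition, the sketch below computes Q1 and Q3 with quantile() and compares their difference with the built-in IQR() function. It uses R's default quantile method; other methods can give slightly different quartiles.

```r
x <- mtcars$mpg

# First and third quartiles (25th and 75th percentiles)
q <- quantile(x, probs = c(0.25, 0.75))
q

# IQR by hand: Q3 minus Q1
manual_iqr <- unname(q[2] - q[1])
manual_iqr

IQR(x)   # built-in equivalent
```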

📌 Summary – Recap & Next Steps

Before jumping into advanced modeling, it’s crucial to explore and understand your data using basic statistics. R offers powerful, readable functions to describe data with both numbers and visuals.

🔍 Key Takeaways:

  • Use mean(), median(), summary(), sd() for quick insight
  • Explore data structure with str() and head()
  • Visualize spread with hist() and boxplot()
  • Clean and subset your data before analysis

⚙️ Real-World Relevance:
These steps apply across many fields, including clinical trials, market research, finance, and machine learning pre-processing.


❓ FAQs – Statistical Intro in R

❓ How do I get summary stats of all columns in R?
✅ Use summary(dataset); for each numeric column it reports the minimum, quartiles, median, mean, and maximum.

❓ How can I get the mode in R?
✅ Define a custom function:

get_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

❓ What is the best dataset for practicing in R?
✅ Built-in datasets such as mtcars and iris, along with diamonds from the ggplot2 package, are good choices for practicing EDA.

❓ How do I visualize variable distributions?
✅ Use:

hist(), boxplot(), plot(density(x))    # density() only computes the estimate; plot() draws it
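A common way to combine these is to overlay a density curve on a histogram. The following is a minimal sketch; the title, axis label, and colors are arbitrary, and freq = FALSE is needed so that the histogram and the curve share the same density scale.

```r
# Kernel density estimate of MPG (an object, not yet a plot)
mpg_density <- density(mtcars$mpg)

# Histogram on the density scale, then the curve on top
hist(mtcars$mpg, freq = FALSE, col = "lightgreen",
     main = "MPG: Histogram with Density Curve", xlab = "Miles per gallon")
lines(mpg_density, lwd = 2)
```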

❓ How do I detect missing values in R?
✅ Use:

sum(is.na(data))   # Total
colSums(is.na(data))  # Per column
