Statistical Analysis with R

🧬 R Survival Analysis – Kaplan-Meier and Cox Regression with Examples


🧲 Introduction – What is Survival Analysis in R?

Survival analysis deals with time-to-event data, such as how long a patient survives after treatment or how long a machine runs before it fails. Unlike standard regression, it handles censored data, where the event has not been observed by the end of follow-up.

R provides powerful packages such as survival and survminer to:

  • Model and analyze survival time
  • Create Kaplan-Meier survival curves
  • Compare groups using log-rank tests
  • Fit Cox Proportional Hazards Models

🎯 In this guide, you’ll learn:

  • How to create survival objects
  • Plot Kaplan-Meier curves
  • Perform log-rank tests
  • Use Cox regression for time-to-event modeling

📦 1. Load Required Packages

library(survival)
library(survminer)

📊 2. Prepare Data – Survival Object

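Use the lung dataset that ships with the survival package. It is lazy-loaded, so it is available once the package is attached (in recent versions of survival, data(lung) can warn that the data set is not found; data(cancer, package = "survival") loads it explicitly):

head(lung)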

✅ Create a Survival Object

surv_obj <- Surv(time = lung$time, event = lung$status)

🔍 Explanation:

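  • time: Survival time in days
  • status: In lung, 1 = censored and 2 = dead; Surv() recognizes this 1/2 coding and treats 2 as the event. In the more common 0/1 coding, 0 = censored and 1 = event occurred
  • Surv() creates a time-to-event object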
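
Because the status coding is easy to get wrong, it can help to inspect it before modeling. The following is a minimal sketch; surv_explicit and n_events are illustrative names that do not appear elsewhere in this guide.

```r
library(survival)

# In lung, status is coded 1 = censored, 2 = dead
table(lung$status)

# Surv() treats the larger code as the event when status is coded 1/2
surv_obj <- Surv(time = lung$time, event = lung$status)

# Censored times are printed with a trailing "+"
head(surv_obj, 10)

# Passing a logical makes the event definition explicit
surv_explicit <- Surv(lung$time, lung$status == 2)
n_events <- sum(lung$status == 2)
```

Both objects should describe the same events; the logical form simply removes any doubt about which code counts as the event.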

📈 3. Kaplan-Meier Survival Curve

fit_km <- survfit(surv_obj ~ sex, data = lung)
ggsurvplot(fit_km, data = lung, pval = TRUE, conf.int = TRUE,
           risk.table = TRUE, legend.labs = c("Male", "Female"))

🔍 Features:

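  • KM curve shows the estimated survival probability over time for each group
  • pval = TRUE adds the log-rank test p-value for the difference between groups
  • conf.int = TRUE draws confidence bands around each curve
  • risk.table = TRUE adds a table of the number at risk below the plot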
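
Beyond the plot, the fitted object can be queried for numbers directly. This sketch assumes the same lung data and writes the Surv() call inside the formula so the block runs on its own; the time points 180 and 365 days and the name km_table are arbitrary choices for illustration.

```r
library(survival)

fit_km <- survfit(Surv(time, status) ~ sex, data = lung)

# Estimated survival probability (with 95% CI) at roughly 6 and 12 months
summary(fit_km, times = c(180, 365))

# Per-group sample size, number of events, and median survival time
km_table <- summary(fit_km)$table
km_table[, c("records", "events", "median")]
```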

🔁 4. Compare Groups – Log-Rank Test

survdiff(surv_obj ~ sex, data = lung)

🔍 Output:

  • A chi-squared test statistic with its degrees of freedom and p-value
  • If p < 0.05, the difference in survival between groups is statistically significant at the 5% level
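
If the p-value is needed as a number rather than printed output, it can be recomputed from the test statistic. This is a sketch assuming a simple unstratified comparison, where the degrees of freedom equal the number of groups minus one; lr, df, and p_value are illustrative names.

```r
library(survival)

lr <- survdiff(Surv(time, status) ~ sex, data = lung)

# The statistic is compared to a chi-squared distribution with (groups - 1) df
df <- length(lr$n) - 1
p_value <- pchisq(lr$chisq, df = df, lower.tail = FALSE)

c(chisq = lr$chisq, df = df, p = p_value)
```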

📉 5. Cox Proportional Hazards Model

cox_model <- coxph(surv_obj ~ age + sex + ph.ecog, data = lung)
summary(cox_model)

🔍 Output Includes:

  • coef: Log hazard ratios
  • exp(coef): Hazard ratios
  • p-values for each predictor
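
The hazard ratios and their confidence intervals can also be collected into one table, which is often easier to report than the full summary. A minimal sketch, with hr_table as an illustrative name:

```r
library(survival)

cox_model <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# Hazard ratios with 95% confidence intervals, one row per predictor
hr_table <- cbind(
  HR = exp(coef(cox_model)),
  exp(confint(cox_model))
)
round(hr_table, 3)
```

Rows with a missing predictor value are dropped by the default na.action, so the model may be fitted on slightly fewer rows than the data contains.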

📊 6. Visualize Cox Model

ggforest(cox_model, data = lung)

🧠 Shows hazard ratios with confidence intervals


🧪 7. Interpret the Cox Model

Term        Meaning
exp(coef)   Hazard Ratio (HR)
HR > 1      Higher hazard (shorter survival)
HR < 1      Lower hazard (longer survival)
p-value     Statistical significance

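Example:
If HR = 1.5 for age, each additional year of age multiplies the hazard of death by 1.5 (a 50% increase in the hazard, not in the probability of death), holding the other predictors constant.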
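
A hazard ratio for a continuous predictor is per one unit, so it compounds over larger differences. The sketch below rescales the fitted age effect from one year to ten years; the ten-year span and the variable names are illustrative choices.

```r
library(survival)

cox_model <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

beta_age <- coef(cox_model)[["age"]]

hr_per_year   <- exp(beta_age)        # hazard multiplier for +1 year of age
hr_per_decade <- exp(10 * beta_age)   # hazard multiplier for +10 years of age

c(per_year = hr_per_year, per_decade = hr_per_decade)
```

The ten-year ratio equals the one-year ratio raised to the tenth power, because the model is linear on the log-hazard scale.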


📌 Summary – Recap & Next Steps

Survival analysis helps understand how long until an event occurs and whether variables impact survival time. R makes it easy to analyze, plot, and model time-to-event data.

🔍 Key Takeaways:

  • Use Surv() to create survival objects
  • Plot KM curves using survfit() + ggsurvplot()
  • Use coxph() for regression modeling
  • Interpret hazard ratios and significance

⚙️ Real-World Relevance:
Widely used in clinical trials, manufacturing failure, customer churn analysis, employee retention, and time-to-purchase modeling.


❓ FAQs – Survival Analysis in R

❓ What does “censored” mean in survival analysis?
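✅ A censored observation is one where the event was not observed during follow-up, for example a patient who is still alive at the last visit or who drops out of the study. Only a lower bound on the survival time is known.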
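
To see how censoring enters the estimate, here is a small sketch on made-up data (the toy data frame is hypothetical, not from any real study). Censored subjects count in the number at risk until they leave, but never trigger a drop in the curve.

```r
library(survival)

# Hypothetical follow-up data: time in months, status 1 = event, 0 = censored
toy <- data.frame(
  time   = c(2, 4, 4, 6, 9),
  status = c(1, 0, 1, 1, 0)
)

Surv(toy$time, toy$status)   # censored times print with a trailing "+"

fit_toy <- survfit(Surv(time, status) ~ 1, data = toy)

# Estimate at each event time; by hand: 4/5, then 4/5 * 3/4, then 4/5 * 3/4 * 1/2
km_steps <- summary(fit_toy)
data.frame(time = km_steps$time, n.risk = km_steps$n.risk, surv = km_steps$surv)
```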

❓ What’s the difference between Kaplan-Meier and Cox regression?
✅ Kaplan-Meier is non-parametric and estimates survival curves. Cox regression is semi-parametric and evaluates the effect of predictors.

❓ What does hazard ratio (HR) mean?
✅ HR > 1 = increased hazard, HR < 1 = decreased hazard.
Example: HR = 2 means twice the hazard compared to the reference group, at any given time.

❓ Can I include categorical predictors in a Cox model?
✅ Yes. Predictors such as sex or treatment group can be included; code them as factors so each level is compared with a reference level.
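
A short sketch of a factor predictor, assuming the lung coding of sex (1 = male, 2 = female); lung_f and cox_factor are illustrative names.

```r
library(survival)

# Recode sex as a labelled factor; the first level ("Male") is the reference
lung_f <- transform(lung, sex = factor(sex, levels = c(1, 2),
                                       labels = c("Male", "Female")))

cox_factor <- coxph(Surv(time, status) ~ age + sex, data = lung_f)

# The coefficient is named after the non-reference level
exp(coef(cox_factor))
```

The hazard ratio labelled sexFemale compares females with males of the same age; relevel() can be used to change the reference level.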

❓ How do I plot survival curves with confidence intervals?
✅ Use ggsurvplot() with conf.int = TRUE.

