🧬 R Survival Analysis – Kaplan-Meier and Cox Regression with Examples
🧲 Introduction – What is Survival Analysis in R?
Survival analysis deals with time-to-event data—like how long a patient survives after treatment, or time until a machine fails. Unlike standard regression, it handles censored data, where the event hasn’t occurred by the end of observation.
R provides powerful packages such as survival and survminer to:
- Model and analyze survival time
- Create Kaplan-Meier survival curves
- Compare groups using log-rank tests
- Fit Cox Proportional Hazards Models
🎯 In this guide, you’ll learn:
- How to create survival objects
- How to plot Kaplan-Meier curves
- How to perform log-rank tests
- How to use Cox regression for time-to-event modeling
📦 1. Load Required Packages
library(survival)
library(survminer)
📊 2. Prepare Data – Survival Object
Use the built-in lung dataset:
data(cancer, package = "survival")  # loads lung; data(lung) may warn in recent survival versions
head(lung)
✅ Create a Survival Object
surv_obj <- Surv(time = lung$time, event = lung$status)
🔍 Explanation:
- time: Survival time in days
- status: Censoring status; in lung it is coded 1 = censored, 2 = dead, and Surv() treats 2 as the event when the status is coded 1/2
- Surv() creates a time-to-event object
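To see the status coding in practice, the sketch below tabulates lung$status and rebuilds the survival object with an explicit logical event indicator. It assumes only the survival package; the names surv_explicit and same_events are introduced here for illustration.

```r
library(survival)

# lung is lazy-loaded with the survival package.
# Its status column is coded 1 = censored, 2 = dead.
status_counts <- table(lung$status)
print(status_counts)

# Surv() recodes a 1/2 status so that 2 becomes the event.
surv_obj <- Surv(time = lung$time, event = lung$status)

# Equivalent object with the event spelled out explicitly (illustrative name).
surv_explicit <- Surv(time = lung$time, event = lung$status == 2)

# Censored times print with a trailing "+".
print(head(surv_obj))

# Both objects carry the same 0/1 event indicator.
same_events <- all(surv_obj[, "status"] == surv_explicit[, "status"])
print(same_events)
```

Writing the event as lung$status == 2 is slightly more verbose, but it avoids relying on the automatic recoding.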
📈 3. Kaplan-Meier Survival Curve
fit_km <- survfit(surv_obj ~ sex, data = lung)
ggsurvplot(fit_km, data = lung, pval = TRUE, conf.int = TRUE,
           risk.table = TRUE, legend.labs = c("Male", "Female"))
🔍 Features:
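- The KM curve shows the estimated survival probability over time
- pval = TRUE shows the p-value for the difference between groups (log-rank test by default)
- conf.int = TRUE draws confidence bands around each curve
- risk.table adds the number at risk at each time point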
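The plot can be paired with a numeric summary of the same fit. The following is a minimal sketch using only the survival package: it writes the Surv() call inline in the formula, and the time points 180 and 365 days are arbitrary choices for illustration.

```r
library(survival)

# Kaplan-Meier fit by sex, with the Surv() call written inline.
fit_km <- survfit(Surv(time, status) ~ sex, data = lung)

# One row per group: sample size, number of events, median survival and its CI.
print(fit_km)

# Estimated survival probability at roughly 6 and 12 months (times are in days).
km_at <- summary(fit_km, times = c(180, 365))
print(km_at)

# Median survival time per group, taken from the summary table.
medians <- summary(fit_km)$table[, "median"]
print(medians)
```

The median is the time at which the estimated survival probability drops to 0.5, so it can be read off the plotted curve as well.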
🔁 4. Compare Groups – Log-Rank Test
survdiff(surv_obj ~ sex, data = lung)
🔍 Output:
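- Observed and expected numbers of events per group
- A chi-squared test statistic with its degrees of freedom and p-value
- If p < 0.05, the survival curves differ significantly between groups at the 5% level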
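If the p-value is needed as a number rather than read from the printed output, it can be recomputed from the chi-squared statistic. This is a sketch under the assumption that the test compares groups without stratification, so the degrees of freedom equal the number of groups minus one.

```r
library(survival)

# Log-rank test comparing survival between the two sexes.
lr <- survdiff(Surv(time, status) ~ sex, data = lung)
print(lr)

# Degrees of freedom = number of groups - 1.
df <- length(lr$n) - 1

# Upper-tail probability of the chi-squared statistic.
p_value <- pchisq(lr$chisq, df = df, lower.tail = FALSE)
print(p_value)
```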
📉 5. Cox Proportional Hazards Model
cox_model <- coxph(surv_obj ~ age + sex + ph.ecog, data = lung)
summary(cox_model)
🔍 Output Includes:
- coef: Log hazard ratios
- exp(coef): Hazard ratios
- p-values for each predictor
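The same quantities can be pulled out of the fitted model programmatically. The sketch below builds a small hazard ratio table with 95% confidence limits; hr_table and p_values are illustrative names, and the intervals are the default Wald intervals exponentiated from the log scale.

```r
library(survival)

cox_model <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# Hazard ratios with 95% confidence limits (exponentiated Wald intervals).
hr_table <- exp(cbind(HR = coef(cox_model), confint(cox_model)))
print(round(hr_table, 3))

# Coefficient-level p-values from the summary object.
p_values <- summary(cox_model)$coefficients[, "Pr(>|z|)"]
print(p_values)
```

A confidence interval that excludes 1 corresponds to a predictor that is significant at the 5% level.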
📊 6. Visualize Cox Model
ggforest(cox_model, data = lung)
🧠 Shows hazard ratios with confidence intervals
🧪 7. Interpret the Cox Model
| Term | Meaning | 
|---|---|
| exp(coef) | Hazard Ratio (HR) | 
| HR > 1 | Higher hazard (shorter survival) | 
| HR < 1 | Lower hazard (longer survival) | 
| p-value | Statistical significance | 
Example:
If HR = 1.5 for age, each additional year of age multiplies the hazard of death by 1.5 (a 50% increase), holding the other predictors constant.
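Because the model is linear on the log-hazard scale, a hazard ratio for a larger change in a predictor comes from scaling the coefficient before exponentiating. The sketch below works this through for age in the lung model; the value 1.5 is the hypothetical HR from the example, not an estimate from the data.

```r
library(survival)

cox_model <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# Log hazard ratio for a one-year increase in age.
beta_age <- coef(cox_model)[["age"]]

# Hazard ratio per year and per decade of age.
hr_1yr  <- exp(beta_age)
hr_10yr <- exp(10 * beta_age)

# Percentage change in hazard per additional year.
pct_change_1yr <- 100 * (hr_1yr - 1)
print(c(hr_1yr = hr_1yr, hr_10yr = hr_10yr, pct_change_1yr = pct_change_1yr))

# Hypothetical HR of 1.5, as in the example: a 50% increase in hazard.
hr_example  <- 1.5
pct_example <- 100 * (hr_example - 1)
print(pct_example)
```

Note that the per-decade hazard ratio is the per-year ratio raised to the tenth power, not ten times the per-year ratio.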
📌 Summary – Recap & Next Steps
Survival analysis helps understand how long until an event occurs and whether variables impact survival time. R makes it easy to analyze, plot, and model time-to-event data.
🔍 Key Takeaways:
- Use Surv() to create survival objects
- Plot KM curves using survfit() + ggsurvplot()
- Use coxph() for regression modeling
- Interpret hazard ratios and significance
⚙️ Real-World Relevance:
Widely used in clinical trials, manufacturing failure, customer churn analysis, employee retention, and time-to-purchase modeling.
❓ FAQs – Survival Analysis in R
❓ What does “censored” mean in survival analysis?
✅ A censored observation means the event was not observed for that subject, for example because they were still alive at last follow-up or left the study early. Only a lower bound on the survival time is known.
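A toy example makes the idea concrete. The follow-up times below are made up for illustration and use the 1 = event, 0 = censored coding.

```r
library(survival)

# Hypothetical follow-up times in days; 1 = event observed, 0 = censored.
toy <- Surv(time = c(5, 8, 12, 20), event = c(1, 0, 1, 0))

# Censored times are printed with a trailing "+".
print(toy)

# Count the censored observations.
n_censored <- sum(toy[, "status"] == 0)
print(n_censored)
```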
❓ What’s the difference between Kaplan-Meier and Cox regression?
✅ Kaplan-Meier is non-parametric and estimates survival curves. Cox regression is semi-parametric and evaluates the effect of predictors.
❓ What does hazard ratio (HR) mean?
✅ HR > 1 = increased risk, HR < 1 = decreased risk.
Example: HR = 2 means twice the hazard compared to baseline.
❓ Can I include categorical predictors in a Cox model?
✅ Yes, factors like sex, treatment group, etc., can be included.
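As a sketch of how this looks in practice, sex in lung can be converted to a labelled factor before fitting. The data frame name lung_f and the labels are illustrative; the mapping 1 = Male, 2 = Female follows the dataset documentation.

```r
library(survival)

# Recode sex (1 = Male, 2 = Female) as a factor; the first level is the reference.
lung_f <- transform(
  lung,
  sex = factor(sex, levels = c(1, 2), labels = c("Male", "Female"))
)

cox_cat <- coxph(Surv(time, status) ~ age + sex, data = lung_f)

# Factor terms are named <variable><level>, here "sexFemale".
print(names(coef(cox_cat)))

# Hazard ratio for females relative to males, adjusted for age.
hr_female <- exp(coef(cox_cat)[["sexFemale"]])
print(hr_female)
```

With a factor, each non-reference level gets its own coefficient, and its hazard ratio is read relative to the reference level.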
❓ How do I plot survival curves with confidence intervals?
✅ Use ggsurvplot() with conf.int = TRUE.