R Survival Analysis – Kaplan-Meier and Cox Regression with Examples
Introduction – What is Survival Analysis in R?
Survival analysis deals with time-to-event data, such as how long a patient survives after treatment or how long a machine runs before it fails. Unlike standard regression, it handles censored data, where the event has not occurred by the end of observation.
R provides powerful packages such as survival and survminer to:
- Model and analyze survival time
- Create Kaplan-Meier survival curves
- Compare groups using log-rank tests
- Fit Cox Proportional Hazards Models
In this guide, you'll learn how to:
- Create survival objects
- Plot Kaplan-Meier curves
- Perform log-rank tests
- Use Cox regression for time-to-event modeling
1. Load Required Packages
library(survival)
library(survminer)
2. Prepare Data – Survival Object
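Use the lung dataset that ships with the survival package. It is lazy-loaded, so it is available as soon as the package is attached (in recent versions of survival, data(lung) may warn that the data set is not found):
head(lung)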
Create a Survival Object
surv_obj <- Surv(time = lung$time, event = lung$status)
Explanation:
- time: Survival time in days
- status: In lung, 1 = censored and 2 = event occurred (death); Surv() also accepts the more common 0 = censored, 1 = event coding
- Surv() creates a time-to-event object
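To see how Surv() reads the status coding in lung, a short check along these lines may help. It is a minimal sketch that assumes only the survival package; surv_explicit is a name introduced here for illustration.

```r
library(survival)

# In lung, status is coded 1 = censored, 2 = dead
table(lung$status)

# With 1/2 coding, Surv() treats 2 as the event
surv_obj <- Surv(time = lung$time, event = lung$status)
head(surv_obj)  # censored times are printed with a trailing "+"

# Same information with the event spelled out explicitly (illustrative name)
surv_explicit <- Surv(time = lung$time, event = lung$status == 2)

# Both objects should carry the same 0/1 event indicator
same_coding <- all(surv_obj[, "status"] == surv_explicit[, "status"])
same_coding
```

Spelling out the event with lung$status == 2 is slightly more verbose, but it removes any doubt about which code counts as the event.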
3. Kaplan-Meier Survival Curve
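# Fit one Kaplan-Meier curve per sex (in lung, 1 = Male, 2 = Female)
fit_km <- survfit(surv_obj ~ sex, data = lung)
# Plot the curves with the log-rank p-value, confidence bands, and a risk table
ggsurvplot(fit_km, data = lung, pval = TRUE, conf.int = TRUE,
           risk.table = TRUE, legend.labs = c("Male", "Female"))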
Features:
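- The KM curve shows the estimated survival probability over time
- pval = TRUE adds the log-rank p-value for the difference between groups
- conf.int = TRUE draws confidence bands around each curve
- risk.table = TRUE adds the number at risk at each time point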
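The plot can be complemented with numbers read directly from the fitted object. The sketch below assumes only the survival package and puts Surv() inside the formula so the block is self-contained; the 180- and 365-day time points are arbitrary choices for illustration.

```r
library(survival)

# Fit one Kaplan-Meier curve per sex
fit_km <- survfit(Surv(time, status) ~ sex, data = lung)

# Printing the fit reports n, number of events, and median survival per group
print(fit_km)

# Per-group summary table; the "median" column holds median survival in days
km_table <- summary(fit_km)$table
km_table[, "median"]

# Estimated survival probability at roughly 6 and 12 months (times are in days)
summary(fit_km, times = c(180, 365))
```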
4. Compare Groups – Log-Rank Test
survdiff(surv_obj ~ sex, data = lung)
Output:
- A chi-squared test statistic with its degrees of freedom and p-value
- If p < 0.05, the survival curves differ significantly between groups at the 5% level
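If the p-value is needed as a number (for a report or a further check), it can be recomputed from the chi-squared statistic. This is a minimal sketch assuming only the survival package; lr and p_value are illustrative names.

```r
library(survival)

# Log-rank test comparing survival between the two sexes
lr <- survdiff(Surv(time, status) ~ sex, data = lung)

# Degrees of freedom = number of groups - 1
df <- length(lr$n) - 1

# Upper-tail probability of the chi-squared statistic
p_value <- pchisq(lr$chisq, df = df, lower.tail = FALSE)
p_value
```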
5. Cox Proportional Hazards Model
cox_model <- coxph(surv_obj ~ age + sex + ph.ecog, data = lung)
summary(cox_model)
Output Includes:
- coef: Log hazard ratios
- exp(coef): Hazard ratios
- p-values for each predictor
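The hazard ratios and their confidence intervals can also be pulled out of the model as a small table. The sketch below assumes only the survival package; hr_table is an illustrative name.

```r
library(survival)

# Same model as above, with Surv() inside the formula so the block is self-contained
cox_model <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# exp(coef) gives hazard ratios; exp(confint) gives their 95% confidence limits
hr_table <- cbind(HR = exp(coef(cox_model)), exp(confint(cox_model)))
round(hr_table, 2)
```

A confidence interval that excludes 1 corresponds to a predictor whose effect is significant at the 5% level.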
6. Visualize Cox Model
ggforest(cox_model, data = lung)
Shows hazard ratios with confidence intervals
7. Interpret the Cox Model
| Term | Meaning |
|---|---|
| exp(coef) | Hazard Ratio (HR) |
| HR > 1 | Higher hazard (shorter survival) |
| HR < 1 | Lower hazard (longer survival) |
| p-value | Statistical significance |
Example:
If HR = 1.5 for age, each additional year of age multiplies the hazard of death by 1.5 (a 50% increase), holding the other predictors constant.
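Because hazard ratios act multiplicatively, the effect of a larger change in a predictor is a power of the per-unit HR, not a multiple of it. The following sketch uses the hypothetical HR of 1.5 from the example and then applies the same idea to the age coefficient of a fitted model; all variable names are illustrative.

```r
library(survival)

# Hypothetical per-year hazard ratio from the example
hr_per_year <- 1.5
pct_increase <- (hr_per_year - 1) * 100   # percent increase in hazard per year
hr_per_decade <- hr_per_year^10           # ten one-year steps multiply together

# Same rescaling on a fitted model: exp(10 * coef) is the HR per 10 years of age
cox_model <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)
hr_age_10y <- exp(10 * coef(cox_model)["age"])
hr_age_10y
```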
Summary – Recap & Next Steps
Survival analysis helps understand how long until an event occurs and whether variables impact survival time. R makes it easy to analyze, plot, and model time-to-event data.
Key Takeaways:
- Use Surv() to create survival objects
- Plot KM curves using survfit() + ggsurvplot()
- Use coxph() for regression modeling
- Interpret hazard ratios and significance
Real-World Relevance:
Widely used in clinical trials, manufacturing failure, customer churn analysis, employee retention, and time-to-purchase modeling.
FAQs – Survival Analysis in R
What does “censored” mean in survival analysis?
A censored observation is one whose event time is only partly known: the event had not occurred by the last time the subject was observed (e.g., still alive at last follow-up, or lost to follow-up).
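A tiny made-up dataset shows how censored subjects still contribute: they count as "at risk" until their censoring time and then drop out of the denominator. The data frame toy below is hypothetical, and the hand calculation is only a sketch of the Kaplan-Meier product.

```r
library(survival)

# Hypothetical follow-up data: event = 1 means the event was observed,
# event = 0 means the subject was censored at that time
toy <- data.frame(time  = c(5, 8, 12, 12, 20),
                  event = c(1, 0, 1,  1,  0))

km <- survfit(Surv(time, event) ~ 1, data = toy)
km_summary <- summary(km)   # by default, one row per event time

# Hand calculation of the Kaplan-Meier estimate:
# t = 5:  5 at risk, 1 event  -> 1 - 1/5
# t = 12: 3 at risk, 2 events -> (1 - 1/5) * (1 - 2/3)
# (the subject censored at t = 8 has left the risk set by t = 12)
by_hand <- c(1 - 1/5, (1 - 1/5) * (1 - 2/3))

matches <- isTRUE(all.equal(km_summary$surv, by_hand))
matches
```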
What’s the difference between Kaplan-Meier and Cox regression?
Kaplan-Meier is non-parametric and estimates survival curves. Cox regression is semi-parametric and evaluates the effect of predictors.
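In formula form, the two approaches can be sketched as follows. The notation is standard but introduced here for illustration: $t_i$ are the distinct event times, $d_i$ the number of events at $t_i$, $n_i$ the number at risk just before $t_i$, $h_0(t)$ the unspecified baseline hazard, and $\beta_j$ the coefficient of predictor $x_j$.

```latex
% Kaplan-Meier estimator: product over event times up to t
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)

% Cox model: baseline hazard left unspecified, predictors act multiplicatively
h(t \mid x) = h_0(t) \, \exp(\beta_1 x_1 + \dots + \beta_p x_p)

% Hazard ratio for a one-unit increase in x_j, other predictors held fixed
\mathrm{HR}_j = \exp(\beta_j)
```

The Kaplan-Meier estimator makes no assumption about the shape of the survival curve, while the Cox model is semi-parametric because only the effect of the predictors is parameterized.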
What does hazard ratio (HR) mean?
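HR > 1 = increased hazard, HR < 1 = decreased hazard, relative to the reference group or to a one-unit lower value of the predictor.
Example: HR = 2 means twice the hazard of the reference group at any given time.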
Can I include categorical predictors in a Cox model?
Yes, factors like sex, treatment group, etc., can be included.
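As a sketch of how this looks in practice, the example below recodes sex as a labelled factor before fitting. It assumes only the survival package; lung2 and cox_factor are illustrative names.

```r
library(survival)

# Recode sex (1/2) as a factor; the first level ("Male") becomes the reference
lung2 <- transform(lung,
                   sex = factor(sex, levels = c(1, 2),
                                labels = c("Male", "Female")))

cox_factor <- coxph(Surv(time, status) ~ age + sex, data = lung2)

# The coefficient is named after the non-reference level: "sexFemale"
names(coef(cox_factor))

# Hazard ratio for Female relative to Male, adjusted for age
exp(coef(cox_factor))["sexFemale"]
```

Using a labelled factor makes the reference group explicit in the output, which helps when reading hazard ratios.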
How do I plot survival curves with confidence intervals?
Use ggsurvplot() with conf.int = TRUE.