Biostatistics & Population Health

Survival analysis: Kaplan-Meier curves and hazard ratios

Clinical Overview and When to Suspect Survival Analysis Is the Right Tool

• Survival analysis is the family of statistical methods designed for time-to-event data, where the outcome is not just whether an event occurs but when it occurs
— Events need not be death: MI, stroke, recurrence, hospital readmission, graft failure, discharge, even time-to-pregnancy all qualify
— The defining feature is censoring: follow-up is incomplete because the patient reaches study end without the event, is lost to follow-up, or (in some analyses) has a competing event

• Suspect survival analysis is the appropriate framework when a clinical trial or cohort study reports:
— A Kaplan-Meier (KM) curve (stepwise survival function over time)
— A hazard ratio (HR) with 95% CI
— A log-rank p-value comparing groups
— Median survival or 5-year survival as the primary metric

• Why not just use a t-test or risk ratio?
— Risk ratios ignore timing — two regimens with identical 1-year mortality but very different early vs late death patterns look the same
— Censored patients cannot simply be dropped (selection bias) or counted as event-free (underestimates risk)
— Survival methods properly credit each patient for the person-time they actually contributed

• Common Step 3 contexts where survival analysis appears:
— Oncology trials (overall survival, progression-free survival)
— Cardiology outcomes trials (time to MACE)
— Transplant graft survival
— ICU mortality studies
— Comparative effectiveness research and registries

Board pearl: Whenever a stem mentions "time to event," "median follow-up," or shows a curve that steps down over time, the underlying analysis is Kaplan-Meier with a Cox model generating the hazard ratio — not logistic regression, not chi-square. Recognizing the framework is half the question; the other half is interpreting the HR and the curve correctly under censoring assumptions.
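The censoring point can be made concrete with a minimal Python sketch on a hypothetical six-patient dataset (times in years; 1 = event, 0 = censored). It compares the 3-year risk obtained by counting censored patients as event-free, by dropping patients censored before year 3, and by the Kaplan-Meier product-limit method. The function names and data are illustrative assumptions, not taken from any trial.

```python
def km_survival_at(times, events, horizon):
    """Kaplan-Meier S(horizon): product of (1 - d/n) over event times <= horizon."""
    s = 1.0
    for tj in sorted({t for t, e in zip(times, events) if e and t <= horizon}):
        n_at_risk = sum(1 for t in times if t >= tj)
        deaths = sum(1 for t, e in zip(times, events) if t == tj and e)
        s *= 1 - deaths / n_at_risk
    return s


def naive_risks(times, events, horizon):
    """Two flawed shortcuts for the cumulative risk by `horizon`."""
    n_events = sum(1 for t, e in zip(times, events) if e and t <= horizon)
    n_censored_early = sum(1 for t, e in zip(times, events) if not e and t < horizon)
    as_event_free = n_events / len(times)                  # censored counted as event-free
    dropped = n_events / (len(times) - n_censored_early)    # censored removed entirely
    return as_event_free, dropped


times = [1, 2, 2, 3, 4, 5]   # hypothetical follow-up times (years)
events = [1, 1, 0, 1, 0, 1]  # 1 = event observed, 0 = censored
as_event_free, dropped = naive_risks(times, events, horizon=3)
km_risk = 1 - km_survival_at(times, events, horizon=3)
print(f"event-free: {as_event_free:.3f}  KM: {km_risk:.3f}  dropped: {dropped:.3f}")
# event-free: 0.500  KM: 0.556  dropped: 0.600
```

In this toy example, treating the censored patient as event-free understates risk and dropping her overstates it. KM credits her 2 years of follow-up and lands in between. The direction of the "dropped" bias depends on the data and is not guaranteed in general.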

Presentation Patterns and Key History — How Survival Data Appear on Step 3

• Question stems typically embed survival analysis inside a clinical vignette that ends with a results paragraph or a figure, for example:
— "A randomized trial of drug X vs placebo in metastatic colon cancer reported a hazard ratio for death of 0.72 (95% CI 0.58–0.89, p=0.003). Median OS was 18 vs 14 months."
— "The Kaplan-Meier curve below shows freedom from rehospitalization at 24 months..."

• Key history elements the stem will hand you (and what they mean):
— Median follow-up time: tells you how mature the data are; if median follow-up is shorter than median survival, the curve's tail is unreliable
— Number-at-risk table beneath the curve: as the number at risk falls, confidence intervals widen; late separations with few patients at risk are fragile
— Censoring tick marks on the curve: vertical hashes indicating patients censored at that time
— Proportional hazards assumption: implicit when a single HR is reported

• Patterns to recognize:
— Early separation, sustained: classic benefit (e.g., reperfusion in STEMI)
— Late separation ("delayed effect"): typical of immunotherapy; HR may underestimate long-term benefit
— Curves cross: violates proportional hazards — a single HR is misleading; need time-dependent analysis or restricted mean survival time (RMST)
— Plateau: suggests a cured fraction (some hematologic malignancies, definitive surgery)

• Look for landmark numbers the stem highlights:
— 1-year, 5-year survival percentages
— Median survival (first time at which S(t) falls to 0.5 or below)
— Number needed to treat derived from absolute risk difference at a landmark

Key distinction: A hazard ratio is an instantaneous relative rate, not a probability or a risk ratio. HR 0.5 does not mean "half as likely to die"; it means that at any given instant, the rate of dying in the treated group is half. Over long follow-up, as most patients in both arms eventually have the event, the cumulative risk ratio drifts toward 1 even though the HR stays constant, so the relative difference in cumulative risk is smaller than the HR suggests.
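The gap between a constant HR and the cumulative risk ratio can be sketched with a hypothetical exponential (constant-hazard) control arm, where survival is exp(−λt) and treated survival is exp(−HR·λt). The baseline hazard λ = 0.2 per year and the function name are illustrative assumptions.

```python
import math


def cumulative_risk_ratio(hr, baseline_hazard, t):
    """Risk ratio at time t when hazards are constant (exponential) and proportional."""
    risk_control = 1 - math.exp(-baseline_hazard * t)
    risk_treated = 1 - math.exp(-hr * baseline_hazard * t)
    return risk_treated / risk_control


for years in (1, 5, 10, 30):
    rr = cumulative_risk_ratio(hr=0.5, baseline_hazard=0.2, t=years)
    print(f"year {years:>2}: HR stays 0.50, cumulative risk ratio = {rr:.2f}")
```

Early on, the risk ratio is close to the HR. As events accumulate in both arms it climbs toward 1, even though the hazard ratio never changes.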

Reading the Kaplan-Meier Curve — Visual and "Hemodynamic" Assessment

• Anatomy of a KM curve:
— Y-axis: probability of being event-free, S(t), from 1.0 down to 0
— X-axis: time since enrollment, randomization, or diagnosis (define the zero!)
— Stepwise drops: each step marks an event time; the curve is multiplied by (1 − d/n), where d = events at that time and n = number at risk just before it, so the absolute drop is S(t⁻) × d/n and steps grow larger as n shrinks
— Tick marks or "+" symbols: censored observations (no drop, just notation)
— Shaded band or dashed lines: 95% pointwise confidence interval
— Numbers-at-risk table: critical for judging precision over time

• How to "read hemodynamics" of a curve:
— Steep early drop → high early hazard (surgical mortality, acute MI cohort)
— Gentle slope → low constant hazard
— Flattening plateau → declining hazard, possible cured fraction
— Sudden late drop in one arm → may reflect a small number of events in a small remaining sample; check N at risk

• Median survival from the curve:
— Draw a horizontal line at S(t) = 0.5; where it first meets the curve, drop down to the x-axis
— If the curve never reaches 0.5, median survival is "not reached" — common in slow-progressing cancers or successful interventions; this is good news, not missing data

• Comparing two curves visually:
— Vertical distance at a landmark = absolute risk difference (useful for NNT)
— Horizontal distance at S(t) = 0.5 = difference in median survival
— Area between the curves up to a chosen time τ = difference in restricted mean survival time (RMST)

• Common pitfalls when interpreting:
— Eyeballing late differences without checking N at risk
— Forgetting that censoring is assumed non-informative (patients lost to follow-up are not differentially sicker or healthier)

Board pearl: When two KM curves cross, do not trust a single hazard ratio. The proportional hazards assumption is violated, and the "average HR" hides that one treatment is better early and worse late (or vice versa). Look for RMST, landmark analyses, or stratified time intervals as the correct reporting.
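A minimal Python sketch of the product-limit calculation follows. It uses a hypothetical six-patient dataset and illustrative helper names. Each step multiplies S(t) by (1 − d/n): at t = 3 the drop is 0.667 × 1/3 ≈ 0.222, not 1/3. Median survival is taken as the first event time where S(t) ≤ 0.5, and `None` stands for "not reached."

```python
def kaplan_meier(times, events):
    """Return rows (time, n_at_risk, events, S(t)) at each distinct event time."""
    rows, s = [], 1.0
    for tj in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for t in times if t >= tj)  # at risk just before tj (censored at tj still count)
        d = sum(1 for t, e in zip(times, events) if t == tj and e)
        s *= 1 - d / n                        # multiplicative step, not a fixed 1/n drop
        rows.append((tj, n, d, s))
    return rows


def median_survival(rows):
    """First event time at which S(t) <= 0.5; None means 'not reached'."""
    for tj, _, _, s in rows:
        if s <= 0.5:
            return tj
    return None


times = [1, 2, 2, 3, 4, 5]   # hypothetical follow-up times
events = [1, 1, 0, 1, 0, 1]  # 1 = event, 0 = censored
prev_s = 1.0
for tj, n, d, s in kaplan_meier(times, events):
    print(f"t={tj}: n at risk={n}, events={d}, S={s:.3f}, drop={prev_s - s:.3f}")
    prev_s = s
print("median:", median_survival(kaplan_meier(times, events)))
```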

Diagnostic Workup — Core Metrics: Hazard, Hazard Ratio, and the Log-Rank Test

• Hazard function h(t): the instantaneous event rate at time t, conditional on having survived to time t
— Units: events per person-time (e.g., per person-year)
— Not a probability — can exceed 1
— Distinguish from cumulative incidence (a probability, 0–1)

• Hazard ratio (HR): ratio of hazard functions between two groups, assumed constant over time under the proportional hazards model
— HR = 1: no difference
— HR < 1: treatment reduces the hazard (protective)
— HR > 1: treatment increases the hazard (harmful)
— 95% CI excludes 1 → statistically significant at α=0.05

• Interpreting HR magnitude:
— HR 0.80 = 20% relative reduction in instantaneous event rate
— HR 1.50 = 50% relative increase in rate
— HRs act on a multiplicative scale, not an additive one: two interventions with HR 0.7 each combine to HR 0.49 only if their effects are independent with no interaction, which separate trials cannot establish

• Log-rank test:
— Nonparametric test comparing entire KM curves (not just one landmark)
— Null hypothesis: survival distributions are identical
— Most powerful when proportional hazards holds
— Reports a chi-square statistic and p-value; does not quantify effect size — pair with HR

• Alternative tests when proportional hazards fails:
— Wilcoxon (Breslow) test: weights early events more — useful when early differences dominate
— RMST difference: model-free, interpretable in time units

• Sample size considerations:
— Power in survival analysis depends on number of events, not number of patients
— A trial with 1000 patients but 30 events is underpowered; one with 200 patients and 150 events may be robust

Key distinction: Log-rank p-value answers "are the curves different?" The HR answers "by how much, on the hazard scale?" Median survival answers "what is a typical patient's time to event?" All three should be reported together; any one alone is incomplete.
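The log-rank statistic can be sketched in a few lines. At each event time it compares the observed events in group A with the number expected if both groups shared the same hazard, sums the differences across time, and divides the squared total by the summed hypergeometric variance. This two-group Python version uses a hypothetical function name. The 1-degree-of-freedom chi-square p-value is computed as erfc(√(χ²/2)).

```python
import math


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p_value, observed_a, expected_a)."""
    data = [(t, e, 1) for t, e in zip(times_a, events_a)]
    data += [(t, e, 0) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for tj in sorted({t for t, e, _ in data if e}):
        n = sum(1 for t, _, _ in data if t >= tj)                 # total at risk
        n_a = sum(1 for t, _, g in data if t >= tj and g == 1)    # group A at risk
        d = sum(1 for t, e, _ in data if t == tj and e)           # all events at tj
        d_a = sum(1 for t, e, g in data if t == tj and e and g == 1)
        observed_a += d_a
        expected_a += d * n_a / n                                 # expected under H0
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))                      # chi-square, 1 df
    return chi2, p_value, observed_a, expected_a


# Hypothetical data: every event in A occurs before any event in B
chi2, p, o_a, e_a = log_rank(list(range(1, 9)), [1] * 8, list(range(10, 18)), [1] * 8)
print(f"chi2={chi2:.2f}, p={p:.2g}, observed A={o_a:.0f}, expected A={e_a:.2f}")
```

The test reports only whether the curves differ. Observed and expected counts hint at direction, but the effect size still comes from the HR.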

Diagnostic Workup — The Cox Proportional Hazards Model

• Cox regression is the workhorse model that produces adjusted hazard ratios
— Semi-parametric: makes no assumption about the shape of the baseline hazard, only that covariate effects act multiplicatively on it
— Form: h(t | X) = h₀(t) · exp(β₁X₁ + β₂X₂ + ...)
— exp(β) for each covariate = adjusted HR
• What Cox lets you do:
— Adjust for confounders (age, stage, comorbidity) in observational data
— Estimate independent effects of multiple covariates simultaneously
— Include time-varying covariates (e.g., crossover, treatment switch)
— Stratify by variables that violate proportional hazards (sex, center)
• Proportional hazards assumption — how it's checked:
— Visual: log(-log(S(t))) plots should be parallel across groups
— Schoenfeld residuals: should show no time trend; a significant slope flags violation
— Time-by-covariate interaction term in the model
• When PH fails, options include:
— Stratified Cox (different baseline hazard per stratum)
— Time-varying coefficients
— Accelerated failure time (AFT) models — model time directly with parametric distributions (Weibull, log-normal)
— RMST regression — robust and interpretable
• Common Step 3 misreads:
— Confusing adjusted HR with odds ratio (it is not)
— Assuming HR applies uniformly across patient subgroups without interaction testing
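— Ignoring competing risks: if a patient dies of another cause before the event of interest and that death is simply censored, 1 − KM overestimates the cumulative incidence of the target event, and a cause-specific Cox HR cannot be read as an effect on absolute risk; use the cumulative incidence function and the Fine-Gray subdistribution hazard model when absolute risk is the question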
Board pearl: In an elderly cohort studying a non-fatal outcome (e.g., dementia, hip fracture), competing-risk death is common. Reporting only a cause-specific Cox HR without addressing competing mortality is a classic flaw. Step 3 may ask you to identify Fine-Gray as the appropriate method.
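To show where an adjusted HR comes from, here is a minimal Python sketch of a one-covariate Cox fit. It maximizes the partial likelihood by Newton-Raphson, handles tied event times with the Breslow approximation, and halves the step if the log-likelihood drops. The function name and data are hypothetical. Real analyses would use an established package and check the PH assumption as described above.

```python
import math


def cox_single_covariate(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t|x) = h0(t) * exp(beta * x); return (beta, standard error)."""
    event_times = sorted({t for t, e in zip(times, events) if e})

    def loglik_score_info(beta):
        ll = score = info = 0.0
        for tj in event_times:
            risk = [xi for ti, xi in zip(times, x) if ti >= tj]      # risk set at tj
            dead = [xi for ti, ei, xi in zip(times, events, x) if ti == tj and ei]
            d, s = len(dead), sum(dead)
            a0 = sum(math.exp(beta * xi) for xi in risk)
            a1 = sum(xi * math.exp(beta * xi) for xi in risk)
            a2 = sum(xi * xi * math.exp(beta * xi) for xi in risk)
            ll += beta * s - d * math.log(a0)                       # Breslow partial log-lik
            score += s - d * a1 / a0                                # first derivative
            info += d * (a2 / a0 - (a1 / a0) ** 2)                  # negative second derivative
        return ll, score, info

    beta = 0.0
    ll, score, info = loglik_score_info(beta)
    for _ in range(max_iter):
        step = score / info                                         # Newton-Raphson step
        new = loglik_score_info(beta + step)
        while new[0] < ll and abs(step) > tol:                      # step halving
            step /= 2
            new = loglik_score_info(beta + step)
        beta += step
        ll, score, info = new
        if abs(step) < tol:
            break
    return beta, 1 / math.sqrt(info)


times = [1, 3, 4, 6, 8, 2, 5, 7, 9, 10]   # hypothetical event times
events = [1] * 10
x = [1] * 5 + [0] * 5                     # 1 = exposed, 0 = unexposed
beta, se = cox_single_covariate(times, events, x)
print(f"HR {math.exp(beta):.2f} (95% CI {math.exp(beta - 1.96 * se):.2f}-{math.exp(beta + 1.96 * se):.2f})")
```

exp(β) is the HR. With more covariates, the same likelihood extends to a vector β, and each exp(βₖ) becomes an adjusted HR.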

Risk Stratification — Choosing Endpoints and Avoiding Bias

• Endpoint selection drives everything:
— Overall survival (OS): time from randomization to death from any cause — the gold standard, unambiguous, captures both efficacy and toxicity
— Disease-specific survival: death from the index disease only; sensitive to cause-of-death misclassification
— Progression-free survival (PFS): time to progression or death — surrogate, faster, but assessment-dependent
— Event-free survival, recurrence-free survival, time-to-treatment-failure: composite endpoints with their own pitfalls

• Composite endpoints (e.g., MACE = death + MI + stroke):
— Increase event count → better power
— Can mislead when the most severe component (death) is unaffected while a softer component drives the effect
— Always inspect components individually

• Immortal time bias:
— Occurs when classification of exposure requires surviving long enough to be classified (e.g., "patients who received a transplant" vs "waitlist")
— Inflates apparent benefit of the exposure
— Fix: time-dependent covariate or landmark analysis

• Lead-time bias in screening trials:
— Earlier diagnosis lengthens apparent survival without changing time of death
— Why screening is judged on mortality reduction, not 5-year survival

• Length-time bias: screening preferentially detects slow-growing disease, inflating survival

• Informative censoring: when censored patients differ systematically (e.g., dropped out because they were getting worse)
— Violates KM assumptions; use sensitivity analyses, inverse probability of censoring weights

Step 3 management: When evaluating a published survival analysis for clinical application, ask: (1) Was the endpoint patient-important? (2) Were patients censored for reasons unrelated to outcome? (3) Is median follow-up adequate? (4) Does the HR's 95% CI exclude 1, and is the absolute difference clinically meaningful? An HR of 0.85 with 95% CI 0.74–0.97 is statistically significant but may translate to only 1–2% absolute benefit — discuss in shared decision-making.
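Immortal time bias can be reproduced with a small simulation. It rests on hypothetical assumptions: everyone has the same constant death hazard (0.1 per year), transplant has no true effect, and each patient's transplant is scheduled at a uniform random wait of 0 to 12 years. The naive analysis credits the entire follow-up of transplanted patients to "transplant." The time-dependent analysis counts the wait as untransplanted person-time. Because the hazard is constant, deaths per person-year is a fair rate estimate in each exposure state.

```python
import random


def immortal_time_demo(n=20000, hazard=0.1, max_wait=12.0, seed=1):
    """Return (naive, time-dependent) death-rate ratios, transplanted vs not."""
    rng = random.Random(seed)
    naive = {"tx": [0, 0.0], "no_tx": [0, 0.0]}   # [deaths, person-years]
    split = {"tx": [0, 0.0], "no_tx": [0, 0.0]}
    for _ in range(n):
        death = rng.expovariate(hazard)           # transplant has NO effect on this
        wait = rng.uniform(0, max_wait)           # scheduled transplant time
        if death > wait:
            # Naive: all follow-up, including the pre-transplant wait, credited to "tx"
            naive["tx"][0] += 1
            naive["tx"][1] += death
            # Time-dependent: the wait is untransplanted time; only after it is "tx"
            split["no_tx"][1] += wait
            split["tx"][0] += 1
            split["tx"][1] += death - wait
        else:
            naive["no_tx"][0] += 1
            naive["no_tx"][1] += death
            split["no_tx"][0] += 1
            split["no_tx"][1] += death

    def rate_ratio(groups):
        (d_tx, pt_tx), (d_no, pt_no) = groups["tx"], groups["no_tx"]
        return (d_tx / pt_tx) / (d_no / pt_no)

    return rate_ratio(naive), rate_ratio(split)


naive_rr, split_rr = immortal_time_demo()
print(f"naive rate ratio {naive_rr:.2f} vs time-dependent rate ratio {split_rr:.2f}")
```

The naive comparison shows a large spurious "benefit." Treating transplant as a time-dependent exposure recovers a rate ratio near 1.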

Pharmacotherapy of Interpretation — Translating HR into Clinical Numbers

• Converting HR to absolute benefit:
— HR alone is insufficient — pair with baseline risk in the control arm
— Under PH, treated survival equals control survival raised to the power HR, so if the control arm has cumulative incidence p over time t, the treated arm's cumulative incidence is 1 − (1−p)^HR (approximate in practice only because the HR itself is an estimate)
— Example: control 5-year mortality 40%, HR 0.75 → treated 5-year mortality ≈ 1 − (0.60)^0.75 ≈ 32%
— Absolute risk reduction (ARR) ≈ 8%; NNT ≈ 1/0.08, rounded up to 13 over 5 years

• Number needed to treat from survival data:
— Always specify time horizon (NNT at 1 year ≠ NNT at 5 years)
— Use Kaplan-Meier-derived ARR at the landmark, not raw event counts (which ignore censoring)

• Common Step 3 conversions (NNT and NNH rounded up):
— HR 0.5, baseline 20% → treated ~10.6%, ARR ~9.4%, NNT ~11
— HR 0.7, baseline 10% → treated ~7.1%, ARR ~2.9%, NNT ~35
— HR 1.3, baseline 5% → treated ~6.45%, absolute risk increase (ARI) ~1.45%, number needed to harm (NNH) ~69

• Subgroup hazard ratios:
— Forest plots show HR by age, sex, stage, biomarker status
— Interaction p-value tells you if effect truly differs across subgroups
— Without significant interaction, the overall HR applies — do not cherry-pick subgroups

• Reporting standards:
— CONSORT extension for survival outcomes requires: HR with CI, KM curve with N at risk, median follow-up, censoring rules, PH assumption check

Board pearl: A statistically significant HR with a wide confidence interval crossing clinically meaningful thresholds (e.g., HR 0.85, 95% CI 0.72–0.99) should prompt caution — the true effect could be anywhere from a 28% reduction to a trivial 1% reduction. Always inspect the CI, not just the point estimate or p-value, before applying results to your patient.
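The conversions in this section can be checked with a short Python sketch. It assumes proportional hazards, so treated risk = 1 − (1 − p)^HR, and rounds NNT/NNH up to the next whole patient. The helper names are illustrative.

```python
import math


def treated_risk(control_risk, hr):
    """Under PH, treated survival = control survival ** HR."""
    return 1 - (1 - control_risk) ** hr


def absolute_effect(control_risk, hr):
    """Return (treated risk, risk difference, NNT or NNH rounded up)."""
    treated = treated_risk(control_risk, hr)
    diff = control_risk - treated          # > 0 benefit (ARR), < 0 harm (ARI)
    return treated, diff, math.ceil(1 / abs(diff))


for baseline, hr in [(0.40, 0.75), (0.20, 0.5), (0.10, 0.7), (0.05, 1.3)]:
    treated, diff, n = absolute_effect(baseline, hr)
    label = "NNT" if diff > 0 else "NNH"
    print(f"baseline {baseline:.0%}, HR {hr}: treated {treated:.1%}, "
          f"difference {abs(diff):.2%}, {label} {n}")
```

The same HR gives very different NNTs depending on baseline risk, which is why the HR must always be paired with the control-arm risk at a stated horizon.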

Advanced Methods — Competing Risks, Landmark Analyses, and RMST

• Competing risks:
— A competing event precludes the event of interest (death from MI precludes future stroke)
— Standard KM overestimates cumulative incidence of the target event when competing risks are common
— Use cumulative incidence function (CIF) instead of 1 − KM
— Fine-Gray subdistribution hazard model estimates effects on CIF directly
— Cause-specific Cox is appropriate when modeling etiology; Fine-Gray when modeling prognosis/absolute risk

• Landmark analysis:
— Re-set the clock at a fixed timepoint (e.g., 6 months post-randomization)
— Include only patients still at risk and event-free at the landmark
— Stratify by response or status achieved by that time
— Avoids immortal time bias when comparing responders vs non-responders

• Restricted mean survival time (RMST):
— Area under the KM curve up to a clinically chosen time τ
— Difference in RMST = average gain in event-free time (in months or years) — directly interpretable
— Does not require proportional hazards
— Increasingly favored in oncology trials with crossing curves or delayed effects (immunotherapy)

• Time-dependent covariates:
— Treatment status that changes over time (transplant received, crossover)
— Properly modeled with time-varying Cox; ignoring this creates immortal time bias

• Frailty models: random effects for clustered survival data (multicenter trials, recurrent events in the same patient)

• Joint models: link longitudinal biomarker trajectories (e.g., CD4 count) with time-to-event outcomes

Key distinction: Cause-specific hazard answers "what causes this event?" (etiologic question, useful for biology). Subdistribution hazard (Fine-Gray) answers "what is the probability a patient will experience this event accounting for competing risks?" (prognostic question, useful for counseling individual patients). Step 3 may pose a vignette of an elderly patient asking about stroke risk where cardiac death is the competing event — Fine-Gray is the right framework.
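RMST is simply the area under the KM step function from 0 to τ. The Python sketch below uses illustrative helper names and two hypothetical arms. It builds the KM steps, integrates them up to τ, and reports the RMST difference in time units. No proportional hazards assumption is involved.

```python
def km_steps(times, events):
    """Return [(event_time, S(t) just after that time)] for the Kaplan-Meier curve."""
    s, steps = 1.0, []
    for tj in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for t in times if t >= tj)
        d = sum(1 for t, e in zip(times, events) if t == tj and e)
        s *= 1 - d / n
        steps.append((tj, s))
    return steps


def rmst(times, events, tau):
    """Area under the KM step function from 0 to tau."""
    area, prev_t, s = 0.0, 0.0, 1.0
    for tj, s_after in km_steps(times, events):
        if tj >= tau:
            break
        area += s * (tj - prev_t)      # rectangle at the current survival level
        prev_t, s = tj, s_after
    return area + s * (tau - prev_t)   # final rectangle up to tau


arm_a = ([3, 5, 7, 9, 11, 12], [1, 1, 0, 1, 1, 0])   # hypothetical months, 1 = event
arm_b = ([2, 3, 4, 6, 8, 12], [1, 1, 1, 0, 1, 0])
tau = 12
diff = rmst(*arm_a, tau) - rmst(*arm_b, tau)
print(f"RMST A {rmst(*arm_a, tau):.2f} vs B {rmst(*arm_b, tau):.2f} months; difference {diff:.2f}")
```

The difference reads directly as "on average, patients in arm A gained about this many event-free months over the first τ months."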

Special Populations — Elderly and Patients with Competing Mortality

• Older adults pose specific survival-analysis challenges:
— High competing risk of death from causes unrelated to the index disease
— Shorter remaining life expectancy may make 10-year outcomes irrelevant
— Comorbidity burden creates effect heterogeneity

• Clinical translation:
— A 78-year-old man with prostate cancer and HR 0.7 for cancer-specific mortality may derive little benefit if his competing risk of cardiovascular death is high
— Lee/Schonberg indices estimate 4-, 9-, 10-year mortality and help frame whether long-horizon interventions (screening, aggressive therapy) make sense
— Use life expectancy ≥10 years as a common threshold for offering screening colonoscopy, mammography, PSA discussion

• Frailty as effect modifier:
— Subgroup HRs in frail vs robust elders often differ
— Frailty index, gait speed, grip strength refine prognosis beyond chronologic age

• Renal/hepatic impairment in trial generalizability:
— Most trials exclude CKD stage 4–5, advanced cirrhosis
— Reported HRs may not apply; absolute benefit often smaller, harm larger
— Look for subgroup analyses in renal/hepatic strata before extrapolating

• Reporting issues specific to elderly cohorts:
— Studies that report only disease-specific mortality in elders mislead — all-cause mortality is the more honest endpoint
— Composite endpoints driven by softer components (hospitalization) may not reflect what matters most (function, independence)

Step 3 management: For a 75-year-old considering an intervention with HR 0.75 for the disease outcome, ask three questions before prescribing: (1) What is the patient's 5- to 10-year all-cause mortality from competing causes? (2) Does the absolute benefit at their likely time horizon exceed the harm? (3) Does the patient prioritize length of life, function, or symptom control? Frame survival statistics in person-time the patient can grasp ("about 1 in 15 people like you avoid this event over 5 years"), not in HRs.
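How much competing mortality distorts 1 − KM can be sketched with a hypothetical six-person older cohort. Cause 1 is the target event (e.g., hip fracture), cause 2 is competing death, and 0 is censored. The cumulative incidence is computed with the Aalen-Johansen estimator (overall event-free KM × cause-specific hazard) and set beside the naive 1 − KM that censors the deaths. Function name and data are illustrative.

```python
def cif_vs_naive(times, causes, target, horizon):
    """Return (Aalen-Johansen CIF, naive 1 - KM) for `target` by `horizon`.

    causes: 0 = censored, 1, 2, ... = event type.
    """
    s_all = 1.0    # KM for freedom from ANY event
    cif = 0.0      # cumulative incidence of the target cause
    s_naive = 1.0  # KM that treats competing events as censored
    for tj in sorted({t for t, c in zip(times, causes) if c and t <= horizon}):
        n = sum(1 for t in times if t >= tj)
        d_any = sum(1 for t, c in zip(times, causes) if t == tj and c)
        d_target = sum(1 for t, c in zip(times, causes) if t == tj and c == target)
        cif += s_all * d_target / n     # must be event-free until tj to have the target event
        s_all *= 1 - d_any / n
        s_naive *= 1 - d_target / n
    return cif, 1 - s_naive


times = [1, 2, 3, 4, 5, 6]     # hypothetical years
causes = [2, 1, 2, 1, 0, 1]    # 1 = target event, 2 = competing death, 0 = censored
cif, naive = cif_vs_naive(times, causes, target=1, horizon=6)
print(f"CIF {cif:.2f} vs naive 1 - KM {naive:.2f}")
```

With one-third of the cohort dying of other causes, the naive estimate overstates the probability of the target event (1.00 vs 0.67 here). This is the gap that matters when counseling an older patient.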

Special Populations — Pediatrics, Pregnancy, and Rare-Event Settings

• Pediatric survival analysis:
— Long potential follow-up (decades) introduces late effects — secondary malignancy, cardiotoxicity, infertility post-treatment
— Childhood cancer survivorship cohorts (CCSS) use survival methods to quantify these
— Age at event matters: KM time origin may be age rather than time-since-diagnosis to capture age-specific hazards

• Pregnancy outcomes:
— Time-to-pregnancy, time-to-delivery, time-to-preterm-birth all use survival methods
— Gestational age is the natural time scale; censoring at delivery or study end
— Competing risks prominent: stillbirth competes with live birth, miscarriage with ongoing pregnancy

• Rare-event settings:
— Small number of events → wide CIs, unstable HRs
— Exact methods or Bayesian survival models may be more appropriate
— Beware over-interpretation of HR 0.3 with CI 0.05–1.8 — directionally suggestive, statistically inconclusive

• Recurrent events (asthma exacerbations, sickle cell crises, hospitalizations):
— Standard KM/Cox handle only the first event
— Use Andersen-Gill, Prentice-Williams-Peterson (PWP), or frailty models for recurrent events
— Negative binomial regression for event-rate comparisons is an alternative

• Cluster randomization and crossover:
— Cluster trials need methods accounting for within-cluster correlation
— Crossover trials with carryover effects complicate time-to-event analysis

Board pearl: In a vignette describing a pediatric cancer trial reporting "10-year event-free survival of 85% vs 78%, HR 0.65," recognize that late toxicity (secondary malignancy, cardiomyopathy from anthracyclines) may not yet appear in the curves at 10 years. Long-term survivorship monitoring with periodic echocardiograms, neurocognitive testing, and second-cancer surveillance is part of the management answer, not just the survival number.
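Why few events produce wide intervals can be sketched with a common approximation, SE(log HR) ≈ √(1/d₁ + 1/d₂), where d₁ and d₂ are event counts per arm. The approximation is exact for a Poisson rate ratio and only roughly right for a Cox HR. It is an assumption here, as are the hypothetical event counts and the helper name.

```python
import math


def approx_hr_ci(hr, events_a, events_b, z=1.96):
    """Approximate 95% CI for an HR from event counts (Wald, log scale)."""
    se = math.sqrt(1 / events_a + 1 / events_b)
    log_hr = math.log(hr)
    return math.exp(log_hr - z * se), math.exp(log_hr + z * se)


for d_a, d_b in [(3, 10), (30, 100)]:   # same HR 0.3, ten-fold more events
    lo, hi = approx_hr_ci(0.3, d_a, d_b)
    print(f"{d_a} vs {d_b} events: HR 0.30 (95% CI {lo:.2f}-{hi:.2f})")
```

With 3 vs 10 events, the interval spans 1 even though the point estimate suggests a 70% reduction. Ten times as many events at the same HR gives an interval well below 1.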

Complications and Adverse Outcomes — Pitfalls in Survival Analysis Interpretation

• Common analytic errors that produce misleading results:
— Ignoring censoring: treating censored patients as event-free underestimates risk
— Informative censoring: dropouts correlate with prognosis; biases estimates
— Immortal time bias: misclassification of exposure during a period when the event could not occur
— Lead-time and length-time bias: in screening evaluations
— Selection bias: prevalent-cohort sampling overrepresents long survivors

• Reporting pitfalls:
— HR without CI or with CI containing 1 reported as "trending toward significance" — uninterpretable
— Median survival reported when curve never crosses 0.5 — should say "not reached"
— KM extended beyond reliable follow-up (curve continues where only 2–3 patients remain at risk)
— P-hacking with multiple time landmarks until one is significant

• Multiplicity:
— Multiple subgroup analyses inflate type I error
— Pre-specification and interaction testing mitigate

• Surrogate endpoint failures:
— PFS benefit without OS benefit — toxicity may offset progression delay
— Examples: bevacizumab in some metastatic cancers (e.g., metastatic breast cancer), several oncology approvals later showing no OS gain

• Crossover and contamination:
— When control patients cross over to active treatment on progression, OS comparisons get diluted
— Methods: rank-preserving structural failure time (RPSFT), inverse probability of censoring weighting

• Real-world data limitations:
— Registries have confounding by indication
— Propensity scoring helps but cannot replicate randomization

Key distinction: Statistical significance (p<0.05, CI excludes 1) is not the same as clinical significance (meaningful absolute benefit). A trial with enough events can declare HR 0.96 significant (by Schoenfeld's approximation, roughly 19,000 events for 80% power at two-sided α = 0.05 with 1:1 allocation); whether that 4% relative reduction warrants exposure to drug cost, toxicity, and pill burden is a value judgment, not a statistical one.
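The "events, not patients" rule can be quantified with Schoenfeld's approximation for the number of events needed in a two-arm trial: D = (z₁₋α/₂ + z₁₋β)² / (p(1 − p)(ln HR)²), where p is the fraction allocated to one arm. This Python sketch uses a hypothetical function name and the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hr, alpha=0.05, power=0.80, allocation=0.5):
    """Approximate number of events needed to detect `hr` (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil((z_alpha + z_beta) ** 2 /
                     (allocation * (1 - allocation) * math.log(hr) ** 2))


for hr in (0.96, 0.90, 0.75):
    print(f"HR {hr}: about {schoenfeld_events(hr):,} events for 80% power")
```

The required number of events rises steeply as the HR approaches 1. A tiny relative effect becomes "significant" only in a very large, event-rich trial, and that significance says nothing about clinical importance.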

When to Escalate — Recognizing Studies That Should Change Practice

• Criteria for a survival analysis to drive practice change:
— Pre-specified primary endpoint met with adequate power
— Clinically meaningful HR with CI excluding modest effects (e.g., HR 0.75 with upper CI <0.90)
— Consistent effect across pre-specified subgroups (no major interaction)
— Absolute benefit translatable to NNT in a relevant time horizon
— Adequate follow-up for the endpoint (median follow-up ≥ median survival when feasible)
— Reproducibility — concordant findings in at least one other trial or meta-analysis

• Hierarchy of evidence for survival outcomes:
— RCT with OS as primary endpoint > RCT with surrogate > prospective cohort with adjustment > registry with propensity matching > single-arm trial with historical control

• Red flags that should slow adoption:
— Subgroup-only benefit not pre-specified
— Crossing KM curves with a single reported HR
— Composite endpoint where the hard component (death) shows no benefit
— Industry-funded trial with inconsistent independent replication
— Early stopping for benefit (often overestimates effect size)

• When to escalate to specialist consultation in clinical practice:
— Patient asks about a new therapy with reported survival benefit — discuss HR, absolute benefit at their horizon, toxicity, cost
— Oncology referral for trial enrollment when standard-of-care benefit is marginal
— Multidisciplinary tumor board when survival data are uncertain or conflicting

CCS pearl: In a CCS-style scenario where a patient presents asking about starting a new anticoagulant based on a recent trial reporting HR 0.80 for stroke, the appropriate orders are: (1) confirm indication and contraindications, (2) baseline labs (CBC, renal function, liver function), (3) shared decision-making note documenting discussion of absolute benefit (~1–2% ARR), bleeding risk, cost, and patient preference, (4) prescribe with appropriate follow-up interval, and (5) advance clock to follow-up to assess adherence and adverse events. Don't just prescribe because the HR is favorable — translate to the individual.

Key Differentials — Other Time-to-Event Analytic Methods

• Parametric survival models (alternatives to Cox):
— Exponential: assumes constant hazard — rarely realistic
— Weibull: monotonically increasing or decreasing hazard; closed-form survival function
— Log-normal, log-logistic: non-monotonic hazards (rises then falls) — useful for diseases with peak hazard period
— Gompertz: exponentially increasing hazard with age — natural mortality
— Output: time ratios (acceleration factors) for AFT forms; exponential and Weibull models also satisfy proportional hazards, so their effects can be expressed as HRs

• Accelerated failure time (AFT) models:
— Model the log of survival time directly as a linear function of covariates
— Coefficient = how much treatment "accelerates" or "decelerates" time to event
— More interpretable when PH fails
— Reported as time ratio (TR): TR 1.5 = treated patients take 50% longer to event

• Piecewise exponential models:
— Allow hazard to differ in pre-specified time intervals
— Handle non-PH by stratifying time

• Machine learning approaches:
— Random survival forests, DeepSurv, gradient-boosted survival trees
— Useful for prediction with many features
— Less interpretable; ill-suited for causal inference

• Bayesian survival analysis:
— Incorporates prior information; produces credible intervals
— Useful for rare events and adaptive trials

• When to prefer each:
— Cox: standard for treatment effect estimation under PH
— RMST: when PH fails or for direct time-gain interpretation
— Parametric/AFT: when extrapolation beyond observed follow-up is needed (cost-effectiveness)
— ML methods: prediction tasks with high-dimensional features

Board pearl: If a question presents a curve where treatment provides clearly delayed benefit (e.g., immunotherapy with crossing then separating curves), the correct critique is that the proportional hazards assumption is violated, and the appropriate alternative report is RMST difference or milestone survival at a pre-specified landmark — not a single Cox HR.
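The link between a time ratio and a hazard ratio in the Weibull model can be checked numerically. Under an AFT effect every event time is stretched by TR, so the treated scale is λ·TR. For Weibull shape k, the resulting hazard ratio is constant and equals TR^(−k). The shape, scale, and TR values and the helper names are hypothetical.

```python
import math


def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (k / lambda) * (t / lambda) ** (k - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_median(shape, scale):
    """Time at which S(t) = exp(-(t / lambda) ** k) equals 0.5."""
    return scale * math.log(2) ** (1 / shape)


shape, scale, time_ratio = 2.0, 10.0, 1.5   # hypothetical values
treated_scale = scale * time_ratio          # AFT: event times stretched by TR

for t in (1.0, 5.0, 20.0):
    hr_t = weibull_hazard(t, shape, treated_scale) / weibull_hazard(t, shape, scale)
    print(f"t={t:>4}: hazard ratio = {hr_t:.4f}")   # same value at every t
print("TR ** -shape =", round(time_ratio ** -shape, 4))
print("median ratio =", weibull_median(shape, treated_scale) / weibull_median(shape, scale))
```

The same treatment effect can therefore be stated as "50% longer time to event" (TR 1.5) or, here, as HR ≈ 0.44. Only Weibull and exponential models allow this dual reading.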

Key Differentials — Non-Survival Methods Sometimes Confused with Survival Analysis

• Logistic regression:
— Models a binary outcome at a fixed time (alive vs dead at 30 days)
— Produces odds ratios, not hazard ratios
— Cannot incorporate censoring; patients lost before the fixed time are excluded or imputed
— Appropriate when timing is irrelevant and follow-up is complete

• Poisson regression:
— Models event counts over person-time
— Produces incidence rate ratios
— Assumes constant hazard within strata
— Useful for recurrent events or aggregated rate data

• Linear regression with time as outcome:
— Inappropriate when censoring exists — treats censored as observed times, biases downward

• Chi-square / Fisher exact:
— Compare proportions at a fixed timepoint
— Ignore time-to-event and censoring

• Risk ratios and odds ratios in cohort studies:
— Valid when follow-up is complete and uniform
— Lose information when follow-up varies or events occur at different times

• Distinguishing the right method on Step 3:
— Outcome is binary at a fixed time, no censoring → logistic
— Outcome is time-to-event with censoring → KM/Cox
— Outcome is count of events over time → Poisson or negative binomial
— Outcome is continuous → linear regression
— Outcome involves competing risks → Fine-Gray or cause-specific Cox

Key distinction: Odds ratio ≠ hazard ratio ≠ risk ratio. When a stem reports "OR 0.7 for 30-day mortality," that's logistic regression — different assumptions, different interpretation than HR 0.7 from a Cox model. Confusing them on a quantitative question is a high-yield trap. ORs approximate RRs only when the outcome is rare (<10%); HRs are instantaneous and not directly probabilities.
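The divergence of OR, RR, and HR can be sketched for a single fixed timepoint. The sketch assumes a constant HR of 0.7 under proportional hazards, so the treated risk is 1 − (1 − p)^HR, then computes the risk ratio and odds ratio that this HR implies at several hypothetical baseline risks. The function name is illustrative.

```python
def fixed_time_measures(control_risk, hr):
    """Risk ratio and odds ratio at one timepoint implied by a constant HR."""
    treated_risk = 1 - (1 - control_risk) ** hr

    def odds(p):
        return p / (1 - p)

    risk_ratio = treated_risk / control_risk
    odds_ratio = odds(treated_risk) / odds(control_risk)
    return risk_ratio, odds_ratio


for baseline in (0.02, 0.20, 0.50):
    rr, or_ = fixed_time_measures(baseline, hr=0.7)
    print(f"baseline {baseline:.0%}: HR 0.70 -> RR {rr:.2f}, OR {or_:.2f}")
```

When the outcome is rare, all three measures are close to 0.70. When it is common, the RR drifts toward 1 and the OR moves further from 1, so the same "0.7" means different things depending on the measure.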

Secondary Prevention / Long-Term Plan — Applying Survival Data to Clinical Decisions

• Translating HR into a long-term management plan:
— Establish baseline risk using validated risk calculators (ASCVD pooled cohort, CHA₂DS₂-VASc, Framingham, MELD)
— Apply HR from intervention trials to estimate individualized absolute benefit
— Discuss in shared decision-making with patient-specific time horizons

• Examples of survival data driving secondary prevention:
— Post-MI: high-intensity statin (HR for MACE ~0.78), beta-blocker, ACEi if EF reduced, dual antiplatelet for 12 months
— Atrial fibrillation: DOAC (HR for stroke ~0.80 vs warfarin, with bleeding HRs)
— HFrEF: quadruple therapy (ARNI, beta-blocker, MRA, SGLT2i), each with all-cause mortality HRs of roughly 0.65–0.90 in its pivotal trials, compounding benefit
— Post-DVT/PE: extended anticoagulation decision balances recurrence HR vs bleeding HR

• Discontinuation decisions:
— As baseline risk falls or competing risks rise, NNT increases — at some point treatment harm exceeds benefit
— Deprescribing in elderly is informed by recalibrating absolute benefit, not just the original HR

• Surveillance intervals informed by hazard curves:
— Post-cancer-treatment surveillance: more frequent imaging when hazard of recurrence peaks (years 1–3 for most solid tumors)
— Hazard typically declines with time-since-treatment → surveillance frequency decreases

• Documentation:
— Note the survival data discussed (HR, NNT, time horizon)
— Patient-stated preferences
— Plan for re-evaluation

Step 3 management: For a 62-year-old post-MI on guideline-directed medical therapy, the long-term plan incorporates multiple compounding HRs — statin (~0.78), beta-blocker (~0.85), ACEi (~0.85 in reduced EF), aspirin (~0.85) — each independently lowering hazard. Counsel that the aggregate absolute benefit is substantial (NNT often <30 over 5 years for hard outcomes), and adherence is the dominant lever. Address tobacco, BP, A1c, weight, and cardiac rehab as additional hazard-reducing interventions.
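The compounding-HR and recalibration ideas can be sketched together. The code multiplies the post-MI HRs quoted in this section, which is an optimistic assumption: it treats the effects as independent and non-interacting, something separate trials do not establish. It then shows how the NNT for that combined HR changes as a patient's hypothetical 5-year baseline risk falls. Helper names are illustrative.

```python
import math


def combined_hr(*hrs):
    """Product of HRs: assumes independent, non-interacting multiplicative effects."""
    return math.prod(hrs)


def nnt(control_risk, hr):
    """NNT over the horizon of control_risk, using treated risk = 1 - (1 - p) ** HR."""
    treated_risk = 1 - (1 - control_risk) ** hr
    return math.ceil(1 / (control_risk - treated_risk))


regimen_hr = combined_hr(0.78, 0.85, 0.85, 0.85)   # statin, beta-blocker, ACEi, aspirin
for baseline in (0.20, 0.10, 0.05):                 # hypothetical 5-year baseline risks
    print(f"baseline {baseline:.0%}: combined HR {regimen_hr:.2f}, NNT {nnt(baseline, regimen_hr)}")
```

Even with an unchanged HR, the NNT roughly quadruples when baseline risk drops from 20% to 5%. This is the arithmetic behind deprescribing discussions when competing risks rise or disease risk falls.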

Follow-Up, Monitoring Parameters, and Counseling Using Survival Data

• Communicating survival data to patients:
— Avoid raw HRs — patients misinterpret as probabilities
— Use absolute numbers at a relevant time horizon: "Of 100 people like you, about 15 would have a stroke in 5 years without treatment, vs 10 with treatment"
— Use visual aids: pictographs (100-person diagrams), bar charts of 5-year risk with/without intervention
— Frame both gain and risk: NNT and NNH side by side
— Acknowledge uncertainty: "These numbers are averages; individual outcomes vary"

• Monitoring efficacy of an intervention:
— For survival-driven therapy (statin, anticoagulant), measure the relevant surrogate (LDL for statins, INR for warfarin) and adherence — not "did the event happen" (most won't)
— Reassess risk periodically as age, comorbidities accumulate
— Re-engage shared decision-making when context changes (new diagnosis, frailty, patient preference shift)

• Counseling about prognosis:
— Distinguish median survival (half live longer, half shorter) from "expected survival"
— Show curves when patients want detail; avoid when patients prefer narrative
— Address uncertainty intervals honestly — don't quote a single number as fact

• Cardiac/oncology rehab and lifestyle:
— Exercise post-MI: HR for mortality ~0.75 from meta-analyses
— Smoking cessation post-cancer: HR for second cancer and death substantially reduced
— Weight loss in HFpEF, T2DM: variable HRs by population

• Documentation cadence:
— Annual reassessment of risk and ongoing benefit
— Document shared decision-making

Board pearl: When a stem asks how to communicate a trial result showing HR 0.7 for stroke with an anticoagulant, the highest-yield answer is to translate to natural frequencies at a fixed time (e.g., "3 fewer strokes per 100 patients over 2 years") rather than report the HR. Natural frequencies improve patient comprehension and informed consent quality — a tested concept on Step 3 communication and biostatistics items.
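Translating an HR into natural frequencies is a one-line calculation once a baseline risk is chosen. This Python sketch assumes proportional hazards (treated risk = 1 − (1 − p)^HR). The baseline risks and the helper name are hypothetical. It reproduces the "15 vs 10 per 100" style of statement.

```python
def per_100(control_risk, hr):
    """Events per 100 untreated vs treated patients over the same horizon (PH assumed)."""
    treated_risk = 1 - (1 - control_risk) ** hr
    return round(100 * control_risk), round(100 * treated_risk)


untreated, treated = per_100(0.15, 0.65)   # hypothetical 5-year stroke risk and HR
print(f"Of 100 people like you, about {untreated} would have a stroke in 5 years "
      f"without treatment, vs {treated} with treatment.")
```

With a hypothetical 2-year baseline risk of 10% and HR 0.7, the same function gives 10 vs 7 per 100, that is, about 3 fewer strokes per 100 patients.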

Ethical, Legal, and Patient Safety Considerations

• Informed consent and survival data:
— Patients enrolling in trials must understand probabilistic outcomes, not just hoped-for benefit
— Disclose HR, absolute benefit, toxicity, and uncertainty
— Avoid therapeutic misconception (patients believing trial participation guarantees benefit)
— If median survival is "not reached," explain explicitly what that does and does not mean

• Early stopping for benefit:
— DSMB may halt trials when interim survival analysis crosses pre-specified boundary
— Ethically required when one arm is clearly superior
— But: early stopping inflates effect estimates and shortens follow-up — limits long-term safety data
— Patients on losing arm must be informed and offered crossover when feasible

• Equipoise and placebo arms in survival trials:
— Placebo control is ethical only when no proven effective therapy exists
— Active-comparator trials are required when standard of care has known mortality benefit

• Transition-of-care risk:
— Patients discharged on a new therapy based on a survival trial need clear follow-up
— Communicate the rationale (HR, NNT) to receiving physicians
— Medication reconciliation prevents inadvertent discontinuation of survival-prolonging therapy (e.g., beta-blocker post-MI omitted at hospital discharge — measurable mortality impact)

• Reporting requirements:
— ClinicalTrials.gov registration prior to enrollment
— FDAAA requires results of applicable clinical trials, including survival outcomes, to be posted on ClinicalTrials.gov, generally within 12 months of the primary completion date
— Selective outcome reporting is a serious breach of research integrity

• Health equity:
— Trial populations often underrepresent minorities, women, elderly — limits generalizability of HRs
— Real-world subgroup data and post-marketing surveillance address gaps

• Mandatory reporting:
— Adverse events in trials reported to IRB and FDA
— Death classification (cause-specific) requires careful, blinded adjudication

Step 3 management: A patient discharged post-MI on five new medications based on trial-proven survival benefits is at high risk of medication non-adherence and transition errors. Prevent harm with: explicit discharge summary listing each medication's purpose ("aspirin for clot prevention"), 1-week post-discharge phone call, 1- to 2-week clinic follow-up, medication reconciliation at every encounter, and engagement of pharmacist/cardiac rehab. Each missed medication erodes the hazard reduction the trials promised.

High-Yield Associations and Rapid-Fire Clinical Facts

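Board pearl: When a vignette gives you HR and baseline risk and asks for absolute risk reduction, use the default approximation ARR ≈ baseline risk × (1 − HR) when baseline risk is modest. For baseline 20%, HR 0.7 → ARR ≈ 6%, NNT ≈ 17. This shortcut is sufficient for most Step 3 calculations. The rigorous conversion assumes proportional hazards: treated-arm risk = 1 − (1 − p)^HR, so ARR = p − [1 − (1 − p)^HR]. For the same example this gives ARR ≈ 5.5% and NNT ≈ 18. The shortcut slightly overstates ARR, and the gap widens as baseline risk rises.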
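A minimal sketch comparing the board shortcut with the proportional-hazards conversion. It assumes the HR is constant over the time horizon and that the baseline risk is the control arm's cumulative risk at that same landmark time. The function names are illustrative, and the 20%/0.7 inputs reuse the pearl's example.

```python
import math

def arr_shortcut(baseline_risk: float, hr: float) -> float:
    """Board shortcut: treat the HR like a risk ratio, so ARR ~ p * (1 - HR)."""
    return baseline_risk * (1.0 - hr)

def arr_proportional_hazards(baseline_risk: float, hr: float) -> float:
    """Under proportional hazards, S_treated(t) = S_control(t) ** HR at the same t."""
    control_survival = 1.0 - baseline_risk
    treated_risk = 1.0 - control_survival ** hr
    return baseline_risk - treated_risk

p, hr = 0.20, 0.70  # hypothetical: 20% control-arm risk at the landmark, HR 0.7
for label, arr in (("shortcut", arr_shortcut(p, hr)),
                   ("PH exact", arr_proportional_hazards(p, hr))):
    # NNT = 1 / ARR; in practice it is usually rounded up to a whole patient
    print(f"{label}: ARR = {arr:.3f}, NNT = {1.0 / arr:.1f}")
```

For these inputs the shortcut gives an ARR of about 6.0% (NNT ≈ 17) and the exact conversion gives about 5.5% (NNT ≈ 18). At a 2% baseline risk the two agree to within a few thousandths of a percentage point.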

Kaplan-Meier curve = stepwise survival probability over time; drops at events, tick marks at censoring
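The product-limit calculation behind the curve can be sketched in a few lines. This is a teaching sketch on hypothetical data, not a replacement for a survival package. It follows the usual convention that a patient censored at the same time as an event still counts as at risk at that time.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of S(t).

    times  -- follow-up time for each patient
    events -- 1 if the event was observed at that time, 0 if censored
    Returns (time, n_at_risk, n_events, S) at each distinct event time.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    steps = []
    for t in sorted(event_counts):
        # Anyone followed to at least t is at risk; patients censored at t are included
        n_at_risk = sum(1 for u in times if u >= t)
        d = event_counts[t]
        survival *= 1.0 - d / n_at_risk  # the curve steps down only at event times
        steps.append((t, n_at_risk, d, survival))
    return steps

# Hypothetical toy cohort: 8 patients, months of follow-up, 0 = censored (a tick mark)
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 0, 1]
for t, n, d, s in kaplan_meier(times, events):
    print(f"t={t:>2}  at risk={n}  events={d}  S(t)={s:.3f}")
```

The patients censored at months 3, 6 and 9 produce no drop, but they shrink the number at risk. As a result, the single event at month 8 removes a third of the remaining survival probability (0.6 to 0.4).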

Hazard ratio = instantaneous relative event rate; not a probability, not an odds ratio

HR 0.5 ≠ "half the risk" — it's half the instantaneous hazard; absolute benefit depends on baseline

Log-rank test = nonparametric comparison of whole curves; most powerful under proportional hazards
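A minimal two-group log-rank sketch, assuming groups are coded 0/1. At each event time it compares group 1's observed events with the number expected from its share of the risk set, then sums over time. Every event time gets equal weight, which is why the test is most powerful when hazards are proportional. The function name and toy data are hypothetical.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test. groups holds 0/1 labels; returns (chi-square, p-value)."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u in times if u >= t)  # total at risk at t
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        # Under H0, group 1's share of the d events matches its share of the risk set
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical: group 0 has events at months 1 and 2, group 1 at months 3 and 4
print(logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [0, 0, 1, 1]))
```

In this four-patient example, chi-square = 49/17 ≈ 2.88 and p ≈ 0.09. The curves separate completely, but there are too few events to show a difference, which illustrates that power comes from events.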

Cox proportional hazards = semi-parametric model producing adjusted HRs
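To show where a Cox HR comes from, here is a sketch that fits one covariate by Newton-Raphson on the partial likelihood. Tied event times use the Breslow approximation. The baseline hazard never has to be estimated, which is what makes the model semi-parametric. The names and data are hypothetical. Real analyses would use a package such as R's survival::coxph, which defaults to the Efron tie method and handles multiple covariates.

```python
import math

def cox_one_covariate(times, events, x, max_iter=50, tol=1e-10):
    """Single-covariate Cox fit via Newton-Raphson on the partial likelihood.

    Tied event times use the Breslow approximation.
    Returns (beta, hazard_ratio) where hazard_ratio = exp(beta).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    beta = 0.0
    for _ in range(max_iter):
        score, information = 0.0, 0.0
        for t in event_times:
            risk_x = [xi for ti, xi in zip(times, x) if ti >= t]  # risk set at t
            weights = [math.exp(beta * xi) for xi in risk_x]
            s0 = sum(weights)
            mean_x = sum(w * xi for w, xi in zip(weights, risk_x)) / s0
            mean_x2 = sum(w * xi * xi for w, xi in zip(weights, risk_x)) / s0
            failed_x = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei == 1]
            # Score: observed covariate of failures minus its risk-set weighted mean
            score += sum(failed_x) - len(failed_x) * mean_x
            information += len(failed_x) * (mean_x2 - mean_x ** 2)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)

# Hypothetical data: x = 1 for treated, 0 for control
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 0, 1]
treated = [1, 0, 1, 1, 0, 0, 1, 0]
beta, hr = cox_one_covariate(times, events, treated)
print(f"log HR = {beta:.3f}, HR = {hr:.2f}")
```

Recoding the covariate (treated = 0, control = 1) flips the sign of beta and inverts the HR. A single HR therefore always needs its reference group stated.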

Proportional hazards assumption checked by Schoenfeld residuals or log-log plots
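The log-log check can be expressed numerically. Under proportional hazards, S1(t) = S0(t)^HR, so the two log(−log S) curves are parallel and sit log(HR) apart. The sketch below computes that gap at shared time points. The survival values are hypothetical, and the Schoenfeld residual test is not shown because it requires a fitted Cox model.

```python
import math

def cloglog_gaps(s_treated, s_control):
    """Vertical gap between log(-log S) curves at shared time points.

    Proportional hazards implies S1(t) = S0(t) ** HR, hence
    log(-log S1(t)) - log(-log S0(t)) = log(HR) at every t.
    Both inputs must lie strictly between 0 and 1.
    """
    return [math.log(-math.log(s1)) - math.log(-math.log(s0))
            for s1, s0 in zip(s_treated, s_control)]

# Hypothetical survival estimates read off two curves at the same four time points
control = [0.90, 0.75, 0.60, 0.45]
treated_ph = [s ** 0.7 for s in control]      # constructed to satisfy PH with HR 0.7
treated_crossing = [0.80, 0.72, 0.66, 0.60]   # worse early, better late

print([round(g, 3) for g in cloglog_gaps(treated_ph, control)])        # constant gap
print([round(g, 3) for g in cloglog_gaps(treated_crossing, control)])  # gap changes sign
```

A roughly constant gap is consistent with PH. A gap that drifts or changes sign, as in the crossing example, argues against summarizing the comparison with one HR.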

Crossing KM curves → PH violated → don't trust a single HR; use RMST or landmark analysis
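RMST is the area under the KM step curve up to a chosen horizon tau. The difference between arms reads as average event-free time gained or lost within that window, and it remains interpretable when curves cross. The sketch below assumes each curve is supplied as (event time, S(t)) pairs and that tau lies within follow-up for both arms. The curves are hypothetical.

```python
def rmst(steps, tau):
    """Restricted mean survival time: area under a KM step curve from 0 to tau.

    steps -- (event_time, S(t)) pairs in time order; S(t) = 1 before the first event.
    tau   -- truncation time, which should lie within follow-up for every arm compared.
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # rectangle under the flat part of the step
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)

# Hypothetical crossing curves (months): arm A loses patients early, arm B later
arm_a = [(1, 0.80), (2, 0.75), (10, 0.70), (11, 0.65)]
arm_b = [(4, 0.90), (6, 0.75), (8, 0.60), (11, 0.55)]
tau = 12
print(f"RMST A = {rmst(arm_a, tau):.2f} mo, RMST B = {rmst(arm_b, tau):.2f} mo, "
      f"difference = {rmst(arm_a, tau) - rmst(arm_b, tau):.2f} mo")
```

Here arm A averages 9.15 event-free months out of the first 12, versus 9.65 for arm B. That is a half-month difference, which a single HR from crossing curves would not convey.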

Median survival = earliest time at which the KM estimate S(t) falls to 0.5 or below; "not reached" means the curve stays above 0.5 through the end of follow-up (more than half are estimated to be event-free)
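Reading the median off a KM curve takes only a few lines, and the same logic shows why "not reached" occurs. The sketch assumes the curve is given as (event time, S(t)) pairs. Some packages report the midpoint of a segment that sits exactly at 0.5; this sketch does not.

```python
def km_median(steps):
    """Median survival: first event time where the KM estimate drops to 0.5 or below.

    steps -- (event_time, S(t)) pairs in time order.
    Returns None when the curve never reaches 0.5 ("median not reached").
    """
    for t, s in steps:
        if s <= 0.5:
            return t
    return None

print(km_median([(2, 0.875), (3, 0.75), (5, 0.6), (8, 0.4), (12, 0.0)]))  # crosses 0.5
print(km_median([(4, 0.90), (9, 0.80), (15, 0.65)]))                      # never does
```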

Number at risk table is essential — late curve differences with few patients are unreliable

Censoring assumed non-informative; informative censoring biases results

Competing risks → use cumulative incidence function, Fine-Gray model
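The sketch below shows why 1 − KM overstates risk when a competing event is present. The Aalen-Johansen cumulative incidence function multiplies the probability of reaching each time free of any event by the cause-specific hazard at that time. The naive approach treats competing deaths as ordinary censoring. Only the descriptive CIF is shown; Fine-Gray regression on the subdistribution hazard is not implemented here. The data and names are hypothetical.

```python
def cumulative_incidence(times, causes, cause=1):
    """Aalen-Johansen cumulative incidence for one event type vs naive 1 - KM.

    causes -- 0 = censored, 1 = event of interest, 2 = competing event (e.g. death)
    Returns (cif, naive) at the last observed time, where naive treats
    competing events as if they were ordinary censoring.
    """
    all_cause_survival = 1.0   # KM for "no event of any type yet"
    naive_survival = 1.0       # KM that censors competing events (the biased approach)
    cif = 0.0
    for t in sorted(set(times)):
        n = sum(1 for u in times if u >= t)
        d_any = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        d_cause = sum(1 for u, c in zip(times, causes) if u == t and c == cause)
        # Probability of reaching t event-free, times the cause-specific hazard at t
        cif += all_cause_survival * d_cause / n
        naive_survival *= 1 - d_cause / n
        all_cause_survival *= 1 - d_any / n
    return cif, 1 - naive_survival

# Hypothetical elderly cohort: competing deaths at months 1, 2 and 4
times = [1, 2, 3, 4, 5, 6]
causes = [2, 2, 1, 2, 1, 0]
cif, naive = cumulative_incidence(times, causes)
print(f"CIF = {cif:.3f}, naive 1 - KM = {naive:.3f}")
```

The CIF is 1/3, which matches the 2 of 6 patients who actually had the outcome. Naive 1 − KM reports 0.625 because it treats the three patients who died as though they could still go on to have the outcome.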

Immortal time bias → use time-dependent covariates or landmark analysis

Lead-time bias inflates survival in screening evaluations; use mortality, not 5-year survival

Power in survival analysis = function of number of events, not number of patients
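One way to see that power depends on events is Schoenfeld's approximation. This is a standard sample-size formula for a two-arm log-rank or Cox comparison, brought in here as an illustration rather than taken from this guide. The required number of events depends only on the target HR, alpha, power, and allocation ratio.

```python
import math
from statistics import NormalDist

def required_events(hr, alpha=0.05, power=0.80, allocation=0.5):
    """Schoenfeld approximation: events needed to detect `hr` with a two-sided test.

    allocation -- fraction of patients randomized to one arm (0.5 = 1:1).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    events = (z_alpha + z_beta) ** 2 / (allocation * (1 - allocation) * math.log(hr) ** 2)
    return math.ceil(events)

for target_hr in (0.5, 0.75, 0.9):
    print(f"HR {target_hr}: about {required_events(target_hr)} events")
```

The patient count follows only after dividing the required events by the expected proportion of patients who will have an event during follow-up. A low-risk population therefore needs far more patients to reach the same number of events.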

PFS is a surrogate for OS; a PFS benefit without an OS benefit may not translate into clinical benefit

Composite endpoints can mislead when soft components drive most of the effect while harder components (e.g., death) do not change

NNT from survival = 1/ARR at a specified landmark time

Subgroup HRs require interaction p-value to claim differential effect

OR (logistic), HR (Cox), RR (cohort), IRR (Poisson) — different models, different scales

Communicate to patients in natural frequencies, not HRs

Board Question Stem Patterns

Pattern 1 — HR interpretation:

— Stem: "A trial reports HR 0.65 (95% CI 0.50–0.84) for the primary endpoint."

— Asked: meaning of the HR; significance based on whether the CI excludes 1; whether to apply it to a specific patient

— Trap: confusing HR with RR or probability

Pattern 2 — KM curve reading:

— Figure shows two curves separating early or late

— Asked: median survival, ARR at a landmark, whether PH is violated

— Trap: trusting late differences when few patients remain at risk

Pattern 3 — Bias identification:

— Stem describes a screening study reporting improved 5-year survival

— Asked: most likely bias (lead-time, length-time, selection)

— Trap: assuming screening works because survival rose

Pattern 4 — Method selection:

— Stem describes time-to-event data with censoring

— Options: KM/Cox, logistic, linear regression, chi-square

— Answer: KM with Cox regression

Pattern 5 — Competing risks:

— Elderly cohort, non-fatal outcome, high competing mortality

— Asked: appropriate method

— Answer: Fine-Gray subdistribution hazard or cumulative incidence function

Pattern 6 — Immortal time bias:

— Stem compares "transplanted vs waitlisted" or "responders vs non-responders" without a landmark

— Asked: source of bias

— Answer: immortal time bias; fix with landmark analysis or a time-dependent covariate

Pattern 7 — Clinical translation:

— Stem gives HR and asks for NNT or absolute benefit

— Apply ARR ≈ baseline × (1 − HR), then NNT = 1/ARR

Pattern 8 — Patient communication:

— Patient asks "What does HR 0.7 mean for me?"

— Best answer: translate to natural frequencies at a relevant time horizon

Pattern 9 — Subgroup analysis:

— Forest plot shows benefit in one subgroup only

— Asked: validity

— Answer: requires pre-specification and a significant interaction p-value; otherwise hypothesis-generating

Pattern 10 — Crossing curves:

— KM curves cross at ~6 months

— Asked: best analytic approach

— Answer: RMST or time-period-specific (piecewise) HRs; not a single Cox HR

Step 3 management: When a question asks "what should you tell the patient about this trial's HR of 0.75 for death," default to a translated, time-bounded, patient-centered statement with absolute numbers and acknowledgment of uncertainty — not a recitation of the HR. Communication-style answers consistently outscore technical recitations on Step 3 biostatistics items.

One-Line Recap

Survival analysis uses Kaplan-Meier curves to display time-to-event data with proper handling of censoring, and Cox proportional hazards regression to produce hazard ratios that quantify the relative instantaneous event rate between groups — but the hazard ratio must always be interpreted alongside baseline risk, confidence interval, proportional hazards validity, competing risks, and clinically meaningful absolute benefit before driving any patient decision.

Board pearl: If you remember only one thing about survival analysis for Step 3, remember that the hazard ratio answers "how much faster does this happen?" while patients want to know "what's my chance, and how much does treatment help?" — your job is to bridge the two with absolute numbers at a clinically relevant time point.

Core methods: KM curves visualize S(t); log-rank tests compare curves; Cox regression adjusts for covariates and produces HRs; Fine-Gray handles competing risks; RMST is the go-to when proportional hazards fails

Critical interpretation rules: HR is not a probability, not an OR; absolute benefit depends on baseline risk; crossing curves invalidate a single HR; censoring must be non-informative; power depends on event count, not sample size

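High-yield biases: lead-time and length-time in screening, immortal time in cohort exposure classification, informative censoring in dropouts, and selection in prevalent cohorts. Each has a specific fix: mortality endpoints for the screening biases, landmark analysis or time-dependent covariates for immortal time, sensitivity analyses or inverse-probability-of-censoring weighting for informative censoring, and inception cohorts for prevalent-cohort selection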

Clinical application: translate HR + baseline risk → ARR → NNT at a meaningful time horizon, communicate in natural frequencies, document shared decision-making, and reassess as patient context (age, comorbidities, competing risks) evolves over time