Biostatistics & Population Health
Cox proportional hazards regression and hazard ratios
— A hazard ratio (HR) with 95% CI
— Kaplan–Meier survival curves with a log-rank p-value
— Endpoints like "time to death," "time to MI," "time to recurrence," "time to discharge"
— Censoring (patients lost to follow-up, study ends, or competing events occur)
Board pearl: A hazard ratio is not a risk ratio. HR describes the rate of events over time among those still at risk; RR describes the cumulative probability over a fixed interval. They converge only when events are rare and follow-up is short.
— Oncology trials (overall survival, progression-free survival)
— Cardiovascular outcomes trials (MACE endpoints)
— Transplant graft survival
— Heart failure readmission studies
— ICU mortality with time-varying covariates

— "After a median follow-up of X months, the adjusted HR for [event] was 0.78 (95% CI 0.65–0.93, p=0.006)."
— Accompanying Kaplan–Meier curve showing two diverging survival curves
— A forest plot of subgroup HRs
— Event of interest: must be binary and well-defined (death, MI, stroke, relapse)
— Time origin: when the clock starts (randomization, diagnosis, surgery)
— Censoring mechanism: must be non-informative — patients censored should have the same future risk as those still followed
— Follow-up duration: median follow-up matters more than mean
Key distinction: Cox regression handles censoring (event hasn't happened yet) but does not inherently handle competing risks (a different event prevents the one of interest). Fine-Gray or cause-specific hazards models are the fix.
— Differential loss to follow-up between groups → informative censoring → biased HR
— Short follow-up with few events (rule of thumb: ≥10 events per covariate to avoid overfitting)
— Crossing Kaplan–Meier curves → proportional hazards assumption likely violated
— Failure to report median follow-up or number of events

— Y-axis: probability of being event-free (survival probability), 0–1 or 0–100%
— X-axis: time since origin (months, years)
— Stepwise drops: each step = one or more events at that time
— Tick marks: censored observations
— Numbers at risk table below the curve: how many patients remain at each time point — critical for interpreting late-curve reliability
— Wider gap between curves = larger absolute difference in event-free survival at that time
— Curves that separate early and stay parallel = constant HR, proportional hazards likely met
— Curves that cross = HR changes direction over time → PH assumption violated
— Curves that converge late = treatment effect diminishes → time-varying HR
— Plateau in survival curve = potential "cure fraction"
Board pearl: Always look at the numbers-at-risk table. If only 5 patients remain at year 5, the curve's tail is unreliable — wide confidence bands. Step 3 distractors love showing dramatic late separations driven by tiny numbers.
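A minimal sketch of how the curve and its numbers-at-risk column are computed, assuming a small hypothetical cohort (the `times` and `events` values below are invented for illustration, not trial data). At each event time the survival estimate is multiplied by (1 − events / number at risk); censored patients leave the risk set without causing a step.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return (time, n_at_risk, n_events, survival) rows at each event time.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)              # every patient leaves the risk set at their own time
    n_at_risk = len(times)
    survival = 1.0
    rows = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            survival *= 1 - d / n_at_risk   # the curve steps down only at event times
            rows.append((t, n_at_risk, d, survival))
        n_at_risk -= exits[t]               # events and censored patients both leave
    return rows


# Hypothetical toy cohort: months of follow-up and event indicator (1 = event, 0 = censored)
times = [2, 3, 3, 5, 8, 8, 12, 14, 20, 24]
events = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

for t, n, d, s in kaplan_meier(times, events):
    print(f"t={t:>2}  at risk={n:>2}  events={d}  S(t)={s:.3f}")
```

In this toy cohort the last step (month 20) occurs with only 2 patients at risk, so a single event halves the estimate. That is the late-tail instability the pearl warns about.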

— HR = 1.0 → no difference in hazard between groups
— HR < 1.0 → exposure/treatment is protective (lower event rate)
— HR > 1.0 → exposure/treatment is harmful (higher event rate)
— HR = 0.70 → 30% relative reduction in the instantaneous rate of the event
— HR = 1.50 → 50% relative increase in the rate
— CI crossing 1.0 → not statistically significant
— Narrow CI → precise estimate (large sample, many events)
— Wide CI → imprecise (few events, small sample)
— A statistically significant HR of 1.05 (CI 1.01–1.10) may be clinically trivial despite p<0.05
Step 3 management: When a stem gives you an HR of 0.65 (95% CI 0.52–0.81) for a new drug vs. placebo on cardiovascular death, the answer is: the drug reduces the rate of CV death by 35%, with a statistically significant and clinically meaningful effect. Do not translate this directly into "35% of patients are saved" — that's an absolute risk reduction conflation.
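A quick way to check a reported HR against its CI is to work on the log scale, where the interval is symmetric. The sketch below assumes a Wald-type 95% CI (log HR ± 1.96 × SE) and a normal approximation. The function name `hr_ci_to_p` is hypothetical; the inputs are the HR and CI from the stem above.

```python
import math


def hr_ci_to_p(hr, lower, upper, z_crit=1.96):
    """Back-calculate SE of log(HR), the z statistic, and a two-sided p-value
    from a reported HR and its 95% CI (assumes a Wald interval on the log scale)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p under the normal approximation
    return se, z, p


se, z, p = hr_ci_to_p(0.65, 0.52, 0.81)
print(f"SE(log HR)={se:.3f}  z={z:.2f}  p={p:.5f}")
```

The p-value comes out well below 0.001, consistent with a CI that excludes 1.0. None of this says anything about absolute benefit, which still requires the baseline event rate.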
— Adjusted HR controls for confounders (age, sex, comorbidities) listed in the model
— Substantial difference between crude and adjusted HR suggests confounding
— Residual confounding (unmeasured variables) is a persistent observational-study limitation

— Proportional hazards: the HR is constant over time (the hazard in one group is a constant multiple of the hazard in the other)
— Non-informative censoring: censored patients have the same future risk as uncensored
— Linearity of continuous covariates on the log-hazard scale
— Independence of observations (or appropriate clustering correction)
— Schoenfeld residuals: plotted against time; a flat line supports PH, a trend violates it
— Log-log survival plots: parallel lines support PH; converging or crossing lines violate
— Statistical test of Schoenfeld residuals (global and per-covariate)
— Stratified Cox model: stratify on the violating variable, fitting separate baseline hazards
— Time-varying coefficients: allow HR to change over time intervals
— Switch to RMST or parametric AFT models
— Report HRs over distinct time intervals (e.g., 0–6 months vs. 6+ months)
Key distinction: A time-varying covariate is a predictor whose value changes over time (e.g., on/off treatment). A time-varying coefficient means the effect of a covariate on hazard changes over time (PH violation). They are different problems with different solutions.
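For the time-varying covariate case, the usual data layout is the counting-process (start, stop] format: one row per interval during which the covariate is constant. The sketch below is illustrative only. `split_at_switch` is a hypothetical helper, and it assumes a single one-way switch from unexposed to exposed (e.g., transplant).

```python
def split_at_switch(patient_id, follow_up, event, switch_time=None):
    """Return (id, start, stop, on_treatment, event) rows for one patient.

    follow_up   : total follow-up time
    event       : 1 if the event occurred at the end of follow-up, else 0
    switch_time : time the covariate switched from 0 to 1, or None if it never did
    """
    if switch_time is None or switch_time >= follow_up:
        return [(patient_id, 0, follow_up, 0, event)]
    return [
        (patient_id, 0, switch_time, 0, 0),            # unexposed interval; no event recorded here
        (patient_id, switch_time, follow_up, 1, event),  # exposed interval carries the outcome
    ]


# Hypothetical patients: A switches at month 6 and has the event at month 24; B never switches
for row in split_at_switch("A", 24, 1, switch_time=6) + split_at_switch("B", 10, 0):
    print(row)
```

Because patient A's first 6 months are counted as unexposed person-time, this layout also avoids crediting pre-treatment survival to the treated group.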
— Martingale residuals: check functional form of continuous covariates
— Deviance residuals: identify outliers/influential observations
— DFBETAS: assess influence of individual observations on coefficient estimates

— Univariate Cox: each predictor tested alone → crude HRs
— Multivariable Cox: predictors included simultaneously → adjusted HRs
— Stepwise or purposeful selection: refining the model
— Final model: parsimonious set of covariates with clinical justification
— A confounder is associated with both exposure and outcome but is not on the causal pathway
— Adjustment via inclusion in the Cox model
— Substantial change (>10%) in HR after adjustment suggests confounding
— When the HR for exposure differs across levels of another variable (e.g., treatment benefit varies by age)
— Tested by including an interaction term in the model
— Subgroup forest plots display this visually
— A significant interaction p-value supports true effect modification, though subgroup analyses are hypothesis-generating
Board pearl: A drug with overall HR 0.80 may show HR 0.55 in diabetics and HR 0.95 in non-diabetics — this is effect modification, not confounding. Don't "adjust away" effect modification; report stratified estimates.
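One common way to test whether two subgroup HRs truly differ is a z-test on the difference of their log-HRs, which approximates the interaction test when the subgroups are independent. In the sketch below the HRs (0.55 and 0.95) come from the pearl above, but the confidence intervals are hypothetical values chosen for illustration.

```python
import math


def log_hr_and_se(hr, lower, upper):
    """Log-HR and its standard error, back-calculated from a 95% Wald CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * 1.96)


def interaction_test(sub_a, sub_b):
    """z-test for the difference between two independent subgroup log-HRs."""
    b_a, se_a = log_hr_and_se(*sub_a)
    b_b, se_b = log_hr_and_se(*sub_b)
    z = (b_a - b_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value
    ratio_of_hrs = math.exp(b_a - b_b)        # HR in subgroup A relative to subgroup B
    return ratio_of_hrs, z, p


# (HR, lower, upper): HRs from the pearl; CIs are assumed for illustration
diabetic = (0.55, 0.40, 0.76)
non_diabetic = (0.95, 0.78, 1.16)

ratio, z, p = interaction_test(diabetic, non_diabetic)
print(f"ratio of HRs={ratio:.2f}  z={z:.2f}  interaction p={p:.4f}")
```

The quantity being tested is the ratio of the two HRs, not whether either subgroup's own CI excludes 1.0.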
— Pre-specified clinical confounders (age, sex, baseline severity)
— Variables imbalanced at baseline despite randomization
— Avoid adjusting for post-randomization variables: conditioning on mediators causes overadjustment, and conditioning on colliders can introduce collider bias
— Don't include the outcome's downstream consequences

— Cox proportional hazards → default for time-to-event with censoring, PH assumption met
— Stratified Cox → PH violated for one categorical variable; want HRs for other covariates
— Time-dependent Cox → covariates that change over time (e.g., transplant status, evolving lab values)
— Fine-Gray subdistribution hazard → competing risks present (cardiac death vs. non-cardiac death)
— Cause-specific hazards model → competing risks when interest is etiologic
— Parametric AFT (accelerated failure time) → when baseline hazard shape is known/desired (Weibull, lognormal)
— Frailty models → clustered/correlated data (multicenter trials, family studies)
— Cure models → when a fraction of patients are truly cured (plateau in KM curve)
— Outcomes assessed only at fixed visits without exact event times → use interval-censored or discrete-time survival methods (e.g., logistic regression on the binary outcome at each interval)
— Recurrent events → use Andersen-Gill, PWP, or frailty models
— Very small samples with few events → exact methods or Bayesian approaches
Step 3 management: When a stem describes a study of "time to first hospitalization for heart failure" in patients also at risk for non-cardiac death, recognize the competing risk. Fine-Gray or cause-specific hazards models handle it explicitly; a naive 1 − Kaplan–Meier estimate that simply censors the competing deaths will overstate the cumulative incidence of hospitalization.
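To see why censoring competing deaths overstates cumulative incidence, the sketch below compares the naive 1 − Kaplan–Meier estimate with the Aalen–Johansen cumulative incidence estimator on a hypothetical 10-patient cohort. The data and the status coding (0 = censored, 1 = event of interest, 2 = competing event) are assumptions made for illustration.

```python
def cumulative_incidence(times, status, cause=1):
    """Return (Aalen-Johansen CIF, naive 1 - KM) for `cause` at end of follow-up.

    status: 0 = censored, 1 = event of interest, 2 = competing event
    """
    n_at_risk = len(times)
    overall_surv = 1.0   # probability of being free of ANY event
    naive_surv = 1.0     # KM that treats competing events as censored
    cif = 0.0
    for t in sorted(set(times)):
        at_t = [s for tt, s in zip(times, status) if tt == t]
        d_cause = sum(1 for s in at_t if s == cause)
        d_any = sum(1 for s in at_t if s != 0)
        # Only patients still free of every event can have the event of interest now
        cif += overall_surv * d_cause / n_at_risk
        overall_surv *= 1 - d_any / n_at_risk
        naive_surv *= 1 - d_cause / n_at_risk
        n_at_risk -= len(at_t)
    return cif, 1 - naive_surv


# Hypothetical cohort: one patient exits per month
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
status = [2, 1, 2, 1, 0, 2, 1, 2, 1, 0]

cif, naive = cumulative_incidence(times, status)
print(f"Aalen-Johansen CIF = {cif:.2f}, naive 1 - KM = {naive:.2f}")
```

Here 4 of 10 patients had the event of interest. The Aalen–Johansen estimate is 0.44, while the naive estimate is about 0.71 because it assumes patients who died of other causes could still go on to have the event.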

— Absolute risk reduction (ARR) = event rate(control) − event rate(treatment)
— Number needed to treat (NNT) = 1/ARR
— Relative risk reduction (RRR) ≈ 1 − HR (approximation when events are rare)
— ARR = 3% over 3 years
— NNT = 33 over 3 years
— RRR ≈ 25%
— HR of 0.75 means 25% lower instantaneous rate
— Is the study design appropriate (RCT vs. observational)?
— Was the PH assumption tested and met?
— Are events numerous enough (≥10 per covariate)?
— Was follow-up adequate (median follow-up vs. expected event timing)?
— Was censoring non-informative?
— Are absolute risks reported alongside HR?
CCS pearl: When counseling a patient about a new therapy based on trial HR, communicate absolute benefits and harms ("3 fewer events per 100 patients over 3 years"), not just relative ("25% reduction"). Patients consistently overestimate benefit when presented with relative metrics alone — this is a documented informed-consent risk.
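The translation from HR to absolute terms can be sketched as follows. Under proportional hazards, S_treated(t) = S_control(t)^HR, so a treated-arm risk can be derived from the HR and a control-arm risk. The 12% control risk over 3 years is an assumption, chosen because it matches the ARR of 3% and RRR of about 25% quoted earlier in this section; `absolute_effect` is a hypothetical helper name.

```python
import math


def absolute_effect(hr, control_risk):
    """Translate an HR into absolute terms, assuming proportional hazards
    and a benefit (HR < 1) so that the ARR is positive."""
    treated_risk = 1 - (1 - control_risk) ** hr   # S_treated = S_control ** HR
    arr = control_risk - treated_risk
    nnt = math.ceil(1 / arr)                      # round up by convention
    return treated_risk, arr, nnt


treated, arr, nnt = absolute_effect(hr=0.75, control_risk=0.12)
print(f"treated risk={treated:.1%}  ARR={arr:.1%}  NNT={nnt}")
print(f"per 100 patients: about {arr * 100:.0f} fewer events over the same horizon")
```

The exact conversion gives an ARR slightly under 3% and an NNT of 35, versus 33 from the rounded shortcut. This shows that RRR ≈ 1 − HR is an approximation that loosens as events become more common.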
— Each row is a subgroup with its HR and 95% CI
— Vertical line at HR = 1 (no effect)
— Test for interaction p-value (not the within-subgroup p) indicates whether effect truly differs
— Treat subgroup findings as hypothesis-generating unless pre-specified and adequately powered

— Competing risks dominate: non-cardiac death is common, inflating Kaplan–Meier estimates of cause-specific cumulative incidence
— Time-varying covariates (functional status, renal function) often more dynamic
— Frailty as both a confounder and an effect modifier
— Drug interactions and polypharmacy complicate treatment-effect estimation
— Watch for age-by-treatment interactions — older patients may have attenuated relative benefits but larger absolute benefits (or vice versa)
— Beware survivor bias — older trial enrollees represent a "healthy survivor" cohort, limiting generalizability
— Lead-time bias in screening trials of older adults
— Often included as adjusted variables; sometimes as stratifiers when PH violated
— Time-varying eGFR can be modeled as a time-dependent covariate
— Frailty terms can capture unmeasured heterogeneity
Key distinction: A frailty model in survival analysis is a statistical term for a random-effect Cox model accounting for unmeasured heterogeneity — distinct from the clinical frailty syndrome (sarcopenia, weakness, slow gait). Boards may use the term in either context; read carefully.
— Propensity score methods help balance baseline differences
— Instrumental variable approaches can address unmeasured confounding
— Target trial emulation framework provides a transparent design template

— Therapeutic decisions in pregnancy rely on registry data and pharmacoepidemiologic cohorts
— Time-to-event analyses in pregnancy use gestational age as the time axis with left truncation to handle delayed entry (patients enroll at variable gestational ages)
— Outcomes: preterm birth, preeclampsia, stillbirth, neonatal death
— Smaller samples → fewer events → wider CIs around HRs
— Bayesian methods increasingly used to borrow information from adult data
— Age-specific hazards often non-proportional → stratified or time-varying coefficient models
— Frequently included as adjusted covariates but are social constructs, not biological mechanisms
— Race-stratified HRs may reflect structural inequities, healthcare access, or unmeasured confounding rather than biology
— Recent guidelines (NEJM, JAMA) discourage including race as a biological variable without clear justification
Board pearl: When a trial reports a subgroup HR by race that differs from the overall HR, the mechanism is more likely systemic (access, adherence, comorbidity burden) than biological. Step 3 increasingly tests this nuance under health equity questions.
— Censoring = event hasn't occurred by end of observation (right censoring most common)
— Left truncation = patient enters the risk set after the time origin (delayed entry); standard in pregnancy and registry studies
— Ignoring left truncation overestimates survival
— High early hazard (induction mortality)
— Plateau in long-term survivors (cure fraction)
— Cure models or mixture models more appropriate than standard Cox

— Treating HR as a risk ratio: HR is an instantaneous rate ratio; RR is a cumulative probability ratio. They diverge when events are common or follow-up is long.
— "50% reduction" overstatement: HR 0.5 means 50% lower hazard, not 50% of patients spared the outcome
— Ignoring absolute risk: HR 0.5 with 2% → 1% event rate is far less impactful than HR 0.5 with 40% → 20%
— Causal language from observational HRs: adjusted HRs from cohorts are associations, not causal effects, unless rigorous causal methods are applied
— Curves may cross because of true effect reversal (e.g., surgery: high early mortality, long-term benefit)
— A single overall HR averages over time and obscures this — report time-stratified HRs or RMST
— Each subgroup test inflates false-discovery risk
— Pre-specification, adjustment (Bonferroni, false discovery rate), and tests of interaction (not within-subgroup p-values) mitigate this
Key distinction: A subgroup HR with p < 0.05 is not evidence of effect modification — the interaction test p-value is. The classic ISIS-2 satire of subgroup analyses (treatment benefit varies by astrological sign) makes this point memorably.
— Occurs when a period during which the outcome cannot occur is misclassified into the treatment group
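— Common in pharmacoepidemiology when follow-up starts at cohort entry (e.g., diagnosis or discharge) but exposure is defined by a later event such as the first prescription, so the waiting time before treatment is credited to the treated group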
— Solutions: time-dependent covariate Cox, landmark analysis, target trial emulation
— Sicker patients drop out → those remaining are healthier → underestimates true hazard
— Hard to detect, harder to fix; sensitivity analyses (best/worst case imputation) help

— Single observational study with HR near 1 (e.g., 0.85, CI 0.75–0.96) — small effects in observational data may reflect residual confounding
— No PH assumption testing reported
— Crossing KM curves with a single overall HR reported
— Inadequate follow-up for the natural history of the event
— High loss to follow-up (>20%) or differential loss between groups
— Subgroup-driven conclusions without pre-specification or interaction testing
— Surrogate endpoints (e.g., LDL change) rather than hard clinical outcomes
— Large, well-powered RCT with adequate follow-up
— Pre-specified primary endpoint with HR effect size that is clinically meaningful and statistically significant
— Consistent direction across subgroups and sensitivity analyses
— Replication across multiple trials or meta-analytic confirmation
— Plausible biological mechanism
Step 3 management: Hierarchy of evidence for HR claims: meta-analysis of RCTs > single high-quality RCT > propensity-matched observational cohort > unadjusted observational study. A single observational HR rarely justifies practice change for a low-risk intervention; it almost never justifies it for a high-risk intervention.
— I² statistic quantifies between-study variability
— High I² (>50%) suggests trials are estimating different underlying effects
— Random-effects models acknowledge heterogeneity; fixed-effects assume one true HR
— Look for the forest plot with study-specific HRs and pooled estimate
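The pooled estimate and I² on such a forest plot can be reproduced approximately from the study-level HRs and CIs. The sketch below uses inverse-variance fixed-effect pooling on the log-HR scale, with Cochran's Q and I² = max(0, (Q − df)/Q). The three studies are hypothetical, and a random-effects model would additionally need a between-study variance estimate that is not shown here.

```python
import math


def pool_hazard_ratios(studies):
    """Inverse-variance fixed-effect pooling of log-HRs.

    studies: iterable of (hr, lower_95, upper_95)
    Returns (pooled_hr, (ci_low, ci_high), i_squared_percent).
    """
    logs, weights = [], []
    for hr, lower, upper in studies:
        se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
        logs.append(math.log(hr))
        weights.append(1 / se ** 2)           # more precise studies get more weight
    total_w = sum(weights)
    pooled_log = sum(w * b for w, b in zip(weights, logs)) / total_w
    q = sum(w * (b - pooled_log) ** 2 for w, b in zip(weights, logs))  # Cochran's Q
    df = len(logs) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    se_pooled = math.sqrt(1 / total_w)
    ci = (math.exp(pooled_log - 1.96 * se_pooled),
          math.exp(pooled_log + 1.96 * se_pooled))
    return math.exp(pooled_log), ci, i2


# Hypothetical trials: (HR, lower 95% CI, upper 95% CI)
studies = [(0.70, 0.55, 0.89), (0.80, 0.66, 0.97), (0.95, 0.80, 1.13)]
pooled, ci, i2 = pool_hazard_ratios(studies)
print(f"pooled HR={pooled:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), I2={i2:.0f}%")
```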

— Log-rank test: nonparametric test of equality of survival curves; provides p-value but no effect size
— Wilcoxon (Breslow) test: similar to log-rank but weights early events more
— Tarone–Ware test: intermediate weighting
— Stratified log-rank: adjusts for a categorical stratifier
— IRR comes from Poisson regression assuming constant hazard; HR comes from Cox without that assumption
— Numerically similar when hazards are roughly constant over time
— OR from logistic regression on a binary outcome at a fixed time ignores time-to-event information and censoring
— Use logistic only when timing is unimportant or unavailable
Key distinction: HR ≈ RR when events are rare (<10% cumulative incidence) and follow-up is uniform. As event rates climb, HR and RR diverge, with HR typically further from the null. Don't substitute one for the other in counseling.
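The divergence can be shown numerically. Assuming proportional hazards, the treated-arm cumulative risk is 1 − (1 − control risk)^HR, so the implied RR can be computed for any baseline risk. The HR of 0.50 and the baseline risks below are illustrative choices, not data from a trial.

```python
def risk_ratio_from_hr(hr, control_risk):
    """Cumulative risk ratio implied by an HR at a given control-arm risk,
    assuming proportional hazards over the whole follow-up."""
    treated_risk = 1 - (1 - control_risk) ** hr
    return treated_risk / control_risk


hr = 0.50
for control_risk in (0.01, 0.05, 0.10, 0.30, 0.50, 0.80):
    rr = risk_ratio_from_hr(hr, control_risk)
    print(f"control risk={control_risk:>4.0%}  HR={hr:.2f}  implied RR={rr:.3f}")
```

At a 1% baseline risk the implied RR is essentially 0.50. By a 50% baseline risk it has drifted to about 0.59, so the HR is the estimate further from the null.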
— RMST (restricted mean survival time): mean event-free time up to a clinically relevant horizon (e.g., 5 years)
— Robust to PH violations
— Increasingly reported in oncology trials with non-proportional hazards (e.g., immunotherapy trials with delayed separation)
— Win ratio: used for hierarchical composite endpoints (CV death + HF hospitalization)
— Compares hierarchical events pairwise; HR alternative in trials prioritizing fatal over non-fatal events
— EMPULSE, PARAGLIDE-HF used this approach
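For the mean event-free time up to a fixed horizon described in this list (the restricted mean survival time), the calculation is the area under the Kaplan–Meier step curve from 0 to the horizon τ. The sketch below assumes two hypothetical arms of five patients each. If the last observation is censored before τ, it carries the last survival value forward, which is a simplification.

```python
def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM step curve from 0 to tau.

    times  : follow-up times
    events : 1 = event observed, 0 = censored
    """
    n_at_risk = len(times)
    survival, last_t, area = 1.0, 0.0, 0.0
    for t in sorted(set(times)):
        if t > tau:
            break
        area += survival * (t - last_t)          # rectangle up to this time point
        at_t = [e for tt, e in zip(times, events) if tt == t]
        survival *= 1 - sum(at_t) / n_at_risk    # step down if any events at t
        n_at_risk -= len(at_t)
        last_t = t
    area += survival * (tau - last_t)            # final rectangle out to tau
    return area


# Hypothetical arms, follow-up in months, horizon tau = 10 months
arm_a = rmst([2, 4, 6, 8, 10], [1, 1, 1, 1, 1], tau=10)
arm_b = rmst([1, 3, 5, 10, 10], [1, 1, 1, 0, 0], tau=10)
print(f"RMST A={arm_a:.1f} months, RMST B={arm_b:.1f} months, difference={arm_a - arm_b:.1f}")
```

The difference in RMST reads directly as "months of event-free time gained over the first 10 months," and it requires no proportional hazards assumption.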

— Logistic regression: binary outcome, no time element, reports OR
— Linear regression: continuous outcome, reports β coefficients
— Poisson regression: count outcomes or rates with person-time denominator, reports IRR
— Negative binomial regression: overdispersed counts (recurrent HF hospitalizations)
— GEE/mixed models: correlated/longitudinal data, repeated measurements
— Target trial emulation: design observational analyses to mimic a hypothetical RCT
— Marginal structural models with inverse probability weighting: handle time-varying confounding
— G-methods: adjust for time-varying confounders affected by prior treatment
— Instrumental variables: address unmeasured confounding under strict assumptions
— Random survival forests, deep survival neural networks, gradient-boosted Cox
— Useful for prediction; interpretation of "hazard" effects more complex
— Increasingly explored for risk prediction alongside conventional regression-based scores (e.g., MAGGIC, MELD-derivatives)
Board pearl: When a stem describes a "risk calculator" (Framingham, MELD, MAGGIC, PROMISE), recognize these are typically built on Cox regression coefficients translated into a points-based score. Understanding HR interpretation is foundational to understanding these tools.

— Number needed to treat (NNT) over a clinically relevant horizon, derived from absolute risks
— Time-to-benefit: how long until cumulative benefit accrues — critical when life expectancy is limited
— Time-to-harm: when adverse effects manifest (e.g., bleeding with anticoagulants is early; cancer signals with biologics may be late)
— Shared decision-making tools incorporating both relative and absolute estimates
— Strong recommendations typically require HR with tight CI, replicated across trials, and clinically meaningful absolute benefit
— Weak/conditional recommendations may rest on single trials or observational data
— Class of recommendation (I, IIa, IIb, III) and level of evidence (A, B, C) explicitly grade HR-supported claims
— HRs for surrogates (LDL, A1c, viral load) require validation that change in surrogate maps to change in clinical outcome
— Many drugs with favorable surrogate effects (e.g., CETP inhibitors on HDL and LDL cholesterol) failed on hard outcomes
Step 3 management: When counseling a 75-year-old with limited life expectancy about starting a statin for primary prevention, the relevant question is whether time-to-benefit (typically 2.5–5 years for cardiovascular outcomes based on HRs from primary prevention trials) is shorter than expected remaining life. If not, deprescribe or do not initiate.
— Antiplatelets, statins, beta-blockers post-MI: HRs 0.7–0.85 for recurrent MACE
— Anticoagulation in AF: HRs ~0.35 for stroke
— Apply guideline-recommended therapies adjusted for individual bleeding/adverse-event risk

— Replication trials confirm or refute initial findings
— Meta-analyses pool HRs across studies
— Real-world evidence (registries, EHR-based studies) tests whether trial HRs translate to practice
— Efficacy HR from RCTs reflects ideal conditions
— Effectiveness HR in real-world cohorts often attenuated due to adherence, comorbidity, age extremes
— Recognize when generalizing a trial HR to a different population may overstate benefit
— FDA post-marketing surveillance detects rare adverse events not captured in trial-period HRs
— Sentinel network and observational HRs complement trial data
— Adaptive trials update HR estimates as data accumulate
CCS pearl: When following a patient on a long-term therapy initiated based on trial HRs, monitor for both the intended benefit (e.g., absence of recurrent MACE) and delayed harms (bleeding, malignancy signals, drug-drug interactions). Periodically reassess whether the evidence base has shifted — new trials may show class superiority or safety signals.
— Communicate uncertainty in HR estimates (confidence intervals, generalizability)
— Frame absolute risks honestly using natural frequencies
— Avoid "the drug reduces your risk by 30%" without absolute context
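— Cardiac rehab post-MI: pooled trial data show a relative risk of about 0.74 for cardiovascular mortality; the effect on all-cause mortality is less consistent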
— Pulmonary rehab in COPD: improved survival and QoL
— Adherence to these interventions amplifies pharmacologic HR benefits

— Informed consent: patients must understand both relative and absolute benefits/harms. Presenting only HR ("25% reduction") without absolute terms is arguably misleading and can undermine valid consent.
— Equitable trial enrollment: HRs derived from trials that underrepresent certain populations (women, racial/ethnic minorities, elderly, rural) may not apply to those groups, creating an equity gap
— Conflict of interest: industry-sponsored trials may emphasize favorable HRs in marketing; clinicians should access primary data and independent appraisal
— Data integrity and fraud: HRs in fabricated datasets have led to retraction of major papers; rely on guideline syntheses rather than single studies
— A patient discharged on therapies initiated based on inpatient trial evidence (e.g., post-MI quadruple therapy) is vulnerable to dropoff at the outpatient handoff
— Medication reconciliation at every transition is a Joint Commission patient safety priority
— Document indications and target durations (e.g., DAPT 12 months post-DES) so subsequent providers understand the rationale
Step 3 management: When discharging a post-MI patient, explicitly document: (1) the guideline-directed therapies started, (2) the planned duration based on trial evidence (e.g., DAPT for 12 months per HR from DAPT-duration trials), and (3) the follow-up plan to reassess. This closes the loop and prevents both premature discontinuation and indefinite continuation beyond evidence-supported durations.
— Early stopping of trials for benefit (based on interim HR analyses) can overestimate effect size
— IRB and DSMB oversight ensures trials are stopped only when ethical equipoise is broken
— Pre-registration on ClinicalTrials.gov reduces selective outcome reporting

— Cox model = semiparametric: no assumption on baseline hazard shape
— HR > 1 = harm; HR < 1 = benefit; HR = 1 = null
— CI crossing 1 = not statistically significant
— Log-rank test = nonparametric comparison of survival curves; provides p but no effect size
— Proportional hazards assumption: HR constant over time
— Schoenfeld residuals: PH diagnostic
— Kaplan–Meier curve: stepwise survival; ticks = censoring
— Numbers at risk table: judge curve reliability over time
— Competing risks: use Fine-Gray, not standard Cox
— Immortal time bias: classic pharmacoepidemiology pitfall
— Median survival "not reached": good news when curve never hits 0.5
— ≥10 events per covariate: rule of thumb against overfitting
— NNT = 1/ARR: translate HR into absolute terms
— Interaction p-value: test of effect modification (not within-subgroup p)
— C-statistic / Harrell's c: discrimination for survival models
— RMST: robust to PH violations
Board pearl: If a Step 3 stem mentions "the hazard ratio was 0.78 (95% CI 0.65–0.93)" without telling you the absolute event rates, the test is often checking whether you can correctly state "lower rate of the event" and recognize statistical significance — not whether you can compute NNT. Don't overthink.
— EMPA-REG, DAPA-HF, EMPEROR (SGLT2i, HF outcomes)
— PARADIGM-HF (sacubitril/valsartan)
— ISCHEMIA, COURAGE (PCI vs. medical therapy in stable CAD)
— JUPITER (rosuvastatin primary prevention)
— ORBITA (sham-controlled PCI)
— RE-LY, ROCKET-AF, ARISTOTLE, ENGAGE-AF (DOACs)
— FOURIER, ODYSSEY (PCSK9 inhibitors)

"A randomized trial of Drug X vs. placebo in patients with HFrEF reports an HR of 0.74 (95% CI 0.62–0.88) for the composite of CV death and HF hospitalization. Which statement best describes this finding?"
— Correct: 26% lower instantaneous rate of the event; statistically significant
— Distractors: 26% of patients spared the event (no — that conflates HR with absolute risk); not significant (no — CI excludes 1.0)
"Kaplan–Meier curves for surgical vs. medical therapy cross at 6 months, with surgery showing higher early mortality and lower late mortality. The overall HR is 0.95 (95% CI 0.80–1.12). What is the best interpretation?"
— Correct: PH assumption violated; HR averaged over time obscures true time-varying effect; report time-stratified analysis or RMST
"In a trial of an antiplatelet agent in 85-year-olds, the primary endpoint is time to MI. 20% of patients died of non-cardiac causes during follow-up. Which analysis is most appropriate?"
— Correct: Fine-Gray subdistribution hazard model
"Drug Y's HR for mortality is 0.55 in patients with diabetes and 0.95 in non-diabetics. The interaction p-value is 0.01. The best interpretation is…"
— Correct: effect modification by diabetes status; report stratified HRs
Step 3 management: Look for buzzwords that anchor the answer: "Schoenfeld residuals" → PH testing; "time-dependent covariate" → variable changes over time; "Fine-Gray" → competing risks; "log-rank" → curve comparison without effect size; "RMST" → PH-free summary.
"Trial reports HR 0.70 with control 3-year event rate of 10% and treatment 3-year event rate of 7%. What is the NNT?"
— ARR = 3%; NNT ≈ 33 over 3 years
A stem gives an observational-study HR of 0.92 with a wide CI in a small cohort. Correct answer: insufficient evidence to change practice; residual confounding is likely.

Cox proportional hazards regression estimates the hazard ratio, a time-to-event effect measure interpreted as the relative rate of an outcome between groups. The estimate is valid when the proportional hazards assumption holds and censoring is non-informative, and it should always be read alongside absolute risks for clinical decision-making.
Board pearl: On Step 3, the hazard ratio is less about the math and more about the interpretation — recognize the model, interpret the HR with its CI, check the assumptions, contextualize with absolute risks, and tie it to a real management decision at the bedside or in the clinic.

