Biostatistics & Population Health

Cox proportional hazards regression and hazard ratios

Clinical Overview and When to Suspect Survival Analysis

Cox proportional hazards regression is the workhorse model for time-to-event outcomes in clinical research, used when the outcome is not just whether an event occurs, but when it occurs.

Suspect a Cox model is appropriate when a study reports:

— A hazard ratio (HR) with 95% CI

— Kaplan–Meier survival curves with a log-rank p-value

— Endpoints like "time to death," "time to MI," "time to recurrence," "time to discharge"

— Censoring (patients lost to follow-up, study ends, or competing events occur)

Cox is semiparametric: it models the effect of covariates on the hazard without assuming a specific baseline hazard distribution, which is why it is so widely used over parametric alternatives (Weibull, exponential).

Core output: the hazard ratio, interpreted as the instantaneous event rate in the exposed group divided by that in the unexposed group; under proportional hazards this ratio is the same at every time point.

Board pearl: A hazard ratio is not a risk ratio. HR describes the rate of events over time among those still at risk; RR describes the cumulative probability over a fixed interval. They converge only when events are rare and follow-up is short.

Step 3 will test you on interpreting an HR from a trial abstract, distinguishing it from relative risk and odds ratio, and recognizing when proportional hazards assumptions may be violated.

Common Step 3 scenarios deploying Cox regression:

— Oncology trials (overall survival, progression-free survival)

— Cardiovascular outcomes trials (MACE endpoints)

— Transplant graft survival

— Heart failure readmission studies

— ICU mortality with time-varying covariates

Recognize the trial vocabulary: "adjusted HR," "stratified Cox model," "time-dependent covariate," and "proportional hazards assumption." These terms tell you a Cox framework is in play and signal the level of methodologic rigor.
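For reference, the semiparametric structure described in this section is usually written as follows. The symbols ($h_0$ for the unspecified baseline hazard, $\beta$ for the log-hazard coefficients, $x$ for covariates, $R(t_i)$ for the risk set at event time $t_i$) are standard textbook notation rather than anything defined in these notes, and the partial likelihood is shown under the simplifying assumption of no tied event times.

```latex
\begin{align}
h(t \mid x) &= h_0(t)\,\exp\!\left(\beta^{\top} x\right) \\
\mathrm{HR} &= \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = \exp(\beta)
  \quad \text{(binary covariate; free of } t \text{ under proportional hazards)} \\
L(\beta) &= \prod_{i:\ \text{event at } t_i}
  \frac{\exp\!\left(\beta^{\top} x_i\right)}{\sum_{j \in R(t_i)} \exp\!\left(\beta^{\top} x_j\right)} \\
95\%\ \text{CI for HR} &= \exp\!\left(\hat{\beta} \pm 1.96\,\mathrm{SE}(\hat{\beta})\right)
\end{align}
```

The baseline hazard $h_0(t)$ cancels out of each factor of $L(\beta)$, which is why no distribution has to be assumed for it.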

Presentation Patterns and Key History (How the Data "Presents")

In a Step 3 stem, Cox regression presents as a results paragraph or table from a clinical trial or cohort study. Recognize the pattern:

— "After a median follow-up of X months, the adjusted HR for [event] was 0.78 (95% CI 0.65–0.93, p=0.006)."

— Accompanying Kaplan–Meier curve showing two diverging survival curves

— A forest plot of subgroup HRs

Key "history" elements of the dataset to identify:

— Event of interest: must be binary and well-defined (death, MI, stroke, relapse)

— Time origin: when the clock starts (randomization, diagnosis, surgery)

— Censoring mechanism: must be non-informative (patients censored should have the same future risk as those still followed)

— Follow-up duration: median follow-up matters more than mean

Key distinction: Cox regression handles censoring (event hasn't happened yet) but does not inherently handle competing risks (a different event prevents the one of interest). Fine-Gray or cause-specific hazards models are the fix.

Watch for competing risks: in elderly cardiovascular trials, non-cardiac death "competes" with the cardiac event of interest. Treating the competing event as ordinary censoring and reading 1 − Kaplan–Meier as cumulative incidence overestimates the cumulative incidence of the event of interest; a Fine-Gray subdistribution hazard model is more appropriate when cumulative incidence is the target.

Red flags suggesting the analysis may be flawed:

— Differential loss to follow-up between groups → informative censoring → biased HR

— Short follow-up with few events (rule of thumb: ≥10 events per covariate to avoid overfitting)

— Crossing Kaplan–Meier curves → proportional hazards assumption likely violated

— Failure to report median follow-up or number of events

Always check whether the HR is crude or adjusted: adjusted HRs control for confounders; crude HRs reflect raw associations and may be misleading in observational data.
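The competing-risks caution in this section can be made concrete with a small sketch. It compares the naive estimate (1 − Kaplan–Meier with competing events treated as censored) against the Aalen-Johansen cumulative incidence, which removes patients who had the competing event from the pool that can still experience the event of interest. The function name, status coding, and toy data are illustrative assumptions, not taken from any trial.

```python
def naive_vs_cumulative_incidence(times, status):
    """Compare two estimates of the probability of the event of interest.

    status coding (an assumption for this sketch):
      0 = censored, 1 = event of interest, 2 = competing event.
    Returns (naive 1 - KM estimate, Aalen-Johansen cumulative incidence)
    at the end of follow-up.
    """
    data = sorted(zip(times, status))
    n_at_risk = len(data)
    km_naive = 1.0       # KM for event 1, competing events treated as censored
    event_free = 1.0     # probability of being free of ANY event just before t
    cum_inc = 0.0        # Aalen-Johansen cumulative incidence of event 1
    i = 0
    while i < len(data):
        t = data[i][0]
        d1 = d2 = censored = 0
        # Gather everyone whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == t:
            s = data[i][1]
            if s == 1:
                d1 += 1
            elif s == 2:
                d2 += 1
            else:
                censored += 1
            i += 1
        cum_inc += event_free * d1 / n_at_risk
        km_naive *= 1 - d1 / n_at_risk
        event_free *= 1 - (d1 + d2) / n_at_risk
        n_at_risk -= d1 + d2 + censored
    return 1 - km_naive, cum_inc


# Hypothetical cohort: 4 cardiac events, 3 non-cardiac deaths, 3 censored.
toy_times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_status = [1, 2, 1, 2, 0, 1, 2, 1, 0, 0]
naive, cif = naive_vs_cumulative_incidence(toy_times, toy_status)
print(f"naive 1-KM: {naive:.2f}  cumulative incidence: {cif:.2f}")
```

In this toy cohort the naive estimate is larger than the cumulative incidence, which is the direction of bias the text warns about.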

Physical Exam Findings — Reading the Kaplan–Meier Curve

The Kaplan–Meier (KM) curve is the visual "physical exam" of survival data. Master its anatomy:

— Y-axis: probability of being event-free (survival probability), 0–1 or 0–100%

— X-axis: time since origin (months, years)

— Stepwise drops: each step = an event

— Tick marks: censored observations

— Numbers at risk table below the curve: how many patients remain at each time point; critical for interpreting late-curve reliability

Visual cues and their meaning:

— Wider gap between curves = larger effect size

— Curves that separate early and stay parallel = constant HR, proportional hazards likely met

— Curves that cross = HR changes direction over time → PH assumption violated

— Curves that converge late = treatment effect diminishes → time-varying HR

— Plateau in survival curve = potential "cure fraction"

Board pearl: Always look at the numbers-at-risk table. If only 5 patients remain at year 5, the curve's tail is unreliable (wide confidence bands). Step 3 distractors love showing dramatic late separations driven by tiny numbers.

The log-rank test is the nonparametric test comparing entire KM curves; it tests the null that survival is identical. A significant log-rank p-value supports a true difference but gives no effect size; that's what the HR provides.

Median survival is read off the curve where survival probability = 0.5. If the curve never crosses 0.5, median survival is "not reached," a good sign in a treatment arm.

Restricted mean survival time (RMST) is an alternative summary measure that doesn't require the PH assumption and is increasingly reported in oncology trials when curves cross or have non-proportional patterns.
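The Kaplan–Meier anatomy described in this section follows directly from how the estimator is computed. Below is a minimal sketch of the product-limit calculation and of reading median survival off the curve. The function names and the two small data sets are hypothetical, and confidence bands are deliberately left out.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    events: 1 = event observed, 0 = censored.
    Returns a list of (time, survival probability) at each distinct event time,
    i.e. the steps of the curve. Censored observations produce no step but
    shrink the number at risk for later steps.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Everyone whose follow-up ends at time t (event or censoring).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


def median_survival(curve):
    """First time at which survival is at or below 0.5; None means 'not reached'."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical arm with frequent events (times in months).
arm_a = kaplan_meier([2, 3, 3, 5, 6, 8, 9, 12], [1, 1, 0, 1, 0, 1, 0, 0])
# Hypothetical arm where almost everyone is censored event-free.
arm_b = kaplan_meier([4, 6, 7, 10, 12, 12], [0, 1, 0, 0, 0, 0])
print(arm_a, median_survival(arm_a))
print(arm_b, median_survival(arm_b))
```

The second arm never drops to 0.5, so the helper returns `None`, which corresponds to "median survival not reached" in a trial report.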

Diagnostic Workup — Interpreting the Hazard Ratio

The hazard ratio is the central output of Cox regression. Interpretation framework:

— HR = 1.0 → no difference in hazard between groups

— HR < 1.0 → exposure/treatment is protective (lower event rate)

— HR > 1.0 → exposure/treatment is harmful (higher event rate)

— HR = 0.70 → 30% relative reduction in the instantaneous rate of the event

— HR = 1.50 → 50% relative increase in the rate

The 95% confidence interval matters more than the point estimate:

— CI crossing 1.0 → not statistically significant

— Narrow CI → precise estimate (large sample, many events)

— Wide CI → imprecise (few events, small sample)

— A statistically significant HR of 1.05 (CI 1.01–1.10) may be clinically trivial despite p<0.05

Step 3 management: When a stem gives you an HR of 0.65 (95% CI 0.52–0.81) for a new drug vs. placebo on cardiovascular death, the answer is: the drug reduces the rate of CV death by 35%, with a statistically significant and clinically meaningful effect. Do not translate this directly into "35% of patients are saved"; that conflates a relative rate reduction with an absolute risk reduction.

For continuous covariates, HR is interpreted per unit increase (e.g., HR 1.03 per mmHg of systolic BP) or per standard deviation; always check the units.

For categorical covariates, HR compares each category to a reference group. Identify the reference!

Adjusted vs. unadjusted HR:

— Adjusted HR controls for confounders (age, sex, comorbidities) listed in the model

— Substantial difference between crude and adjusted HR suggests confounding

— Residual confounding (unmeasured variables) is a persistent observational-study limitation

Always pair HR with absolute event rates: a "50% reduction" sounds dramatic but means little if the absolute event rate drops from 0.2% to 0.1%.
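One way to see why an HR needs absolute event rates next to it: under proportional hazards the survival curves satisfy S_treated(t) = S_control(t)^HR, so the same HR implies very different absolute risks and risk ratios depending on the baseline risk. The sketch below applies that identity; the function names and the chosen baseline risks are illustrative assumptions, and the calculation ignores censoring and competing risks.

```python
def treated_risk_under_ph(control_risk, hazard_ratio):
    """Cumulative risk in the treated group implied by proportional hazards.

    Uses S_treated = S_control ** HR, i.e. risk = 1 - (1 - control_risk) ** HR.
    """
    return 1 - (1 - control_risk) ** hazard_ratio


def summarize(control_risk, hazard_ratio):
    """Translate an HR plus a baseline risk into absolute and relative terms."""
    treated = treated_risk_under_ph(control_risk, hazard_ratio)
    return {
        "treated_risk": treated,
        "risk_ratio": treated / control_risk,
        "absolute_risk_reduction": control_risk - treated,
    }


# Same HR of 0.5 applied to hypothetical baseline risks from rare to common.
for baseline in (0.002, 0.10, 0.40, 0.80):
    s = summarize(baseline, 0.5)
    print(
        f"baseline {baseline:.3f}: RR {s['risk_ratio']:.3f}, "
        f"ARR {s['absolute_risk_reduction']:.4f}"
    )
```

With a rare outcome the risk ratio is close to the HR and the absolute reduction is tiny; as the baseline risk climbs, the risk ratio drifts toward 1 while the HR stays fixed, so the HR sits further from the null.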

Diagnostic Workup — Assumptions and Model Diagnostics

Cox regression rests on several assumptions that boards love to test:

— Proportional hazards: the HR is constant over time (the hazard in one group is a constant multiple of the hazard in the other)

— Non-informative censoring: censored patients have the same future risk as uncensored

— Linearity of continuous covariates on the log-hazard scale

— Independence of observations (or appropriate clustering correction)

How proportional hazards is checked:

— Schoenfeld residuals: plotted against time; a flat line supports PH, a trend violates it

— Log-log survival plots: parallel lines support PH; converging or crossing lines violate

— Statistical test of Schoenfeld residuals (global and per-covariate)

When PH is violated, options include:

— Stratified Cox model: stratify on the violating variable, fitting separate baseline hazards

— Time-varying coefficients: allow HR to change over time intervals

— Switch to RMST or parametric AFT models

— Report HRs over distinct time intervals (e.g., 0–6 months vs. 6+ months)

Key distinction: A time-varying covariate is a predictor whose value changes over time (e.g., on/off treatment). A time-varying coefficient means the effect of a covariate on hazard changes over time (PH violation). They are different problems with different solutions.

Other diagnostics:

— Martingale residuals: check functional form of continuous covariates

— Deviance residuals: identify outliers/influential observations

— DFBETAS: assess influence of individual observations on coefficient estimates

Overfitting is a real concern: the rule of thumb is ≥10 events per covariate included in the model. A study modeling 15 covariates on 80 events (about 5 events per covariate) is likely overfitted, and its HR estimates are unstable.
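The events-per-covariate rule of thumb is simple arithmetic, sketched below with the 80-event, 15-covariate study from the text. The helper names are hypothetical, and the threshold of 10 is the heuristic quoted in these notes rather than a strict cutoff.

```python
def events_per_covariate(n_events, n_covariates):
    """Number of outcome events available per covariate in the model."""
    return n_events / n_covariates


def max_covariates(n_events, min_events_per_covariate=10):
    """Largest number of covariates that still satisfies the rule of thumb."""
    return n_events // min_events_per_covariate


epv = events_per_covariate(80, 15)
print(f"events per covariate: {epv:.1f}")
print(f"covariates supportable by 80 events: {max_covariates(80)}")
```

Note that the count that matters is events, not enrolled patients: a 5,000-patient cohort with 80 events supports the same small model as a 500-patient cohort with 80 events.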

Risk Stratification — Crude vs. Adjusted HR and Confounding

The progression of analysis in a well-conducted study typically follows:

— Univariate Cox: each predictor tested alone → crude HRs

— Multivariable Cox: predictors included simultaneously → adjusted HRs

— Stepwise or purposeful selection: refining the model

— Final model: parsimonious set of covariates with clinical justification

Confounding in time-to-event analysis:

— A confounder is associated with both exposure and outcome but is not on the causal pathway

— Adjustment via inclusion in the Cox model

— Substantial change (>10%) in HR after adjustment suggests confounding

Effect modification (interaction):

— When the HR for exposure differs across levels of another variable (e.g., treatment benefit varies by age)

— Tested by including an interaction term in the model

— Subgroup forest plots display this visually

— A significant interaction p-value supports true effect modification, though subgroup analyses are hypothesis-generating

Board pearl: A drug with overall HR 0.80 may show HR 0.55 in diabetics and HR 0.95 in non-diabetics; this is effect modification, not confounding. Don't "adjust away" effect modification; report stratified estimates.

Selection of covariates:

— Pre-specified clinical confounders (age, sex, baseline severity)

— Variables imbalanced at baseline despite randomization

— Avoid adjusting for post-randomization variables (mediators), which can block part of the treatment effect and introduce collider bias

— Don't include the outcome's downstream consequences

Propensity score methods (matching, weighting, stratification) are alternatives to multivariable Cox for observational data, balancing many confounders simultaneously and often used in cardiology and oncology comparative effectiveness research.

Pharmacotherapy Analog — Choosing the Right Survival Model

Just as drugs have indications, survival models have appropriate use cases:

— Cox proportional hazards → default for time-to-event with censoring, PH assumption met

— Stratified Cox → PH violated for one categorical variable; want HRs for other covariates

— Time-dependent Cox → covariates that change over time (e.g., transplant status, evolving lab values)

— Fine-Gray subdistribution hazard → competing risks present (cardiac death vs. non-cardiac death)

— Cause-specific hazards model → competing risks when interest is etiologic

— Parametric AFT (accelerated failure time) → when baseline hazard shape is known/desired (Weibull, lognormal)

— Frailty models → clustered/correlated data (multicenter trials, family studies)

— Cure models → when a fraction of patients are truly cured (plateau in KM curve)

When Cox is inappropriate:

— Outcomes assessed only at fixed intervals without exact event times → use interval-censored or discrete-time methods (e.g., logistic regression on the binary outcome in each interval)

— Recurrent events → use Andersen-Gill, PWP, or frailty models

— Very small samples with few events → exact methods or Bayesian approaches

Step 3 management: When a stem describes a study of "time to first hospitalization for heart failure" in patients also at risk for non-cardiac death, recognize that a competing-risks analysis (Fine-Gray or cause-specific hazards) is needed: treating non-cardiac deaths as ordinary censoring and reading 1 − Kaplan–Meier as cumulative incidence inflates the estimate.

Report standards: trials should follow CONSORT (RCTs) or STROBE (observational); survival analyses are often reported per STROBE-extension guidance, including median follow-up, number of events, PH assumption testing, and handling of missing data and censoring.

Missing data handling: complete-case analysis can bias results unless data are missing completely at random; multiple imputation is preferred when missingness is at random.

Procedures — Calculating and Translating HR into Clinical Action

Translating an HR into clinical decisions requires more than the HR alone:

— Absolute risk reduction (ARR) = event rate(control) − event rate(treatment)

— Number needed to treat (NNT) = 1/ARR

— Relative risk reduction (RRR) ≈ 1 − HR (approximation when events are rare)

Example: a trial reports HR 0.75 (95% CI 0.65–0.87) for MACE with a new drug. Control event rate over 3 years = 12%; treatment = 9%.

— ARR = 12% − 9% = 3% over 3 years

— NNT = 1/0.03 ≈ 33.3, conventionally rounded up to 34, over 3 years

— RRR = 1 − (9%/12%) = 25%

— HR of 0.75 means 25% lower instantaneous rate

Pre-test framework for HR consumption:

— Is the study design appropriate (RCT vs. observational)?

— Was the PH assumption tested and met?

— Are events numerous enough (≥10 per covariate)?

— Was follow-up adequate (median follow-up vs. expected event timing)?

— Was censoring non-informative?

— Are absolute risks reported alongside HR?

CCS pearl: When counseling a patient about a new therapy based on trial HR, communicate absolute benefits and harms ("3 fewer events per 100 patients over 3 years"), not just relative ("25% reduction"). Patients consistently overestimate benefit when presented with relative metrics alone; this is a documented informed-consent risk.

Subgroup analyses displayed as forest plots:

— Each row is a subgroup with its HR and 95% CI

— Vertical line at HR = 1 (no effect)

— Test for interaction p-value (not the within-subgroup p) indicates whether effect truly differs

— Treat subgroup findings as hypothesis-generating unless pre-specified and adequately powered

Special Populations — Elderly and High-Comorbidity Cohorts in Survival Analysis

Elderly populations introduce specific Cox regression challenges:

— Competing risks dominate: non-cardiac death is common, inflating Kaplan-Meier estimates of cause-specific cumulative incidence

— Time-varying covariates (functional status, renal function) often more dynamic

— Frailty as both a confounder and an effect modifier

— Drug interactions and polypharmacy complicate treatment-effect estimation

In Step 3 stems involving geriatric cardiovascular or oncology trials:

— Watch for age-by-treatment interactions: older patients may have attenuated relative benefits but larger absolute benefits (or vice versa)

— Beware survivor bias: older trial enrollees represent a "healthy survivor" cohort, limiting generalizability

— Lead-time bias in screening trials of older adults

Renal/hepatic impairment as covariates:

— Often included as adjusted variables; sometimes as stratifiers when PH violated

— Time-varying eGFR can be modeled as a time-dependent covariate

— Frailty terms can capture unmeasured heterogeneity

Key distinction: A frailty model in survival analysis is a statistical term for a random-effect Cox model accounting for unmeasured heterogeneity, distinct from the clinical frailty syndrome (sarcopenia, weakness, slow gait). Boards may use the term in either context; read carefully.

Generalizability concern: many landmark trials underrepresent patients >75, with significant comorbidities, or with CKD. Adjusted HRs from these trials may not apply to your 82-year-old with CKD-4 and atrial fibrillation.

In observational comparative effectiveness studies of elderly patients:

— Propensity score methods help balance baseline differences

— Instrumental variable approaches can address unmeasured confounding

— Target trial emulation framework provides a transparent design template

Always pair HR-based evidence with shared decision-making, especially when life expectancy is short and time-to-benefit may exceed remaining life expectancy.

Special Populations — Pregnancy, Pediatrics, and Underrepresented Groups

Pregnant patients are systematically excluded from most trials reporting Cox HRs:

— Therapeutic decisions in pregnancy rely on registry data and pharmacoepidemiologic cohorts

— Time-to-event analyses in pregnancy use gestational age as the time axis with left truncation to handle delayed entry (patients enroll at variable gestational ages)

— Outcomes: preterm birth, preeclampsia, stillbirth, neonatal death

Pediatric trials:

— Smaller samples → fewer events → wider CIs around HRs

— Bayesian methods increasingly used to borrow information from adult data

— Age-specific hazards often non-proportional → stratified or time-varying coefficient models

Race, ethnicity, and socioeconomic status:

— Frequently included as adjusted covariates but are social constructs, not biological mechanisms

— Race-stratified HRs may reflect structural inequities, healthcare access, or unmeasured confounding rather than biology

— Recent guidelines (NEJM, JAMA) discourage including race as a biological variable without clear justification

Board pearl: When a trial reports a subgroup HR by race that differs from the overall HR, the mechanism is more likely systemic (access, adherence, comorbidity burden) than biological. Step 3 increasingly tests this nuance under health equity questions.

Left truncation vs. censoring:

— Censoring = event hasn't occurred by end of observation (right censoring most common)

— Left truncation = patient enters the risk set after the time origin (delayed entry); standard in pregnancy and registry studies

— Ignoring left truncation overestimates survival

Pediatric oncology survival curves often demonstrate:

— High early hazard (induction mortality)

— Plateau in long-term survivors (cure fraction)

— Cure models or mixture models more appropriate than standard Cox

In trials enrolling pregnant or pediatric patients, ethics review considers minimal risk, assent vs. consent, and the requirement for direct benefit when above-minimal-risk interventions are studied (relevant to chunk 17).

Complications — Misinterpretation of Hazard Ratios

Common errors in interpreting HRs that Step 3 will test:

— Treating HR as a risk ratio: HR is an instantaneous rate ratio; RR is a cumulative probability ratio. They diverge when events are common or follow-up is long.

— "50% reduction" overstatement: HR 0.5 means 50% lower hazard, not 50% of patients spared the outcome

— Ignoring absolute risk: HR 0.5 with 2% → 1% event rate is far less impactful than HR 0.5 with 40% → 20%

— Causal language from observational HRs: adjusted HRs from cohorts are associations, not causal effects, unless rigorous causal methods are applied

Crossing Kaplan–Meier curves misread as "no effect":

— Curves may cross because of true effect reversal (e.g., surgery: high early mortality, long-term benefit)

— A single overall HR averages over time and obscures this; report time-stratified HRs or RMST

Multiple testing in subgroup analyses:

— Each subgroup test inflates false-discovery risk

— Pre-specification, adjustment (Bonferroni, false discovery rate), and tests of interaction (not within-subgroup p-values) mitigate this

Key distinction: A subgroup HR with p < 0.05 is not evidence of effect modification; the interaction test p-value is. The classic ISIS-2 satire of subgroup analyses (treatment benefit varies by astrological sign) makes this point memorably.

Immortal time bias:

— Occurs when a period during which the outcome cannot occur is misclassified into the treatment group

— Common in pharmacoepidemiology when treatment start is used as time origin but exposure is defined later

— Solutions: time-dependent covariate Cox, landmark analysis, target trial emulation

Informative censoring:

— Sicker patients drop out → those remaining are healthier → underestimates true hazard

— Hard to detect, harder to fix; sensitivity analyses (best/worst case imputation) help

When to Escalate — Recognizing Methodologically Weak HR Evidence

Step 3 will ask you to critically appraise an HR-based finding and decide whether to act on it. Escalate skepticism (and avoid practice change) when:

— Single observational study with HR near 1 (e.g., 0.85, CI 0.75–0.96): small effects in observational data may reflect residual confounding

— No PH assumption testing reported

— Crossing KM curves with a single overall HR reported

— Inadequate follow-up for the natural history of the event

— High loss to follow-up (>20%) or differential loss between groups

— Subgroup-driven conclusions without pre-specification or interaction testing

— Surrogate endpoints (e.g., LDL change) rather than hard clinical outcomes

Conversely, act on HR evidence when:

— Large, well-powered RCT with adequate follow-up

— Pre-specified primary endpoint with HR effect size that is clinically meaningful and statistically significant

— Consistent direction across subgroups and sensitivity analyses

— Replication across multiple trials or meta-analytic confirmation

— Plausible biological mechanism

Step 3 management: Hierarchy of evidence for HR claims: meta-analysis of RCTs > single high-quality RCT > propensity-matched observational cohort > unadjusted observational study. A single observational HR rarely justifies practice change for a low-risk intervention; it almost never justifies it for a high-risk intervention.

Heterogeneity in meta-analysis:

— I² statistic quantifies between-study variability

— High I² (>50%) suggests trials are estimating different underlying effects

— Random-effects models acknowledge heterogeneity; fixed-effects assume one true HR

— Look for the forest plot with study-specific HRs and pooled estimate

Consult biostatistics expertise when designing a survival study, interpreting borderline-significant results, or when complex censoring patterns are present.
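The heterogeneity points in this section (pooled HR, I²) can be illustrated with a small inverse-variance calculation on the log-HR scale. This is a sketch under stated assumptions: the three studies are invented for illustration, the standard errors are back-calculated from the reported 95% CIs, and only the fixed-effect pooled estimate is shown.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR) recovered from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling of hazard ratios.

    studies: list of (hr, ci_lower, ci_upper).
    Returns (pooled HR, Cochran's Q, I-squared as a percentage).
    """
    log_hrs = [math.log(hr) for hr, _, _ in studies]
    weights = [1 / se_from_ci(lo, hi) ** 2 for _, lo, hi in studies]
    pooled_log = sum(w * y for w, y in zip(weights, log_hrs)) / sum(weights)
    q = sum(w * (y - pooled_log) ** 2 for w, y in zip(weights, log_hrs))
    df = len(studies) - 1
    # I^2 = (Q - df) / Q, floored at zero.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return math.exp(pooled_log), q, i_squared


# Hypothetical trials: two showing benefit, one near the null.
hypothetical_studies = [(0.70, 0.55, 0.89), (0.80, 0.66, 0.97), (1.05, 0.85, 1.30)]
pooled_hr, q_stat, i2 = pool_fixed_effect(hypothetical_studies)
print(f"pooled HR {pooled_hr:.2f}, Q {q_stat:.2f}, I2 {i2:.0f}%")
```

When I² is high, as it is for these invented inputs, a single pooled HR hides real disagreement between trials, which is the situation where a random-effects model and a close look at the forest plot are called for.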

Key Differentials — Same-Category Effect Measures

Cox HR has methodologic siblings within survival analysis:

— Log-rank test: nonparametric test of equality of survival curves; provides p-value but no effect size

— Wilcoxon (Breslow) test: similar to log-rank but weights early events more

— Tarone–Ware test: intermediate weighting

— Stratified log-rank: adjusts for a categorical stratifier

Hazard ratio vs. incidence rate ratio (IRR):

— IRR comes from Poisson regression assuming constant hazard; HR comes from Cox without that assumption

— Numerically similar when hazards are roughly constant over time

Hazard ratio vs. odds ratio for survival:

— OR from logistic regression on a binary outcome at a fixed time ignores time-to-event information and censoring

— Use logistic only when timing is unimportant or unavailable

Key distinction: HR ≈ RR when events are rare (<10% cumulative incidence) and follow-up is uniform. As event rates climb, HR and RR diverge, with HR typically further from the null. Don't substitute one for the other in counseling.

Restricted mean survival time (RMST) difference:

— Mean event-free time up to a clinically relevant horizon (e.g., 5 years)

— Robust to PH violations

— Increasingly reported in oncology trials with non-proportional hazards (e.g., immunotherapy trials with delayed separation)

Median survival ratio: ratio of median survival times between groups; intuitive but only defined when both groups reach median

Win ratio:

— Used in composite endpoints (CV death + HF hospitalization)

— Compares hierarchical events pairwise; HR alternative in trials prioritizing fatal over non-fatal events

— EMPULSE, PARAGLIDE-HF used this approach

Key Differentials — Other-Category Statistical Methods

Beyond survival analysis, distinguish Cox from broader epidemiologic tools:

— Logistic regression: binary outcome, no time element, reports OR

— Linear regression: continuous outcome, reports β coefficients

— Poisson regression: count outcomes or rates with person-time denominator, reports IRR

— Negative binomial regression: overdispersed counts (recurrent HF hospitalizations)

— GEE/mixed models: correlated/longitudinal data, repeated measurements

Causal inference frameworks increasingly used alongside Cox:

— Target trial emulation: design observational analyses to mimic a hypothetical RCT

— Marginal structural models with inverse probability weighting: handle time-varying confounding

— G-methods: adjust for time-varying confounders affected by prior treatment

— Instrumental variables: address unmeasured confounding under strict assumptions

Machine learning for survival:

— Random survival forests, deep survival neural networks, gradient-boosted Cox

— Useful for prediction; interpretation of "hazard" effects more complex

— Increasingly used in risk prediction tools (e.g., MAGGIC, MELD-derivatives)

Board pearl: When a stem describes a "risk calculator" (Framingham, MELD, MAGGIC, PROMISE), recognize these are typically built on Cox regression coefficients translated into a points-based score. Understanding HR interpretation is foundational to understanding these tools.

Net reclassification index (NRI) and integrated discrimination improvement (IDI): metrics for whether adding a new variable to a Cox-based risk model meaningfully improves risk stratification

C-statistic (Harrell's c): discrimination measure for survival models (analogous to AUC for logistic models); values 0.5 (chance) to 1.0 (perfect)

Calibration plots: predicted vs. observed event probability; essential for clinical risk score validation
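Harrell's c, mentioned in this section, is the fraction of comparable patient pairs in which the patient with the higher predicted risk has the event first. The sketch below is a direct, unoptimized implementation with hypothetical data. It counts a pair as comparable only when the earlier time is an observed event, gives tied risk scores half credit, and skips pairs with tied follow-up times for simplicity.

```python
def harrell_c(times, events, risk_scores):
    """Concordance index for right-censored data.

    events: 1 = event observed, 0 = censored.
    risk_scores: higher score means higher predicted risk (earlier event).
    """
    concordant = tied = comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair (i, j) is comparable when i had an observed event
            # strictly before j's follow-up ended.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / comparable


# Hypothetical follow-up times (months), event flags, and model risk scores.
follow_up = [2, 4, 6, 8]
observed = [1, 1, 0, 1]
well_ranked = [0.9, 0.7, 0.5, 0.1]
print(harrell_c(follow_up, observed, well_ranked))
print(harrell_c(follow_up, observed, [0.3, 0.3, 0.3, 0.3]))
```

A model that ranks every comparable pair correctly scores 1.0, and a model that assigns everyone the same risk scores 0.5, matching the chance-to-perfect range given in the text. Discrimination says nothing about calibration, which is why the two are reported separately.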

Long-Term Plan — Applying HR Evidence to Practice

Translating HR evidence into longitudinal care:

— Number needed to treat (NNT) over a clinically relevant horizon, derived from absolute risks

— Time-to-benefit: how long until cumulative benefit accrues; critical when life expectancy is limited

— Time-to-harm: when adverse effects manifest (e.g., bleeding with anticoagulants is early; cancer signals with biologics may be late)

— Shared decision-making tools incorporating both relative and absolute estimates

Guideline-based incorporation:

— Strong recommendations typically require HR with tight CI, replicated across trials, and clinically meaningful absolute benefit

— Weak/conditional recommendations may rest on single trials or observational data

— Class of recommendation (I, IIa, IIb, III) and level of evidence (A, B, C) explicitly grade HR-supported claims

Surrogate vs. clinical endpoints:

— HRs for surrogates (LDL, A1c, viral load) require validation that change in surrogate maps to change in clinical outcome

— Many drugs with favorable surrogate HRs (e.g., CETP inhibitors on LDL) failed on hard outcomes

Step 3 management: When counseling a 75-year-old with limited life expectancy about starting a statin for primary prevention, the relevant question is whether time-to-benefit (typically 2.5–5 years for cardiovascular outcomes based on HRs from primary prevention trials) is shorter than expected remaining life. If not, deprescribe or do not initiate.

Secondary prevention decisions almost universally rest on HRs from RCTs:

— Antiplatelets, statins, beta-blockers post-MI: HRs 0.7–0.85 for recurrent MACE

— Anticoagulation in AF: HRs ~0.35 for stroke

— Apply guideline-recommended therapies adjusted for individual bleeding/adverse-event risk

Deprescribing as a longitudinal skill: as life expectancy shortens or competing risks rise, the relative benefit (HR) may persist but the absolute benefit shrinks below the threshold that justifies continued therapy.
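The point that a stable relative effect can carry a shrinking absolute benefit is easy to show with the ARR and NNT formulas from these notes. The sketch below holds the relative risk at 0.75 (the 12% versus 9% example used earlier in the notes) and varies the risk a patient actually faces over the horizon that matters. The function name and the lower baseline risks are hypothetical, and holding the relative risk constant across baselines is a simplifying assumption.

```python
import math


def absolute_benefit(control_risk, relative_risk):
    """Return (ARR, NNT) for a given baseline risk and relative risk.

    ARR = control_risk * (1 - relative_risk); NNT = 1 / ARR, rounded up
    to a whole patient by convention.
    """
    arr = control_risk * (1 - relative_risk)
    # round() guards against floating-point noise before taking the ceiling.
    nnt = math.ceil(round(1 / arr, 6))
    return arr, nnt


# Same relative effect, progressively lower risk over the relevant horizon.
for baseline in (0.12, 0.04, 0.02):
    arr, nnt = absolute_benefit(baseline, 0.75)
    print(f"baseline {baseline:.0%}: ARR {arr:.1%}, NNT {nnt}")
```

For counseling, the NNT always needs its time horizon attached: an NNT over 3 years is not the same promise as an NNT over the 12 months a patient may realistically have.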

Follow-Up — Monitoring HR Evidence Over Time

Evidence evolves; HRs from a single trial are point estimates that update:

— Replication trials confirm or refute initial findings

— Meta-analyses pool HRs across studies

— Real-world evidence (registries, EHR-based studies) tests whether trial HRs translate to practice

Effectiveness vs. efficacy gap:

— Efficacy HR from RCTs reflects ideal conditions

— Effectiveness HR in real-world cohorts often attenuated due to adherence, comorbidity, age extremes

— Recognize when generalizing a trial HR to a different population may overstate benefit

Continuous monitoring frameworks:

— FDA post-marketing surveillance detects rare adverse events not captured in trial-period HRs

— Sentinel network and observational HRs complement trial data

— Adaptive trials update HR estimates as data accumulate

CCS pearl: When following a patient on a long-term therapy initiated based on trial HRs, monitor for both the intended benefit (e.g., absence of recurrent MACE) and delayed harms (bleeding, malignancy signals, drug-drug interactions). Periodically reassess whether the evidence base has shifted; new trials may show class superiority or safety signals.

Patient counseling on evolving evidence:

— Communicate uncertainty in HR estimates (confidence intervals, generalizability)

— Frame absolute risks honestly using natural frequencies

— Avoid "the drug reduces your risk by 30%" without absolute context

Rehabilitation and behavioral interventions with HR evidence:

— Cardiac rehab post-MI: HR ~0.74 for all-cause mortality

— Pulmonary rehab in COPD: improved survival and QoL

— Adherence to these interventions amplifies pharmacologic HR benefits

Documentation: in EHR notes, cite the basis for therapy choices (guideline class, key trial), allowing transitions of care providers to understand the rationale; this is directly relevant to patient safety in chunk 17.

Ethical, Legal, and Patient Safety Considerations

Several ethical issues intersect with HR-based decision-making:

— Informed consent: patients must understand both relative and absolute benefits/harms. Presenting only HR ("25% reduction") without absolute terms is arguably misleading and can undermine valid consent.

— Equitable trial enrollment: HRs derived from underrepresented populations (women, racial/ethnic minorities, elderly, rural) may not apply to those groups, creating an equity gap

— Conflict of interest: industry-sponsored trials may emphasize favorable HRs in marketing; clinicians should access primary data and independent appraisal

— Data integrity and fraud: HRs in fabricated datasets have led to retraction of major papers; rely on guideline syntheses rather than single studies

Transition-of-care risk:

— A patient discharged on therapies initiated based on inpatient trial evidence (e.g., post-MI quadruple therapy) is vulnerable to dropoff at the outpatient handoff

— Medication reconciliation at every transition is a Joint Commission patient safety priority

— Document indications and target durations (e.g., DAPT 12 months post-DES) so subsequent providers understand the rationale

Step 3 management: When discharging a post-MI patient, explicitly document: (1) the guideline-directed therapies started, (2) the planned duration based on trial evidence (e.g., DAPT for 12 months per HR from DAPT-duration trials), and (3) the follow-up plan to reassess. This closes the loop and prevents both premature discontinuation and indefinite continuation beyond evidence-supported durations.

Research ethics:

— Early stopping of trials for benefit (based on interim HR analyses) can overestimate effect size

— IRB and DSMB oversight ensures trials are stopped only when ethical equipoise is broken

— Pre-registration on ClinicalTrials.gov reduces selective outcome reporting

Mandatory reporting intersects when serious adverse events identified post-trial alter the HR profile; clinicians have a duty to report through FDA MedWatch

Publication bias systematically inflates pooled HRs in meta-analyses if negative trials remain unpublished; trial registration and journal commitments to negative results mitigate this

High-Yield Associations and Rapid-Fire Facts

Rapid recall facts for Cox/HR on Step 3:

— Cox model = semiparametric: no assumption on baseline hazard shape

— HR > 1 = higher event rate (harm when the event is adverse); HR < 1 = lower event rate (benefit); HR = 1 = null

— 95% CI crossing 1 = not statistically significant

— Log-rank test = nonparametric comparison of survival curves; provides p but no effect size

— Proportional hazards assumption: HR constant over time

— Schoenfeld residuals: PH diagnostic

— Kaplan–Meier curve: stepwise survival; ticks = censoring

— Numbers at risk table: judge curve reliability over time

— Competing risks: use Fine-Gray, not standard Cox

— Immortal time bias: classic pharmacoepidemiology pitfall

— Median survival "not reached": fewer than half of patients have had the event, so the curve never drops to 0.5

— ≥10 events per covariate: rule of thumb against overfitting

— NNT = 1/ARR: translate HR into absolute terms

— Interaction p-value: test of effect modification (not within-subgroup p)

— C-statistic / Harrell's c: discrimination for survival models

— RMST: robust to PH violations

Board pearl: If a Step 3 stem mentions "the hazard ratio was 0.78 (95% CI 0.65–0.93)" without telling you the absolute event rates, the test is often checking whether you can correctly state "lower rate of the event" and recognize statistical significance, not whether you can compute NNT. Don't overthink.

Landmark trials whose HRs are frequently cited on boards:

— EMPA-REG, DAPA-HF, EMPEROR (SGLT2i; CV and HF outcomes)

— PARADIGM-HF (sacubitril/valsartan)

— ISCHEMIA, COURAGE (revascularization vs. medical therapy in stable CAD)

— JUPITER (rosuvastatin primary prevention)

— ORBITA (sham-controlled PCI)

— RE-LY, ROCKET-AF, ARISTOTLE, ENGAGE-AF (DOACs)

— FOURIER, ODYSSEY (PCSK9 inhibitors)

Most of these trials' primary endpoints are time-to-event composites analyzed via Cox regression; ORBITA, whose primary endpoint was change in exercise time, is a notable exception.
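The rapid-recall facts about the semiparametric form and the proportional hazards assumption can be read off the standard form of the Cox model. The notation below ($h_0$, $\beta$, $x$) is introduced here for illustration and is not taken from the notes above; it is a sketch for a single binary exposure.

```latex
% Cox model: the baseline hazard h_0(t) is left unspecified (semiparametric)
\[
  h(t \mid x) = h_0(t)\,\exp(\beta^{\top} x)
\]

% Hazard ratio for a binary exposure x (1 = exposed, 0 = unexposed)
\[
  \mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)}
              = \frac{h_0(t)\,e^{\beta}}{h_0(t)}
              = e^{\beta}
\]
```

Because $h_0(t)$ cancels, the HR does not depend on $t$; that is the proportional hazards assumption. It is also why the model can estimate $\beta$ without specifying the shape of the baseline hazard.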

Board Question Stem Patterns

Pattern 1 — HR interpretation:

"A randomized trial of Drug X vs. placebo in patients with HFrEF reports an HR of 0.74 (95% CI 0.62–0.88) for the composite of CV death and HF hospitalization. Which statement best describes this finding?"

— Correct: 26% lower instantaneous rate of the event; statistically significant

— Distractors: 26% of patients spared the event (no: that conflates HR with absolute risk); not significant (no: the CI excludes 1.0)

Pattern 2 — PH violation:

"Kaplan–Meier curves for surgical vs. medical therapy cross at 6 months, with surgery showing higher early mortality and lower late mortality. The overall HR is 0.95 (95% CI 0.80–1.12). What is the best interpretation?"

— Correct: PH assumption violated; the time-averaged HR obscures a true time-varying effect; report a time-stratified analysis or RMST

Pattern 3 — Competing risks:

"In a trial of an antiplatelet agent in 85-year-olds, the primary endpoint is time to MI. 20% of patients died of non-cardiac causes during follow-up. Which analysis is most appropriate?"

— Correct: Fine-Gray subdistribution hazard model

Pattern 4 — Confounding vs. effect modification:

"Drug Y's HR for mortality is 0.55 in patients with diabetes and 0.95 in non-diabetics. The interaction p-value is 0.01. The best interpretation is…"

— Correct: effect modification by diabetes status; report stratified HRs

Pattern 5 — Translating to NNT:

"Trial reports HR 0.70 with control 3-year event rate of 10% and treatment 3-year event rate of 7%. What is the NNT?"

— ARR = 10% − 7% = 3%; NNT = 1/0.03 ≈ 33 over 3 years (conventionally rounded up to 34)

Pattern 6 — Critical appraisal:

Stem gives observational study HR of 0.92 with wide CI in a small cohort; correct answer: insufficient evidence to change practice, residual confounding likely.

Step 3 management: Look for buzzwords that anchor the answer: "Schoenfeld residuals" → PH testing; "time-dependent covariate" → variable changes over time; "Fine-Gray" → competing risks; "log-rank" → curve comparison without effect size; "RMST" → PH-free summary.
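Patterns 1 and 5 come down to a few lines of arithmetic. The sketch below is a minimal illustration in Python, not a validated clinical tool. The function names `interpret_hr` and `nnt_from_rates` are hypothetical, and the inputs are the numbers quoted in the stems above.

```python
import math


def interpret_hr(hr, ci_low, ci_high):
    """Summarize a hazard ratio and its 95% CI (hypothetical helper)."""
    # Percent change in the instantaneous event rate; negative means lower hazard.
    pct_change = round((hr - 1.0) * 100.0, 1)
    # A 95% CI that excludes 1.0 corresponds to significance at the 0.05 level.
    significant = not (ci_low <= 1.0 <= ci_high)
    if hr < 1.0:
        direction = "lower"
    elif hr > 1.0:
        direction = "higher"
    else:
        direction = "equal"
    return {
        "pct_change_in_hazard": pct_change,
        "direction": direction,
        "significant": significant,
    }


def nnt_from_rates(control_rate, treated_rate):
    """Return (ARR, exact NNT, NNT rounded up) from cumulative event rates."""
    arr = control_rate - treated_rate
    if arr <= 0:
        raise ValueError("No absolute risk reduction; NNT is undefined here.")
    nnt = 1.0 / arr
    # Round before taking the ceiling to absorb floating-point noise.
    nnt_rounded_up = math.ceil(round(nnt, 9))
    return arr, nnt, nnt_rounded_up


if __name__ == "__main__":
    # Pattern 1 stem: HR 0.74 (95% CI 0.62-0.88)
    print(interpret_hr(0.74, 0.62, 0.88))
    # Pattern 5 stem: 10% vs. 7% three-year event rates
    print(nnt_from_rates(0.10, 0.07))
```

For the Pattern 5 stem this gives an ARR of 3 percentage points and an NNT of about 33.3, which is 34 when rounded up to a whole patient. Note that the NNT comes from the absolute event rates, not from the HR itself, and it applies only to the stated follow-up window (3 years here).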

One-Line Recap

Cox proportional hazards regression estimates the hazard ratio, a time-to-event effect measure interpreted as the relative rate of an outcome between groups. The estimate is valid when the proportional hazards assumption holds and censoring is non-informative, and it should always be read alongside absolute risks for clinical decision-making.

Board pearl: On Step 3, the hazard ratio is less about the math and more about the interpretation: recognize the model, interpret the HR with its CI, check the assumptions, contextualize with absolute risks, and tie it to a real management decision at the bedside or in the clinic.

Recap bullet 1 — Interpretation: HR < 1 = protective, HR > 1 = harmful, CI crossing 1 = non-significant; HR ≠ RR, especially when events are common or follow-up is long. Always translate HR to absolute risk and NNT for patient counseling.

Recap bullet 2 — Assumptions: Proportional hazards (constant HR over time, tested via Schoenfeld residuals or log-log plots), non-informative censoring, and ≥10 events per covariate. PH violation → stratified Cox, time-varying coefficients, or RMST.

Recap bullet 3 — Pitfalls: Competing risks (use Fine-Gray), immortal time bias (use time-dependent Cox or landmark analysis), informative censoring, subgroup misinterpretation (use interaction p-value, not within-subgroup p), and surrogate-endpoint extrapolation.

Recap bullet 4 — Application: Strong HR evidence from well-conducted RCTs underpins most guideline-directed therapies; weak observational HRs require skepticism. Communicate both relative and absolute estimates during informed consent. Reassess therapies as life expectancy and competing risks evolve, particularly in elderly or frail patients where time-to-benefit may exceed time-to-harm.
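The point in recap bullet 1 that HR ≠ RR when events are common can be checked numerically. The sketch below assumes proportional hazards hold exactly, in which case the survival curves satisfy S_treated(t) = S_control(t)^HR. The function name `treated_risk_under_ph` and the baseline risks are hypothetical values chosen for illustration.

```python
def treated_risk_under_ph(control_risk, hr):
    """Cumulative risk in the treated arm implied by S1(t) = S0(t) ** HR.

    Assumes proportional hazards hold over the whole follow-up window.
    """
    if not 0.0 <= control_risk < 1.0:
        raise ValueError("control_risk must be in [0, 1).")
    if hr <= 0:
        raise ValueError("hr must be positive.")
    control_survival = 1.0 - control_risk
    return 1.0 - control_survival ** hr


if __name__ == "__main__":
    hr = 0.70
    # Hypothetical baseline (control-arm) cumulative risks.
    for control_risk in (0.01, 0.10, 0.50):
        treated_risk = treated_risk_under_ph(control_risk, hr)
        risk_ratio = treated_risk / control_risk
        arr = control_risk - treated_risk
        print(
            f"control risk {control_risk:.2f}: treated risk {treated_risk:.4f}, "
            f"RR {risk_ratio:.3f}, ARR {arr:.4f}"
        )
```

With the same HR of 0.70, the implied risk ratio stays close to 0.70 when the baseline risk is 1% and drifts toward 1 as the baseline risk rises. The absolute risk reduction, and therefore the NNT, also changes with baseline risk, which is why the HR alone is not enough for patient counseling.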