Biostatistics & Population Health
Hazard ratio vs odds ratio vs risk ratio: clinical use
— Risk ratio (RR): cumulative incidence ratio — used in cohort studies and RCTs with fixed follow-up. Directly interpretable as "X times the risk."
— Odds ratio (OR): ratio of odds of exposure — the only valid effect measure in case-control studies because incidence cannot be calculated when sampling on outcome.
— Hazard ratio (HR): instantaneous event rate ratio over time — output of Cox proportional hazards or survival analyses. Required when follow-up varies or censoring occurs (death, loss to follow-up, study end).
— A case-control study reporting "RR" → almost certainly mislabeled OR.
— A cohort study with variable follow-up reporting only "RR" → may obscure time-dependent effects; HR preferred.
— An OR reported for a common outcome (>10%) and interpreted as RR → systematic overestimation of effect magnitude.

— "In a randomized trial of 4,000 patients followed for 2 years, the event rate was 8% in the drug group vs 12% in placebo. RR = 0.67."
— Fixed follow-up window, both arms followed equally, simple 2×2 incidence table.
— Often paired with NNT calculation: NNT = 1/ARR = 1/(0.12−0.08) = 25.
— "Researchers identified 200 patients with pancreatic cancer and 400 matched controls; prior diabetes was more common in cases (OR 2.1)."
— Retrospective, outcome-defined sampling, "matched controls," "history of exposure ascertained."
— Logistic regression output is always an OR, even in cohort data — common Step 3 distractor.
— "Over a median follow-up of 4.6 years, the HR for cardiovascular death was 0.78 (95% CI 0.65–0.93)."
— Mention of Kaplan-Meier curves, median follow-up, censoring, time-to-event, or Cox model.
— Variable follow-up between subjects (e.g., rolling enrollment, dropouts).
— Patient asks "Doc, should I take this statin?" → don't say "HR 0.75"; say "out of 100 people like you over 5 years, about 3 fewer will have a heart attack."

— Value of 1.0 = no effect. <1.0 = protective. >1.0 = harmful (or increased odds/risk/hazard of outcome).
— Magnitude alone is meaningless without CI and baseline risk.
— If the CI crosses 1.0, the result is not statistically significant at α=0.05.
— Width matters: HR 0.80 (0.78–0.82) is precise; HR 0.80 (0.40–1.60) is uninformative despite identical point estimate.
— Wide CI → small sample, rare event, or high heterogeneity.
— A 50% RR reduction on a 0.2% baseline = ARR 0.1%, NNT 1,000.
— Same 50% RR reduction on a 20% baseline = ARR 10%, NNT 10.
— Relative measures exaggerate; absolute measures anchor clinical decisions.
— Cox HR assumes the ratio of hazards is constant over time. If KM curves cross or diverge late, the HR is a time-averaged distortion.
— Look for landmark analyses or time-stratified HRs as evidence the authors checked this.
— When outcome prevalence >10%, OR overstates RR. Example: outcome 30% in exposed, 20% in unexposed → RR = 1.5, but OR = (0.30/0.70)/(0.20/0.80) = 1.71.
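A minimal Python sketch of the baseline-risk arithmetic above. It assumes the relative effect applies multiplicatively to baseline risk. The helper names (`nnt`, `absolute_effect`, `odds`) are illustrative and do not come from any library. `nnt` rounds 1/ARR to 9 decimals before rounding up, so float noise cannot turn an NNT of 25 into 26.

```python
import math


def nnt(arr: float) -> int:
    """NNT = 1/ARR, rounded up. Rounding 1/ARR to 9 decimals first strips float
    noise, e.g. 1 / (0.12 - 0.08) evaluates slightly above 25."""
    return math.ceil(round(1 / arr, 9))


def absolute_effect(baseline_risk: float, rr: float) -> tuple[float, int]:
    """ARR and NNT when a protective relative risk `rr` (< 1) is applied to a baseline risk."""
    arr = baseline_risk * (1 - rr)
    return arr, nnt(arr)


def odds(p: float) -> float:
    return p / (1 - p)


# Trial vignette: 12% placebo vs 8% drug event rate
print(nnt(0.12 - 0.08))  # 25

# Same 50% relative reduction, very different absolute benefit
print(absolute_effect(0.002, 0.5))  # (0.001, 1000)
print(absolute_effect(0.20, 0.5))   # (0.1, 10)

# Common outcome: 30% in exposed vs 20% in unexposed
rr = 0.30 / 0.20
odds_ratio = odds(0.30) / odds(0.20)
print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")  # RR = 1.50, OR = 1.71
```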

• Master the 2×2 table; Step 3 will give you raw counts and expect a calculation.

| | Outcome + | Outcome − |
|---|---|---|
| Exposed | a | b |
| Unexposed | c | d |

• Risk (incidence) in exposed: a/(a+b). Risk in unexposed: c/(c+d).
— RR = [a/(a+b)] / [c/(c+d)]
— ARR (absolute risk reduction) = [c/(c+d)] − [a/(a+b)] when exposure is protective.
— NNT = 1/ARR, rounded up.
• Odds in exposed: a/b. Odds in unexposed: c/d.
— OR = (a×d) / (b×c), the cross-product ratio. Memorize this; it works regardless of study design orientation.
• Hazard ratio: cannot be hand-calculated from a 2×2 table alone because it incorporates person-time and the timing of events. Reported from Cox regression.
— Simplified rate ratio approximation: (events_exposed / person-years_exposed) / (events_unexposed / person-years_unexposed).
• Worked example, RCT of new anticoagulant:
— Drug arm: 40 strokes / 1000 patients = 4% risk.
— Placebo: 80 strokes / 1000 patients = 8% risk.
— RR = 0.04/0.08 = 0.50 (50% relative risk reduction).
— ARR = 0.04, NNT = 25.
— OR = (40×920)/(960×80) = 36,800/76,800 = 0.48, close to RR because the outcome is uncommon.
• Worked example, case-control of MI and NSAID use:
— Cases (MI): 60 NSAID users, 140 non-users.
— Controls: 30 NSAID users, 170 non-users.
— OR = (60×170)/(140×30) = 10,200/4,200 = 2.43. RR cannot be calculated.
• Board pearl: Step 3 frequently asks "what is the best measure of association in this study?" The answer is dictated by design, not by what feels intuitive.
• CCS pearl: when a case calls for interpreting trial evidence (journal-club style), calculate ARR/NNT before discussing the relative effect; this is the framing residency program directors expect.
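The formulas and both worked examples above can be checked with a short Python sketch. `effects_2x2` and `odds_ratio` are hypothetical helper names. The table orientation follows the one above (rows exposed/unexposed, columns outcome +/−). `fractions.Fraction` keeps the arithmetic exact, so floating-point error cannot nudge the NNT.

```python
import math
from fractions import Fraction


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio (a×d)/(b×c); valid for cohort and case-control tables alike."""
    return (a * d) / (b * c)


def effects_2x2(a: int, b: int, c: int, d: int) -> dict:
    """RR, ARR, NNT and OR for a cohort/RCT table:
    exposed row = (a outcome+, b outcome−), unexposed row = (c outcome+, d outcome−)."""
    risk_exposed = Fraction(a, a + b)
    risk_unexposed = Fraction(c, c + d)
    arr = risk_unexposed - risk_exposed  # positive when exposure is protective
    return {
        "RR": float(risk_exposed / risk_unexposed),
        "ARR": float(arr),
        "NNT": math.ceil(1 / arr) if arr > 0 else None,  # rounded up
        "OR": odds_ratio(a, b, c, d),
    }


# RCT of new anticoagulant: drug 40/1000 strokes, placebo 80/1000
print(effects_2x2(a=40, b=960, c=80, d=920))
# RR 0.5, ARR 0.04, NNT 25, OR ≈ 0.479

# Case-control of MI and NSAID use, arranged as exposure rows × case/control columns:
# NSAID users: 60 cases, 30 controls; non-users: 140 cases, 170 controls.
# Only the OR is meaningful; row "risks" here would reflect the sampling ratio, not incidence.
print(round(odds_ratio(a=60, b=30, c=140, d=170), 2))  # 2.43
```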

— For ratio measures, CI is asymmetric on the linear scale but symmetric on the log scale. That's why CIs look like 0.5–2.0 rather than centered arithmetically.
— A CI excluding 1.0 corresponds to p < 0.05 (for the two-sided test against the null of no effect).
— p-value tells you whether an effect differs from null; CI tells you how big and how precise.
— Step 3 favors CI-based reasoning because it integrates clinical significance.
— Adjusted HR (aHR), aOR, aRR: controlled for confounders listed in the model. Always prefer adjusted estimates for causal interpretation, but check which covariates were included — unmeasured confounding still possible.
— A large drop from crude to adjusted estimate signals important confounding (e.g., crude HR 2.5 → aHR 1.1 means the apparent effect was mostly confounding).
— Confounding → adjusted ≠ crude estimate; report adjusted.
— Effect modification (interaction) → effect differs across strata; report stratum-specific estimates, not a single pooled one.
— Example: aspirin's HR for bleeding differs in patients <65 (HR 1.2) vs ≥75 (HR 2.8) — age is an effect modifier, not just a confounder.
— Check proportional hazards with Schoenfeld residuals, log-log plots, or time-interaction terms; if PH is violated, report time-stratified HRs or use restricted mean survival time (RMST).
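A minimal Python sketch of the log-scale relationship between a ratio, its 95% CI, and the p-value. It assumes the CI was computed the usual Wald way, as exp(log estimate ± 1.96 × SE). That holds for most Cox and logistic output but not for every interval. The function names are illustrative.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard-normal quantile


def log_se_from_ci(lower: float, upper: float) -> float:
    """SE of log(ratio). The CI is symmetric on the log scale, so its log-width is 2 × Z_95 × SE."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def p_value_from_ci(estimate: float, lower: float, upper: float) -> float:
    """Two-sided p-value for H0: ratio = 1, via z = log(estimate) / SE."""
    z = math.log(estimate) / log_se_from_ci(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))  # same as 2 × (1 − Φ(|z|))


# HR 0.78 (95% CI 0.65-0.93): CI excludes 1, so p < 0.05
print(f"p = {p_value_from_ci(0.78, 0.65, 0.93):.4f}")
# HR 0.80 (95% CI 0.40-1.60): CI crosses 1, p far above 0.05
print(f"p = {p_value_from_ci(0.80, 0.40, 1.60):.2f}")

# The point estimate sits at the geometric (not arithmetic) midpoint of the CI
print(round(math.sqrt(0.65 * 0.93), 2))  # 0.78
```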

— Cohort or RCT → RR or HR.
— Rare disease → case-control with OR (often the only feasible design).
— Time-to-event outcome, censoring → HR from Cox model.
— Fixed endpoint (e.g., 30-day mortality) → RR or risk difference.
— Always translate to absolute risk reduction and NNT using the patient's baseline risk.
— Use validated risk calculators (ASCVD pooled cohort, CHA₂DS₂-VASc, FRAX) for baseline, then apply the trial's relative effect.
— Cross-sectional or case-control → OR.
— Cohort → RR or HR.
— Patients understand "out of 100 people like you" better than ratios. Convert HR/RR to natural frequencies.
— Example: statin for primary prevention, baseline 10-yr ASCVD risk 10%, RR 0.75 → 2.5 fewer events per 100 over 10 years.
— Rare adverse events: a large relative increase can be a tiny absolute harm (e.g., rhabdomyolysis on a statin, RR 2.0 but absolute risk increase 0.01%).
— Common outcomes in high-risk patients: even small relative reductions yield a meaningful ARR and a small NNT (e.g., recurrent MI post-ACS).
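A small Python sketch that turns a baseline risk and a relative effect into the "out of N people like you" phrasing recommended above. It assumes the trial's RR applies multiplicatively to this patient's baseline. `natural_frequency` is a hypothetical helper. The time horizon is left to the clinician's sentence because it must match the baseline-risk estimate. The rhabdomyolysis line uses a 0.01% baseline, which is what RR 2.0 plus a 0.01% absolute increase implies.

```python
def natural_frequency(baseline_risk: float, rr: float, per: int = 100) -> str:
    """Express a relative effect as expected events per `per` people with and without treatment."""
    untreated = baseline_risk * per
    treated = untreated * rr
    diff = untreated - treated
    direction = "fewer" if diff >= 0 else "more"
    return (f"Out of {per:,} people like you, about {untreated:g} would have the event "
            f"without treatment and {treated:g} with it ({abs(diff):g} {direction}).")


# Statin, primary prevention: 10-yr ASCVD risk 10%, RR 0.75
print(natural_frequency(0.10, 0.75))
# Rare harm: RR 2.0 on a 0.01% baseline, expressed per 10,000
print(natural_frequency(0.0001, 2.0, per=10_000))
```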

Treat each measure as a "drug" with specific indications, contraindications, and side effects.
— Indication: RCTs, prospective cohorts with complete follow-up, cumulative incidence comparisons.
— Advantage: intuitive ("twice the risk"), directly translatable to ARR/NNT.
— Contraindication: case-control studies, variable follow-up, censoring.
— Side effect: can mislead when baseline risk varies wildly across populations.
— Indication: case-control studies, logistic regression for binary outcomes, meta-analyses combining mixed designs.
— Advantage: mathematically tractable, symmetric (OR of disease = 1/OR of no disease), works when outcome sampling is fixed.
— Contraindication: interpreting as RR when outcome is common (>10%).
— Side effect: systematic overestimation of effect when outcome is frequent → patients hear "double the odds" and think "double the risk."
— Indication: time-to-event analyses with censoring, varying follow-up, survival outcomes.
— Advantage: uses all available follow-up time, handles dropouts via censoring.
— Contraindication: non-proportional hazards (curves cross), competing risks ignored.
— Side effect: patients and clinicians often interpret HR as if it were RR; over short time horizons they roughly agree, over long horizons they can diverge.
— Incidence rate ratio (IRR): Poisson regression, count outcomes per person-time.
— Subdistribution hazard ratio (Fine-Gray): competing risks (e.g., cardiac death in cancer patients).
— Restricted mean survival time (RMST) ratio: when PH violated.
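The "symmetric" property listed under OR advantages can be checked numerically. This minimal Python sketch reuses the 30% vs 20% risks from the earlier common-outcome example. Swapping the outcome for its complement turns the OR into its reciprocal, while the RR changes to an unrelated number.

```python
def odds(p: float) -> float:
    return p / (1 - p)


p_exposed, p_unexposed = 0.30, 0.20  # outcome risks, from the 30% vs 20% example

or_event = odds(p_exposed) / odds(p_unexposed)
or_no_event = odds(1 - p_exposed) / odds(1 - p_unexposed)
rr_event = p_exposed / p_unexposed
rr_no_event = (1 - p_exposed) / (1 - p_unexposed)

print(round(or_event, 3), round(1 / or_no_event, 3))  # 1.714 1.714: OR flips to its reciprocal
print(round(rr_event, 3), round(1 / rr_no_event, 3))  # 1.5 1.143: RR does not
```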

— RR, OR, and HR are all pooled on the log scale; Mantel-Haenszel fixed-effect pooling is most often applied to ORs.
— Heterogeneity (I²): >50% suggests caution; consider random-effects model and explore subgroups.
— Forest plots: each study's CI; diamond = pooled estimate. Funnel plot asymmetry → publication bias.
— From RCT: NNT = 1/ARR. Round up (NNT 24.3 → 25).
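— From HR at time t (proportional hazards): treated risk = 1 − (1 − p₀)^HR, where p₀ is the baseline (control) cumulative risk at t; ARR = p₀ − [1 − (1 − p₀)^HR] ≈ p₀ × (1 − HR) when p₀ is small.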
— From OR with rare outcome: NNT ≈ 1/[baseline risk × (1 − OR)].
— Primary prevention: NNT often 50–200 acceptable for low-risk, low-cost interventions (statins, BP control).
— Secondary prevention: NNT 10–30 typical (DAPT post-ACS, β-blocker post-MI).
— High-cost or high-toxicity therapy: NNT <20 usually required (e.g., novel oncologics, biologics) plus quality-of-life consideration.
— Pretest probability + likelihood ratio → posttest probability. Effect measures inform prior beliefs about therapeutic benefit.
— Computing NNT from RRR alone without baseline risk → impossible; always need absolute numbers.
— Confusing incidence (new cases / population at risk) with prevalence (existing cases / total population).
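The HR and OR conversions above, as a minimal Python sketch. `risk_under_hr` assumes proportional hazards (S_treated(t) = S_control(t)^HR). `risk_under_or` assumes the OR multiplies this patient's baseline odds. The baseline risks in the examples are hypothetical. The comparisons show how the small-p₀ shortcuts drift from the exact values, and why a constant HR of 0.7 does not mean 30% fewer cumulative events.

```python
import math


def nnt(arr: float) -> int:
    """1/ARR rounded up, after trimming float noise."""
    return math.ceil(round(1 / arr, 9))


def risk_under_hr(p0: float, hr: float) -> float:
    """Treated cumulative risk at time t under proportional hazards: 1 − (1 − p0)^HR."""
    return 1 - (1 - p0) ** hr


def risk_under_or(p0: float, odds_ratio: float) -> float:
    """Treated risk when the OR multiplies the baseline odds p0 / (1 − p0)."""
    treated_odds = odds_ratio * p0 / (1 - p0)
    return treated_odds / (1 + treated_odds)


# Hypothetical 10% baseline risk over follow-up, HR 0.78
exact_arr = 0.10 - risk_under_hr(0.10, 0.78)
print(round(exact_arr, 4), nnt(exact_arr))  # exact under PH
print(nnt(0.10 * (1 - 0.78)))               # small-p0 shortcut, slightly optimistic

# Constant HR 0.7 with a hypothetical 40% 10-year baseline risk:
# cumulative risk falls by about 25%, not 30%
p1 = risk_under_hr(0.40, 0.7)
print(f"{1 - p1 / 0.40:.1%}")

# Hypothetical OR 0.5 on a 2% baseline: exact NNT vs the rare-outcome formula
print(nnt(0.02 - risk_under_or(0.02, 0.5)), nnt(0.02 * (1 - 0.5)))
```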

— Baseline risk is higher, so the same RR or HR yields a much larger ARR and smaller NNT — therapies that look marginal in young trial populations may be highly worthwhile in older patients (e.g., statins for primary prevention in 70-year-olds with elevated risk).
— Conversely, competing risks (death from non-target cause) reduce realized benefit; HR may overstate true patient-perceived benefit.
— Many landmark trials exclude patients >75 or with comorbidities — external validity is limited. Look for prespecified elderly subgroup estimates.
— Trials often exclude CKD stage 4–5 (eGFR <30). Effect estimates do not extrapolate.
— Competing risk of dialysis or renal death attenuates time-to-event endpoints — Fine-Gray subdistribution HR is more appropriate than Cox.
— Hepatic impairment affects pharmacokinetics rather than the statistical measure itself, but trial exclusions mean reported HRs/RRs don't apply to Child-Pugh B/C patients.
— A frail 85-year-old's time horizon to benefit may be shorter than the trial follow-up. If a statin's HR for CV death plays out over 5 years and life expectancy is 3 years, the patient may never realize the benefit.
— Use "time to benefit" literature: e.g., statins ~2.5 years, tight glycemic control ~8 years, mammography ~10 years.
— Reported HRs are usually averages; in patients on 10+ medications, drug-drug interactions can shift the realized HR substantially.

— Pregnant patients are almost universally excluded from RCTs → effect estimates derive from observational cohorts and registries, typically reported as adjusted OR or adjusted RR.
— Confounding by indication is severe (sick patients get treated). Example: SSRIs in pregnancy and persistent pulmonary hypertension of newborn — early ORs ~6, later registry-based aHRs ~1.5 after better confounder control.
— Step 3 framing: report ORs from pharmacoepidemiology cautiously; emphasize absolute baseline risk (often <1%) when counseling.
— Pediatric trials: small sample sizes → wide CIs, less precise HRs.
— Outcomes are often rare → OR ≈ RR is usually valid.
— Off-label use is common; extrapolating adult HRs to children assumes shared pathophysiology, an assumption that is frequently wrong (e.g., antidepressants increase suicidality risk in children and adolescents, unlike in older adults).
— Subgroup HRs are usually underpowered and exploratory unless prespecified with interaction testing.
— Recent risk equations (the 2021 CKD-EPI eGFR equation; the AHA PREVENT cardiovascular risk equations) have removed race coefficients because they reflected historical confounding, not biology.
— Always check whether the trial population reflects your patient demographically — external validity, not just internal validity.
— Women historically underrepresented in cardiovascular trials → HRs for therapies like ICDs in HFrEF may not apply equally.
— FDA now requires sex-stratified subgroup reporting; look for interaction p-values.
— Baseline risks differ markedly; relative measures may transport, absolute benefit (ARR/NNT) does not.

— "50% reduction in stroke" sounds compelling. If baseline 5-year risk is 1%, ARR is 0.5%, NNT 200 — the relative number drives prescribing far beyond what absolute benefit warrants.
— Consequence: polypharmacy, adverse events, cost burden without commensurate benefit.
— Postoperative delirium in elderly hip fracture patients: outcome ~30%, OR 2.5 for sedative exposure. Clinician hears "2.5× the risk" → actual RR ~1.8. Counseling and policy decisions inflated.
— A constant HR of 0.7 over 10 years does not mean 30% fewer events at year 10 — it means 30% lower instantaneous hazard at any time. Cumulative ARR depends on baseline cumulative incidence.
— Crossing KM curves: an early benefit may reverse later (e.g., some chemotherapies). Reporting a single HR masks this.
— Cox HR for stroke in elderly anticoagulated AF patients overestimates absolute stroke reduction when many patients die of other causes first. Use Fine-Gray subdistribution HR or report cumulative incidence functions.
— Confounding by indication: in observational data, sicker patients get more aggressive therapy → apparent harm, a spurious HR >1 for the therapy. Propensity score methods help but don't fully resolve it.
— Immortal time bias: counting follow-up time before treatment exposure as "exposed" creates an artifactual protective HR (<1). Common in hospital pharmacoepi.
— Subgroup multiplicity: testing 20 independent subgroups at α=0.05 → expect 1 false positive on average. Subgroup HRs without interaction testing are unreliable.
— Misread effect measure → inappropriate prescribing → adverse event → litigation risk.
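Two of the pitfalls above reduce to one-line formulas. In this minimal Python sketch, `or_to_rr` uses the Zhang and Yu (1998) conversion RR = OR / (1 − p₀ + p₀ × OR), where p₀ is the outcome risk in the unexposed. It is a rough correction intended for crude ORs and can be biased when applied to adjusted ORs. The delirium example does not say whether 30% is the unexposed or the overall risk, so the sketch tries a range of p₀. The multiplicity calculation assumes independent subgroup tests with every null true.

```python
def or_to_rr(odds_ratio: float, p0: float) -> float:
    """Approximate RR from an OR given the outcome risk p0 in the unexposed group."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)


# Postoperative delirium: OR 2.5 with a common outcome
for p0 in (0.20, 0.25, 0.30):
    print(p0, round(or_to_rr(2.5, p0), 2))  # RR lands well below 2.5 at every p0

# Sanity check: the 30% vs 20% example (OR 1.71) converts back to RR 1.5 when p0 = 0.20
print(round(or_to_rr((0.30 / 0.70) / (0.20 / 0.80), 0.20), 3))  # 1.5


def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent tests when all nulls are true."""
    return 1 - (1 - alpha) ** n_tests


# 20 subgroups at alpha = 0.05: 1 expected false positive, ~64% chance of at least one
print(20 * 0.05, round(chance_of_false_positive(20), 2))
```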

— Designing a QI project, residency research, or pre-protocol review — pick the right measure before collecting data.
— Encountering non-proportional hazards, competing risks, or recurrent events.
— Performing or interpreting meta-analysis with heterogeneity or suspected publication bias.
— When a guideline recommendation rests on a single trial with borderline CI (e.g., HR 0.85, CI 0.72–1.00) — discuss whether to adopt locally.
— When subgroup HRs are being used to justify expanded indications without prespecified interaction tests.
— Stopping for benefit: prespecified efficacy boundary (O'Brien-Fleming, Haybittle-Peto) crossed at interim analysis. Risks overestimating the effect size, because a trial is most likely to cross the boundary when random variation happens to favor treatment.
— Stopping for futility: conditional power low; spares patients further exposure.
— Stopping for harm: safety boundary crossed; ethical obligation to halt.
— Patient outside trial inclusion (age, comorbidity, severity).
— Outcome differs from trial endpoint (composite endpoint vs the one you care about).
— Time horizon mismatch.
— Local population baseline risk substantially different.
— Payers increasingly demand cost-effectiveness analyses built on HRs and NNTs. Understanding these measures is now part of value-based care competency.

— Subtractive, not ratio. RD = risk_exposed − risk_unexposed.
— Best for clinical decision-making; basis for NNT.
— Limitation: depends heavily on baseline risk, doesn't transport across populations as well as ratio measures.
— Ratio of incidence rates (events per person-time).
— Used in Poisson or negative binomial regression; appropriate for recurrent events (e.g., COPD exacerbations per patient-year).
— Distinct from RR (cumulative incidence) and HR (instantaneous hazard).
— Observed deaths / expected deaths in a reference population. Used in occupational and registry epi.
— Rate ratio assumes constant hazard; HR allows time-varying baseline hazard but assumes proportionality between groups.
— Accounts for competing risks; gives cumulative incidence interpretation directly.
— Conditional OR/HR: effect within strata of covariates (regression output).
— Marginal OR/HR/RR: population-averaged effect (standardization, IPTW).
— For binary outcomes with common events, conditional and marginal ORs differ — important for policy decisions.
— Average event-free time over a defined horizon. Robust when PH is violated. Increasingly reported in oncology.
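RMST is the area under the Kaplan-Meier curve up to a chosen horizon τ. Below is a minimal, dependency-free Python sketch. A real analysis would use a survival package and report a CI. The two small arms are hypothetical data. At tied times, events are treated as occurring before censorings, which is the usual KM convention.

```python
from typing import Sequence


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> list[tuple[float, float]]:
    """Return KM steps as (time, survival just after that time); events: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = [(0.0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data[i:] if tt == t]  # records at this time (contiguous after sorting)
        deaths = sum(tied)
        if deaths:
            survival *= 1 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= len(tied)  # events and censorings both leave the risk set
        i += len(tied)
    return steps


def rmst(steps: list[tuple[float, float]], tau: float) -> float:
    """Area under the KM step function from 0 to tau (mean event-free time up to tau)."""
    area = 0.0
    boundaries = [t for t, _ in steps[1:]] + [tau]
    for (start, s), end in zip(steps, boundaries):
        if start >= tau:
            break
        area += s * (min(end, tau) - start)
    return area


# Hypothetical follow-up in years (1 = event, 0 = censored)
control = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])
treated = kaplan_meier([1, 3, 4, 5, 5], [1, 0, 1, 0, 0])
tau = 5.0
print(round(rmst(control, tau), 2), round(rmst(treated, tau), 2))
print(f"RMST difference: {rmst(treated, tau) - rmst(control, tau):.2f} event-free years over {tau:g} years")
```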

— Third variable associated with both exposure and outcome, distorting the measured effect.
— Addressed by randomization (RCT), restriction, matching, stratification, multivariable adjustment, propensity scores, instrumental variables.
— Residual unmeasured confounding always possible in observational studies.
— Berkson's bias in hospital-based case-control.
— Loss to follow-up differential between exposure groups → biased HR.
— Healthy worker effect.
— Recall bias: cases remember exposures differently than controls — inflates OR in case-control.
— Misclassification: non-differential biases toward null; differential can bias either direction.
— Reverse causation: cross-sectional and case-control designs are particularly vulnerable. Example: low cholesterol "causing" cancer when subclinical cancer actually lowers cholesterol.
— Ecological fallacy: inferring individual-level associations from group-level data.
— RCT > prospective cohort > retrospective cohort > case-control > cross-sectional > case series.
— Mendelian randomization and target trial emulation are modern bridges.
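The claim that non-differential misclassification biases toward the null can be checked with expected counts. This minimal Python sketch reuses the MI/NSAID case-control counts from the 2×2 section. It applies a hypothetical exposure-recall sensitivity of 0.8 and specificity of 0.9 identically to cases and controls.

```python
def misclassify(exposed: float, unexposed: float, sens: float, spec: float) -> tuple[float, float]:
    """Expected observed (exposed, unexposed) counts when exposure is recorded
    with the given sensitivity and specificity."""
    observed_exposed = sens * exposed + (1 - spec) * unexposed
    observed_unexposed = (1 - sens) * exposed + spec * unexposed
    return observed_exposed, observed_unexposed


def odds_ratio(case_exp: float, case_unexp: float, ctrl_exp: float, ctrl_unexp: float) -> float:
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


true_or = odds_ratio(60, 140, 30, 170)
cases = misclassify(60, 140, sens=0.8, spec=0.9)      # same error rates in both groups
controls = misclassify(30, 170, sens=0.8, spec=0.9)
observed_or = odds_ratio(*cases, *controls)
print(round(true_or, 2), round(observed_or, 2))  # 2.43 1.74: pulled toward 1
```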

— Always ask: what's the patient's baseline risk? What's the relative effect (RR/HR)? What's the ARR and NNT? What's the NNH?
— Translate ratios into natural frequencies for patient counseling: "out of 100 people like you over 5 years…"
— Record the shared decision-making conversation including both relative and absolute risk discussion.
— Note where the patient's profile aligns or diverges from the trial population cited.
— Subscribe to evidence-summary services (NEJM Journal Watch, ACP Journal Club) that consistently present ARR/NNT alongside HRs.
— Use clinical decision tools that auto-calculate absolute benefit (e.g., MDCalc's ASCVD risk + statin benefit estimator).
— When monitoring local outcomes (e.g., readmission rates), use risk-adjusted observed/expected ratios — analogous to SMR.
— When implementing a new protocol, prespecify the effect measure and threshold for adoption before reviewing data.
— Don't quote a pharma rep's RRR figure without computing ARR for your patient.
— Don't apply a trial HR to a patient whose baseline risk is 5× higher or lower than the trial mean without recalculating expected absolute benefit.
— Don't use OR from a common-outcome logistic regression as if it were RR.
— Keep biostatistics skills current: at minimum, review one critical appraisal per month with a focus on effect-measure interpretation.
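Here is a risk-adjusted observed/expected ratio, sketched in Python. It assumes each patient already has a predicted probability from some validated risk-adjustment model. The predictions below are hypothetical. Expected events are the sum of those probabilities. O/E > 1 means more events than the case mix predicts, and with counts this small the ratio is very imprecise.

```python
def observed_expected_ratio(outcomes: list[int], predicted_risks: list[float]) -> float:
    """Observed events divided by expected events, where expected = sum of each
    patient's model-predicted probability of the outcome."""
    if len(outcomes) != len(predicted_risks):
        raise ValueError("one prediction per patient is required")
    return sum(outcomes) / sum(predicted_risks)


# Hypothetical month of discharges: 1 = readmitted within 30 days
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted = [0.30, 0.10, 0.05, 0.40, 0.10, 0.05, 0.10, 0.20, 0.15, 0.05]
print(round(observed_expected_ratio(outcomes, predicted), 2))  # 2.0: 3 observed vs 1.5 expected
```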

— Periodic chart review: are shared decision-making notes citing absolute risk?
— Are you re-evaluating chronic preventive therapies as patient risk profile changes (e.g., aspirin in primary prevention as patient ages)?
— Guidelines evolve (e.g., aspirin primary prevention 2022 USPSTF reversal for older adults; tight glycemic control after ACCORD).
— Re-counsel patients when foundational evidence changes; document the update.
— Use pictographs (100-person icon arrays) for absolute risk — proven to improve patient comprehension over ratios alone.
— Avoid framing effects: present outcomes in both frames ("10 of 100 will have an event without treatment, 7 with it"; "90 of 100 won't have an event without treatment, 93 won't with it") rather than a bare "3% absolute reduction."
— Address numeracy explicitly; many patients struggle with percentages.
— For lifestyle interventions, effect sizes are often small (HR 0.85–0.95) but cumulative and synergistic with pharmacotherapy.
— Track adherence; trial HRs reflect the adherence achieved in the trial, which is usually higher than in routine practice.
— Long-term follow-up of RCTs sometimes shows divergence from initial HRs (e.g., delayed cardiovascular benefit, delayed harm).
— Post-marketing surveillance (FAERS, registry data) provides real-world effect estimates that may differ from premarketing trials.
— Cardiac rehab post-MI: HR for mortality ~0.74; baseline 10-yr mortality ~25% → ARR ~6%, NNT ~17 — strongly worthwhile, emphasize this.
— Diabetes Prevention Program: RR for incident DM vs placebo ~0.42 with lifestyle and ~0.69 with metformin over ~3 years; lifestyle wins on both relative and absolute scales.
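This is a text-only version of the 100-person icon array, as a minimal Python sketch. It uses the 10 vs 7 events per 100 implied by the framing example above (90 vs 93 of 100 event-free). Real decision aids use graphics. The symbols and the `icon_array` name are arbitrary choices.

```python
def icon_array(events_untreated: int, events_treated: int, per: int = 100, width: int = 10) -> str:
    """Text pictograph for a beneficial treatment (events_treated <= events_untreated).
    ● = event despite treatment, ◐ = event prevented by treatment, ○ = no event either way."""
    if not 0 <= events_treated <= events_untreated <= per:
        raise ValueError("expected 0 <= events_treated <= events_untreated <= per")
    icons = (["●"] * events_treated
             + ["◐"] * (events_untreated - events_treated)
             + ["○"] * (per - events_untreated))
    return "\n".join(" ".join(icons[i:i + width]) for i in range(0, per, width))


print(icon_array(events_untreated=10, events_treated=7))
```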

— Legally and ethically, informed consent requires disclosure of material risks and benefits. Quoting only RRR ("50% reduction") without ARR/NNT may meet the letter of consent but violates its spirit and increasingly its legal standard in some jurisdictions.
— Concrete example: patient signs consent for statin therapy after being told it "halves heart attack risk" — actual ARR over 5 years for low-risk primary prevention is ~1%. If patient later experiences adverse effect (rhabdomyolysis, new diabetes) and learns of inflated framing, both trust and legal defensibility erode.
— Discharge summaries should specify the absolute benefit expected from each new chronic medication to support continuity decisions by the receiving clinician.
— Failure to translate evidence into patient-specific terms is a documented patient-safety gap.
— Public health agencies use SMRs and rate ratios for surveillance; clinicians contributing data have obligations to accurate reporting (e.g., cancer registries, communicable disease).
— Equipoise: an RCT is ethical when the effect (HR/RR) is genuinely uncertain. Continued randomization after a DSMB-confirmed efficacy boundary crossing violates equipoise.
— Stopping rules protect participants; understanding why a trial stopped early matters for interpretation and consent in subsequent trials.
— Industry-sponsored trials more often report favorable RRRs prominently. Recognize and counterbalance with independent appraisal.
— Applying HRs derived from non-representative populations to underrepresented patients can perpetuate disparities. Document the gap.
— A 78-year-old on six chronic preventive medications, each justified by HRs from trials excluding patients >75. Cumulative anticholinergic burden, hypotension, falls. Deprescribing review is an ethical obligation grounded in honest effect-measure reassessment for the individual patient.


— Stem: case-control study of NSAIDs and GI bleed, 2×2 table provided. Question: "What is the most appropriate measure of association?" Answer: odds ratio (RR is invalid because outcome-defined sampling).
— Stem provides cohort data; asks to compute RR or NNT. Plug into formulas: RR = (a/(a+b)) / (c/(c+d)); NNT = 1/ARR.
— Stem reports OR 2.5 for postoperative delirium (30% outcome). Question: "What is the most important caveat in counseling the patient?" Answer: OR overestimates RR when outcome is common.
— HR 0.78 (95% CI 0.65–0.93). Question: significance and clinical meaning. Answer: statistically significant (CI excludes 1), 22% relative hazard reduction, must compute ARR using baseline risk for patient counseling.
— Stem reports unfavorable overall HR but favorable subgroup HR. Question: "Should you offer therapy to this subgroup patient?" Answer: usually no without prespecified significant interaction.
— Patient is 82 with CKD; trial enrolled patients 50–70 with normal renal function. Question on applying the HR. Answer: recognize limited generalizability, engage in shared decision-making.
— Observational study shows therapy associated with increased mortality (HR 1.4). Question: most likely explanation. Answer: confounding by indication (sicker patients got therapy); recommend RCT before changing practice.
— Frail 88-year-old with 2-year life expectancy; statin benefit emerges at 2.5 years. Question: appropriate management. Answer: do not initiate (time to benefit exceeds life expectancy).
— Trial stopped early for benefit, HR 0.45. Question: caveat. Answer: likely overestimates effect size; await confirmatory data.
— Patient asks "how much will this help me?" Best response converts RR/HR into absolute terms using natural frequencies.

Choose the effect measure by study design — RR for cohorts/RCTs, OR for case-control or logistic regression, HR for time-to-event analyses — then always translate the ratio into absolute risk and NNT for the individual patient sitting in front of you.
— Cohort/RCT with fixed follow-up → risk ratio + ARR + NNT.
— Case-control or any logistic regression → odds ratio (never call it RR if outcome is common).
— Time-to-event with censoring (Cox model, Kaplan-Meier) → hazard ratio, check proportional hazards and competing risks.
— Point estimate alone is meaningless; pair with 95% CI, baseline risk, and absolute translation.
— CI excludes 1 → statistically significant; CI tightness reflects precision.
— OR ≈ RR only when outcome < 10%; otherwise OR exaggerates.
— Convert every relative effect into natural frequencies ("3 fewer events per 100 over 5 years").
— Weigh against NNH, life expectancy, and time to benefit before initiating chronic therapy.
— Document shared decision-making explicitly.
— Confounding by indication, immortal time bias, subgroup multiplicity, stopped-early trial overestimation, external validity gaps in elderly/CKD/pregnancy/pediatric patients.

