Biostatistics & Population Health

Hazard ratio vs odds ratio vs risk ratio: clinical use

Clinical Overview and When to Suspect Misuse of Effect Measures

Why this matters on Step 3: nearly every clinical trial, cohort study, and meta-analysis you read reports effects as hazard ratios (HR), odds ratios (OR), or risk ratios (RR/relative risk). Misinterpreting them changes prescribing, screening, and counseling decisions.

Core distinction by study design:

— Risk ratio (RR): cumulative incidence ratio — used in cohort studies and RCTs with fixed follow-up. Directly interpretable as "X times the risk."

— Odds ratio (OR): ratio of odds of exposure — the only valid effect measure in case-control studies because incidence cannot be calculated when sampling on outcome.

— Hazard ratio (HR): instantaneous event rate ratio over time — output of Cox proportional hazards or survival analyses. Required when follow-up varies or censoring occurs (death, loss to follow-up, study end).

When to suspect the wrong measure was used:

— A case-control study reporting "RR" → almost certainly mislabeled OR.

— A cohort study with variable follow-up reporting only "RR" → may obscure time-dependent effects; HR preferred.

— An OR reported for a common outcome (>10%) and interpreted as RR → systematic overestimation of effect magnitude.

Step 3 management: when a vignette asks "based on this trial, what do you tell the patient?" first identify the study design in the stem (RCT vs case-control vs registry cohort), then match to the appropriate effect measure. If the stem gives you an HR from a time-to-event analysis (Kaplan-Meier curves, Cox model), counsel in terms of relative rate over time, not absolute lifetime risk.

Board pearl: RR and HR approximate each other when the outcome is rare and the hazard is roughly constant; OR approximates RR only when the outcome is rare (<10%). A "rare disease assumption" failure is the single most common Step 3 trap in this topic.

Anchor your reading: design → measure → interpretation → absolute-risk translation for the patient.

Presentation Patterns and Key History (How Each Measure Appears in Vignettes)

Step 3 stems rarely say "this is a hazard ratio question." You identify it by how the data are presented.

Risk ratio vignette cues:

— "In a randomized trial of 4,000 patients followed for 2 years, the event rate was 8% in the drug group vs 12% in placebo. RR = 0.67."

— Fixed follow-up window, both arms followed equally, simple 2×2 incidence table.

— Often paired with NNT calculation: NNT = 1/ARR = 1/(0.12−0.08) = 25.

Odds ratio vignette cues:

— "Researchers identified 200 patients with pancreatic cancer and 400 matched controls; prior diabetes was more common in cases (OR 2.1)."

— Retrospective, outcome-defined sampling, "matched controls," "history of exposure ascertained."

— Logistic regression output is always an OR, even in cohort data — common Step 3 distractor.

Hazard ratio vignette cues:

— "Over a median follow-up of 4.6 years, the HR for cardiovascular death was 0.78 (95% CI 0.65–0.93)."

— Mention of Kaplan-Meier curves, median follow-up, censoring, time-to-event, or Cox model.

— Variable follow-up between subjects (e.g., rolling enrollment, dropouts).

History anchor for the patient encounter: when explaining trial results, translate to absolute terms.

— Patient asks "Doc, should I take this statin?" → don't say "HR 0.75"; say "out of 100 people like you over 5 years, about 3 fewer will have a heart attack."

Key distinction: RR and HR describe forward-looking risk (cohort, trial); OR in case-control describes backward-looking association and cannot be converted to absolute risk without external incidence data.

Board pearl: if the stem mentions "logistic regression," the reported effect is an OR regardless of whether the study was a cohort — a frequent source of inflated effect estimates for common outcomes like postoperative delirium or readmission.

Physical Exam Findings — Reading the "Anatomy" of a Reported Effect

There is no patient exam here, but every reported effect has a structural "exam" you must perform before believing it.

Inspect the point estimate:

— Value of 1.0 = no effect. <1.0 = protective. >1.0 = harmful (or increased odds/risk/hazard of outcome).

— Magnitude alone is meaningless without CI and baseline risk.

Palpate the confidence interval (95% CI):

— If the CI crosses 1.0, the result is not statistically significant at α=0.05.

— Width matters: HR 0.80 (0.78–0.82) is precise; HR 0.80 (0.40–1.60) is uninformative despite identical point estimate.

— Wide CI → small sample, rare event, or high heterogeneity.

Auscultate the baseline risk:

— A 50% RR reduction on a 0.2% baseline = ARR 0.1%, NNT 1,000.

— Same 50% RR reduction on a 20% baseline = ARR 10%, NNT 10.

— Relative measures exaggerate; absolute measures clinically anchor.

Percuss for proportional hazards assumption (HR only):

— Cox HR assumes the ratio of hazards is constant over time. If KM curves cross or diverge late, the HR is a time-averaged distortion.

— Look for landmark analyses or time-stratified HRs as evidence the authors checked this.

Special "hemodynamic" check for OR:

— When outcome prevalence >10%, OR overstates RR. Example: outcome 30% in exposed, 20% in unexposed → RR = 1.5, but OR = (0.30/0.70)/(0.20/0.80) = 1.71.

Step 3 management: before quoting any effect to a patient or in a QI meeting, do the three-step exam — point estimate, CI, absolute risk translation. If any one is missing or unfavorable, withhold strong recommendation.

Board pearl: a non-significant HR with a tight CI hugging 1.0 (e.g., 0.98–1.04) is strong evidence of no effect, not "insufficient data" — distinguish from wide-CI inconclusive results.
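
The baseline-risk step of this exam is plain arithmetic, and a short Python sketch can make it concrete. The function name `arr_and_nnt` is illustrative (not from any library), and the sketch assumes the same relative effect applies at every baseline risk.

```python
import math

def arr_and_nnt(baseline_risk, relative_risk):
    """Return (ARR, NNT) for an untreated risk and a risk ratio."""
    treated_risk = baseline_risk * relative_risk
    arr = baseline_risk - treated_risk
    # NNT is rounded up; the inner round() guards against float noise
    # such as 1000.0000000000001 being pushed to 1001.
    nnt = math.ceil(round(1 / arr, 9))
    return arr, nnt

# Same 50% relative reduction at two very different baselines.
low_arr, low_nnt = arr_and_nnt(0.002, 0.5)    # 0.2% baseline risk
high_arr, high_nnt = arr_and_nnt(0.20, 0.5)   # 20% baseline risk

print(f"0.2% baseline: ARR {low_arr:.3%}, NNT {low_nnt}")
print(f"20% baseline:  ARR {high_arr:.0%}, NNT {high_nnt}")
```

The ratio is identical in both calls; only the baseline changes, which is why the ratio alone cannot tell you how many patients must be treated.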

Diagnostic Workup — Computing Each Measure from a 2×2 Table

• Master the 2×2 table; Step 3 will give you raw counts and expect a calculation.
                Outcome +    Outcome −
Exposed         a            b
Unexposed       c            d
• Risk (incidence) in exposed: a/(a+b). Risk in unexposed: c/(c+d).
— RR = [a/(a+b)] / [c/(c+d)]
— ARR (absolute risk reduction) = [c/(c+d)] − [a/(a+b)] when exposure is protective.
— NNT = 1/ARR, rounded up.
• Odds in exposed: a/b. Odds in unexposed: c/d.
— OR = (a×d) / (b×c) — the cross-product ratio. Memorize this; it works regardless of study design orientation.
• Hazard ratio: cannot be hand-calculated from a 2×2 alone because it incorporates person-time and timing of events. Reported from Cox regression.
— Simplified rate ratio approximation: (events_exposed / person-years_exposed) / (events_unexposed / person-years_unexposed).
• Worked example — RCT of new anticoagulant:
— Drug arm: 40 strokes / 1000 patients = 4% risk.
— Placebo: 80 strokes / 1000 patients = 8% risk.
— RR = 0.04/0.08 = 0.50 (50% relative risk reduction).
— ARR = 0.04, NNT = 25.
— OR = (40×920)/(960×80) = 36,800/76,800 = 0.48 — close to RR because outcome is uncommon.
• Worked example — case-control of MI and NSAID use:
— Cases (MI): 60 NSAID users, 140 non-users.
— Controls: 30 NSAID users, 170 non-users.
— OR = (60×170)/(140×30) = 10,200/4,200 = 2.43. RR cannot be calculated.
• Board pearl: Step 3 frequently asks "what is the best measure of association in this study?" — answer is dictated by design, not by what feels intuitive.
• CCS pearl: when a journal club question appears on CCS-style feedback, always calculate ARR/NNT before discussing relative effect — this is the framing residency program directors expect.
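
The 2×2 formulas and both worked examples in this section can be checked with a few lines of Python. This is a minimal sketch: the function names `odds_ratio` and `cohort_measures` are hypothetical, and the cell labels a through d follow the table above.

```python
import math

def odds_ratio(a, b, c, d):
    """Cross-product ratio (a*d)/(b*c); usable for cohort or case-control tables."""
    return (a * d) / (b * c)

def cohort_measures(a, b, c, d):
    """RR, ARR, NNT, and OR from cohort or RCT counts.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    arr = risk_unexposed - risk_exposed  # positive when exposure is protective
    # NNT is rounded up; it is undefined when there is no risk reduction.
    nnt = math.ceil(round(1 / arr, 9)) if arr > 0 else None
    return {
        "rr": risk_exposed / risk_unexposed,
        "arr": arr,
        "nnt": nnt,
        "or": odds_ratio(a, b, c, d),
    }

# Anticoagulant RCT: 40/1000 strokes on drug, 80/1000 on placebo.
rct = cohort_measures(40, 960, 80, 920)

# NSAID case-control study: only the OR is meaningful here.
# Rows are NSAID users / non-users; columns are cases / controls.
nsaid_or = odds_ratio(60, 30, 140, 170)

print(rct, round(nsaid_or, 2))
```

Keeping `odds_ratio` separate is a deliberate choice in this sketch: it is the only quantity that can be computed when the sampling was on outcome, so the case-control example never touches the risk calculations.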

Diagnostic Workup — Advanced Interpretation: CIs, p-values, and Model Output

95% CI mechanics:

— For ratio measures, CI is asymmetric on the linear scale but symmetric on the log scale. That's why CIs look like 0.5–2.0 rather than centered arithmetically.

— A CI excluding 1.0 corresponds to p < 0.05 (for the two-sided test against the null of no effect).

p-value vs CI:

— p-value tells you whether an effect differs from null; CI tells you how big and how precise.

— Step 3 favors CI-based reasoning because it integrates clinical significance.

Reading a regression table:

— Adjusted HR (aHR), aOR, aRR: controlled for confounders listed in the model. Always prefer adjusted estimates for causal interpretation, but check which covariates were included — unmeasured confounding still possible.

— A large drop from crude to adjusted estimate signals important confounding (e.g., crude HR 2.5 → aHR 1.1 means the apparent effect was mostly confounding).

Effect modification vs confounding:

— Confounding → adjusted ≠ crude estimate; report adjusted.

— Effect modification (interaction) → effect differs across strata; report stratum-specific estimates, not a single pooled one.

— Example: aspirin's HR for bleeding differs in patients <65 (HR 1.2) vs ≥75 (HR 2.8) — age is an effect modifier, not just a confounder.

Proportional hazards check:

— Check with Schoenfeld residuals, log-log plots, or time-interaction terms; if PH is violated, report time-stratified HRs or use restricted mean survival time (RMST).

Step 3 management: when counseling a patient about a therapy based on a subgroup analysis, be skeptical: subgroup HRs are underpowered, prone to multiplicity, and unless prespecified with a significant interaction p-value, should be considered hypothesis-generating only.

Board pearl: "Significant interaction p-value" is the magic phrase distinguishing a believable subgroup effect from a spurious one. Without it, treat subgroups as exploratory.
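
The claim that ratio CIs are symmetric on the log scale can be made concrete with a short Python sketch. It uses the standard large-sample standard error of ln(RR), which is an assumption added here rather than a formula stated in these notes; the function name `risk_ratio_ci` is illustrative, and the counts are the 40/1000 vs 80/1000 anticoagulant example.

```python
import math

def risk_ratio_ci(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Risk ratio with a Wald-type 95% CI built on the log scale."""
    rr = (events_exposed / n_exposed) / (events_unexposed / n_unexposed)
    # Large-sample standard error of ln(RR).
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

rr, lower, upper = risk_ratio_ci(40, 1000, 80, 1000)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# Distances from the point estimate match on the log scale, not the linear one.
print(math.log(rr / lower), math.log(upper / rr))
print(rr - lower, upper - rr)
```

Because the interval is built as ln(RR) ± z × SE and then exponentiated, the upper limit always sits farther from the point estimate than the lower limit on the linear scale.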

Risk Stratification — Choosing the Right Measure for the Clinical Question

Question 1: "Does exposure cause disease?" (etiology)

— Cohort or RCT → RR or HR.

— Rare disease → case-control with OR (often the only feasible design).

Question 2: "Will this treatment reduce events over time?" (therapeutic)

— Time-to-event outcome, censoring → HR from Cox model.

— Fixed endpoint (e.g., 30-day mortality) → RR or risk difference.

Question 3: "How much will this patient benefit?" (individualized counseling)

— Always translate to absolute risk reduction and NNT using the patient's baseline risk.

— Use validated risk calculators (ASCVD pooled cohort, CHA₂DS₂-VASc, FRAX) for baseline, then apply the trial's relative effect.

Question 4: "Is this risk factor associated with outcome?" (screening/prognostic)

— Cross-sectional or case-control → OR.

— Cohort → RR or HR.

Frame for shared decision-making:

— Patients understand "out of 100 people like you" better than ratios. Convert HR/RR to natural frequencies.

— Example: statin for primary prevention, baseline 10-yr ASCVD risk 10%, RR 0.75 → 2.5 fewer events per 100 over 10 years.

When relative ≠ absolute matters most:

— Rare adverse events (e.g., rhabdomyolysis on statin: RR 2.0 but absolute risk increase only ~0.01%).

— Common outcomes in high-risk patients (e.g., recurrent MI after ACS, where even a small relative reduction yields a large absolute benefit and a small NNT).

Step 3 management: for any therapeutic decision, present both relative and absolute numbers — this is now a quality measure in shared decision-making documentation and is increasingly tested.

Key distinction: "Relative risk reduction" (RRR = 1 − RR) sells; "absolute risk reduction" informs. Pharma marketing exploits the gap. On Step 3, the right answer favors absolute terms for patient counseling.

Anchor every decision in baseline risk, not just the ratio.
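
The "out of 100 people like you" translation in this section can be sketched in Python as follows. The function name `natural_frequency` and the wording of the printed sentence are illustrative assumptions; the numbers are the statin example (10% baseline 10-year risk, RR 0.75).

```python
def natural_frequency(baseline_risk, relative_effect, per=100):
    """Expected events per `per` people without and with treatment."""
    without_tx = baseline_risk * per
    with_tx = baseline_risk * relative_effect * per
    return without_tx, with_tx, without_tx - with_tx

without_tx, with_tx, fewer = natural_frequency(0.10, 0.75)

print(
    f"Out of 100 people like you, about {without_tx:.0f} will have an event "
    f"over 10 years without treatment and about {with_tx:.1f} with it: "
    f"roughly {fewer:.1f} fewer."
)
```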

"Pharmacotherapy" — The Toolkit of Effect Measures and Their Indications

Treat each measure as a "drug" with specific indications, contraindications, and side effects.

Risk Ratio (RR / Relative Risk):

— Indication: RCTs, prospective cohorts with complete follow-up, cumulative incidence comparisons.

— Advantage: intuitive ("twice the risk"), directly translatable to ARR/NNT.

— Contraindication: case-control studies, variable follow-up, censoring.

— Side effect: can mislead when baseline risk varies wildly across populations.

Odds Ratio (OR):

— Indication: case-control studies, logistic regression for binary outcomes, meta-analyses combining mixed designs.

— Advantage: mathematically tractable, symmetric (OR of disease = 1/OR of no disease), works when outcome sampling is fixed.

— Contraindication: interpreting as RR when outcome is common (>10%).

— Side effect: systematic overestimation of effect when outcome is frequent → patients hear "double the odds" and think "double the risk."

Hazard Ratio (HR):

— Indication: time-to-event analyses with censoring, varying follow-up, survival outcomes.

— Advantage: uses all available follow-up time, handles dropouts via censoring.

— Contraindication: non-proportional hazards (curves cross), competing risks ignored.

— Side effect: patients and clinicians often interpret HR as if it were RR; over short time horizons they roughly agree, over long horizons they can diverge.

Special agents:

— Incidence rate ratio (IRR): Poisson regression, count outcomes per person-time.

— Subdistribution hazard ratio (Fine-Gray): competing risks (e.g., cardiac death in cancer patients).

— Restricted mean survival time (RMST) ratio: when PH violated.

Step 3 management: in a journal club or M&M conference, first ask which measure was used and whether it matched the design before debating clinical implications. This is the single most common error in trainee critical appraisal.

Board pearl: ORs from logistic regression should be reported with predicted probabilities or marginal effects when outcomes are common — increasingly seen in modern epi papers.
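
The OR "side effect" described in this section (overstating RR when the outcome is common) can be quantified with a short Python sketch. The conversion RR = OR / (1 − p0 + p0 × OR), with p0 the risk in the unexposed group, is an algebraic identity for a crude OR; applying it to adjusted ORs is only an approximation, and that caveat is an addition here, as is the function name `or_to_rr`.

```python
def or_to_rr(odds_ratio, baseline_risk):
    """Risk ratio implied by an odds ratio, given the risk in the unexposed group."""
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)

# Common outcome: 30% risk in exposed vs 20% in unexposed.
or_common = (0.30 / 0.70) / (0.20 / 0.80)
rr_common = or_to_rr(or_common, baseline_risk=0.20)

# The same OR applied to a rare outcome (1% baseline risk).
rr_rare = or_to_rr(or_common, baseline_risk=0.01)

print(f"OR {or_common:.2f} -> RR {rr_common:.2f} at 20% baseline")
print(f"OR {or_common:.2f} -> RR {rr_rare:.2f} at 1% baseline")
```

At a 1% baseline the OR and RR nearly coincide, which is the rare-outcome approximation; at 20% the gap is large enough to change how the result should be quoted.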

Advanced Application — Meta-analysis, NNT, and Decision Thresholds

Meta-analytic pooling:

— RR and HR pool naturally on the log scale; OR is the default for fixed-effect Mantel-Haenszel.

— Heterogeneity (I²): >50% suggests caution; consider random-effects model and explore subgroups.

— Forest plots: each study's CI; diamond = pooled estimate. Funnel plot asymmetry → publication bias.

NNT/NNH calculation in practice:

— From RCT: NNT = 1/ARR. Round up (NNT 24.3 → 25).

— From HR at time t: under proportional hazards, risk_treated = 1 − (1 − baseline risk)^HR, so ARR = baseline risk − risk_treated, where baseline risk is the control-arm cumulative incidence at t. For rare outcomes this reduces to ARR ≈ baseline risk × (1 − HR).

— From OR with rare outcome: NNT ≈ 1/[baseline risk × (1 − OR)].

Decision thresholds for therapy:

— Primary prevention: NNT often 50–200 acceptable for low-risk, low-cost interventions (statins, BP control).

— Secondary prevention: NNT 10–30 typical (DAPT post-ACS, β-blocker post-MI).

— High-cost or high-toxicity therapy: NNT <20 usually required (e.g., novel oncologics, biologics) plus quality-of-life consideration.

Number needed to harm (NNH): parallel concept; always compare NNT to NNH for net benefit. SSRIs in adolescents: NNT for depression response ~5, NNH for suicidality ~150 → favorable net benefit.

Bayesian framing (increasingly tested):

— Pretest odds × likelihood ratio = posttest odds (convert probability to odds and back). Effect measures inform prior beliefs about therapeutic benefit.

Common Step 3 distractor calculations:

— Computing NNT from RRR alone without baseline risk → impossible; always need absolute numbers.

— Confusing incidence (new cases / population at risk) with prevalence (existing cases / total population).

CCS pearl: when ordering a therapy on CCS, the implicit reasoning the case rewards is "benefit > harm at this patient's risk level" — internalize NNT/NNH framing so you don't order low-yield interventions (e.g., aspirin for primary prevention in low-risk 40-year-old, where harm exceeds benefit per 2022 USPSTF).
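
Turning a hazard ratio into an NNT needs one extra step beyond 1/ARR, sketched below in Python. The sketch assumes proportional hazards, under which the treated survival curve equals the control survival curve raised to the power HR; the function name `arr_from_hr` and the inputs (25% control-arm risk, HR 0.74) are illustrative.

```python
import math

def arr_from_hr(baseline_risk, hazard_ratio):
    """ARR and NNT at the time point where control-arm cumulative risk is known."""
    # Proportional hazards implies S_treated(t) = S_control(t) ** HR.
    treated_risk = 1 - (1 - baseline_risk) ** hazard_ratio
    arr = baseline_risk - treated_risk
    # NNT is rounded up; undefined when there is no absolute reduction.
    nnt = math.ceil(round(1 / arr, 9)) if arr > 0 else None
    return arr, nnt

arr, nnt = arr_from_hr(0.25, 0.74)

# Treating the HR as if it were an RR gives a somewhat larger ARR.
naive_arr = 0.25 * (1 - 0.74)

print(f"ARR via survival relation: {arr:.3f}, NNT {nnt}")
print(f"ARR if HR is read as RR:   {naive_arr:.3f}")
```

The two ARRs converge as baseline risk falls, which is why reading an HR as an RR does little damage for rare outcomes and more for common ones.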

Special Populations — Elderly and Renal/Hepatic Considerations in Effect Interpretation

Elderly patients (age ≥65):

— Baseline risk is higher, so the same RR or HR yields a much larger ARR and smaller NNT — therapies that look marginal in young trial populations may be highly worthwhile in older patients (e.g., statins for primary prevention in 70-year-olds with elevated risk).

— Conversely, competing risks (death from non-target cause) reduce realized benefit; HR may overstate true patient-perceived benefit.

— Many landmark trials exclude patients >75 or with comorbidities — external validity is limited. Look for prespecified elderly subgroup estimates.

Renal impairment:

— Trials often exclude CKD stage 4–5 (eGFR <30). Effect estimates do not extrapolate.

— Competing risk of dialysis or renal death attenuates time-to-event endpoints — Fine-Gray subdistribution HR is more appropriate than Cox.

Hepatic impairment:

— Affects pharmacokinetics, not directly the statistical measure, but trial exclusions mean reported HRs/RRs don't apply to Child-Pugh B/C patients.

Multimorbidity and frailty:

— A frail 85-year-old's time horizon to benefit may be shorter than the trial follow-up. If a statin's HR for CV death plays out over 5 years and life expectancy is 3 years, the patient may never realize the benefit.

— Use "time to benefit" literature: e.g., statins ~2.5 years, tight glycemic control ~8 years, mammography ~10 years.

Polypharmacy interaction:

— Reported HRs are usually averages; in patients on 10+ medications, drug-drug interactions can shift the realized HR substantially.

Step 3 management: when applying an HR to an elderly or CKD patient, ask (1) was this population represented in the trial? (2) is life expectancy longer than time to benefit? (3) does the absolute gain hold up once competing risks are considered? If any answer is no, downweight the therapy.

Board pearl: time to benefit > life expectancy = don't start — particularly relevant for screening colonoscopy, mammography, intensive glycemic control, and primary prevention statins in the very elderly.

Special Populations — Pregnancy, Pediatrics, and Underrepresented Groups

Pregnancy:

— Almost universally excluded from RCTs → effect estimates derive from observational cohorts and registries, typically reported as adjusted OR or adjusted RR.

— Confounding by indication is severe (sick patients get treated). Example: SSRIs in pregnancy and persistent pulmonary hypertension of newborn — early ORs ~6, later registry-based aHRs ~1.5 after better confounder control.

— Step 3 framing: report ORs from pharmacoepidemiology cautiously; emphasize absolute baseline risk (often <1%) when counseling.

Pediatrics:

— Small sample sizes → wide CIs, less precise HRs.

— Outcomes are often rare → OR ≈ RR is usually valid.

— Off-label use common; extrapolating adult HRs to children assumes shared pathophysiology — frequently wrong (e.g., antidepressants and suicidality risk inverted in adolescents).

Racial and ethnic subgroups:

— Subgroup HRs are usually underpowered and exploratory unless prespecified with interaction testing.

— Recent guidelines (ASCVD, eGFR) have removed race coefficients because they reflected historical confounding, not biology.

— Always check whether the trial population reflects your patient demographically — external validity, not just internal validity.

Sex differences:

— Women historically underrepresented in cardiovascular trials → HRs for therapies like ICDs in HFrEF may not apply equally.

— FDA now requires sex-stratified subgroup reporting; look for interaction p-values.

Low- and middle-income settings:

— Baseline risks differ markedly; relative measures may transport, absolute benefit (ARR/NNT) does not.

Key distinction: internal validity (was the study designed and analyzed correctly?) vs external validity / generalizability (does it apply to your patient?). Step 3 vignettes routinely give you a perfectly valid trial in a population that does not match the patient — the right answer often involves recognizing this gap.

Step 3 management: when the trial population differs meaningfully (age, sex, race, comorbidity) from your patient, weight the evidence accordingly and document shared decision-making under uncertainty.

Complications — Misinterpretation and Its Clinical Consequences

Reporting RRR without ARR → overtreatment:

— "50% reduction in stroke" sounds compelling. If baseline 5-year risk is 1%, ARR is 0.5%, NNT 200 — the relative number drives prescribing far beyond what absolute benefit warrants.

— Consequence: polypharmacy, adverse events, cost burden without commensurate benefit.

OR misinterpreted as RR (common outcome):

— Postoperative delirium in elderly hip fracture patients: outcome ~30%, OR 2.5 for sedative exposure. Clinician hears "2.5× the risk" → actual RR ~1.8. Counseling and policy decisions inflated.

HR misinterpreted across time:

— A constant HR of 0.7 over 10 years does not mean 30% fewer events at year 10 — it means 30% lower instantaneous hazard at any time. Cumulative ARR depends on baseline cumulative incidence.

— Crossing KM curves: an early benefit may reverse later (e.g., some chemotherapies). Reporting a single HR masks this.

Ignoring competing risks:

— Cox HR for stroke in elderly anticoagulated AF patients overestimates absolute stroke reduction when many patients die of other causes first. Use Fine-Gray subdistribution HR or report cumulative incidence functions.

Confounding by indication:

— In observational data, sicker patients get more aggressive therapy → apparent harm. Spurious HR >1 for the therapy. Propensity score methods help but don't fully resolve.

Immortal time bias:

— Counting follow-up time before treatment exposure as "exposed" — creates artifactual protective HR (<1). Common in hospital pharmacoepi.

Multiplicity in subgroups:

— Testing 20 subgroups at α=0.05 → expect 1 false positive. Subgroup HRs without interaction testing are unreliable.

Patient harm pathway:

— Misread effect measure → inappropriate prescribing → adverse event → litigation risk.

Board pearl: the most clinically dangerous error is treating an OR for a common outcome as if it were an RR, then quoting the inflated number to a patient — both legally and ethically problematic in informed-consent contexts.
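
The multiplicity point in this section (20 subgroups tested at α=0.05) is worth working through numerically. The Python sketch below assumes the tests are independent and that every null hypothesis is true; the Bonferroni threshold is included only as one familiar correction and is not something these notes prescribe. The function name `subgroup_false_positives` is illustrative.

```python
def subgroup_false_positives(n_tests, alpha=0.05):
    """Expected false positives, chance of at least one, and Bonferroni alpha."""
    expected = n_tests * alpha
    # Probability that at least one of n independent null tests is "significant".
    p_at_least_one = 1 - (1 - alpha) ** n_tests
    bonferroni_alpha = alpha / n_tests
    return expected, p_at_least_one, bonferroni_alpha

expected, p_any, bonferroni_alpha = subgroup_false_positives(20)

print(f"Expected false positives: {expected:.1f}")
print(f"Chance of at least one:   {p_any:.0%}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha}")
```

Under these assumptions, a trial with 20 unplanned subgroups is more likely than not to show at least one spuriously "significant" subgroup HR.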

When to Escalate — Consulting Biostatistics, Methodologic Review, and Stop-Decisions

Escalate to formal statistical consultation when:

— Designing a QI project, residency research, or pre-protocol review — pick the right measure before collecting data.

— Encountering non-proportional hazards, competing risks, or recurrent events.

— Performing or interpreting meta-analysis with heterogeneity or suspected publication bias.

Escalate to journal club or M&M:

— When a guideline recommendation rests on a single trial with borderline CI (e.g., HR 0.85, CI 0.72–1.00) — discuss whether to adopt locally.

— When subgroup HRs are being used to justify expanded indications without prespecified interaction tests.

Stop-decisions in trials (DSMB territory, increasingly tested):

— Stopping for benefit: prespecified efficacy boundary (O'Brien-Fleming, Haybittle-Peto) crossed at interim analysis. Risks overestimating effect size because trials that stop early tend to be on a favorable random walk.

— Stopping for futility: conditional power low; spares patients further exposure.

— Stopping for harm: safety boundary crossed; ethical obligation to halt.

Clinical application: therapies approved on trials stopped early for benefit often show smaller real-world effects (e.g., several oncologic agents).

When to not apply a published HR to your patient:

— Patient outside trial inclusion (age, comorbidity, severity).

— Outcome differs from trial endpoint (composite endpoint vs the one you care about).

— Time horizon mismatch.

— Local population baseline risk substantially different.

Health-system context:

— Payers increasingly demand cost-effectiveness analyses built on HRs and NNTs. Understanding these measures is now part of value-based care competency.

Step 3 management: when uncertain whether to apply a trial result, document the gap (patient differs from trial population in X way) and engage in explicit shared decision-making — both protects the patient and creates a defensible record.

CCS pearl: ordering "biostatistics consult" isn't a CCS option, but "review of evidence" and "patient education / shared decision-making" are valid and rewarded actions when the evidence base is borderline.

Key Differentials — Other Effect Measures in the Same Family

Don't confuse the three core measures with their relatives:

Risk difference (RD) / absolute risk reduction (ARR):

— Subtractive, not ratio. RD = risk_exposed − risk_unexposed.

— Best for clinical decision-making; basis for NNT.

— Limitation: depends heavily on baseline risk, doesn't transport across populations as well as ratio measures.

Incidence rate ratio (IRR):

— Ratio of incidence rates (events per person-time).

— Used in Poisson or negative binomial regression; appropriate for recurrent events (e.g., COPD exacerbations per patient-year).

— Distinct from RR (cumulative incidence) and HR (instantaneous hazard).

Standardized mortality ratio (SMR):

— Observed deaths / expected deaths in a reference population. Used in occupational and registry epi.

Rate ratio vs hazard ratio:

— Rate ratio assumes constant hazard; HR allows time-varying baseline hazard but assumes proportionality between groups.

Subdistribution HR (Fine-Gray):

— Accounts for competing risks; gives cumulative incidence interpretation directly.

Marginal vs conditional effects:

— Conditional OR/HR: effect within strata of covariates (regression output).

— Marginal OR/HR/RR: population-averaged effect (standardization, IPTW).

— For binary outcomes with common events, conditional and marginal ORs differ — important for policy decisions.

Restricted mean survival time (RMST):

— Average event-free time over a defined horizon. Robust when PH is violated. Increasingly reported in oncology.

Key distinction: HR ≠ rate ratio ≠ IRR ≠ RR, though all may be loosely called "relative risk" in news media. On Step 3, the precise term in the stem tells you the exact analysis used.

Board pearl: if a paper reports both an HR and a subdistribution HR and they differ substantially, competing risks materially affect interpretation — quote the subdistribution HR for absolute-risk patient counseling.

Key Differentials — Confounding, Bias, and Other Threats to Causal Inference

An effect measure is only as good as the design that generated it.

Confounding:

— Third variable associated with both exposure and outcome, distorting the measured effect.

— Addressed by randomization (RCT), restriction, matching, stratification, multivariable adjustment, propensity scores, instrumental variables.

— Residual unmeasured confounding always possible in observational studies.

Selection bias:

— Berkson's bias in hospital-based case-control.

— Loss to follow-up differential between exposure groups → biased HR.

— Healthy worker effect.

Information / measurement bias:

— Recall bias: cases remember exposures differently than controls — inflates OR in case-control.

— Misclassification: non-differential biases toward null; differential can bias either direction.

Reverse causation:

— Cross-sectional and case-control particularly vulnerable. Example: low cholesterol "causing" cancer when subclinical cancer actually lowers cholesterol.

Immortal time bias (already mentioned, common in pharmacoepi).

Lead-time and length-time bias: screening studies — apparent survival benefit without true mortality reduction.

Ecological fallacy:

— Inferring individual-level associations from group-level data.

Hierarchy of evidence for causal effect estimation:

— RCT > prospective cohort > retrospective cohort > case-control > cross-sectional > case series.

— Mendelian randomization and target trial emulation are modern bridges.

Bradford Hill criteria for causation (strength, consistency, temporality, biological gradient, plausibility, coherence, experiment, analogy, specificity) — useful when only observational data exist.

Step 3 management: when a vignette presents a striking observational HR (e.g., HRT and CHD prevention, pre-WHI), the right answer is often to wait for or favor the RCT before changing practice. Observational ORs/HRs are hypothesis-generating; RCTs are confirmatory.

Board pearl: "confounding by indication" is the single most-tested observational bias on Step 3 — sicker patients get treated, making treatments look harmful in unadjusted analyses.

Secondary Prevention — Building Long-Term Practice Habits with Effect Measures

Habits to adopt for every therapeutic decision:

— Always ask: what's the patient's baseline risk? What's the relative effect (RR/HR)? What's the ARR and NNT? What's the NNH?

— Translate ratios into natural frequencies for patient counseling: "out of 100 people like you over 5 years…"

Documentation for medicolegal protection:

— Record the shared decision-making conversation including both relative and absolute risk discussion.

— Note where the patient's profile aligns or diverges from the trial population cited.

Maintaining literacy long-term:

— Subscribe to evidence-summary services (NEJM Journal Watch, ACP Journal Club) that consistently present ARR/NNT alongside HRs.

— Use clinical decision tools that auto-calculate absolute benefit (e.g., MDCalc's ASCVD risk + statin benefit estimator).

Quality improvement applications:

— When monitoring local outcomes (e.g., readmission rates), use risk-adjusted observed/expected ratios — analogous to SMR.

— When implementing a new protocol, prespecify the effect measure and threshold for adoption before reviewing data.

Pitfalls to avoid:

— Don't quote a pharma rep's RRR figure without computing ARR for your patient.

— Don't apply a trial HR to a patient whose baseline risk is 5× higher or lower than the trial mean without recalculating expected absolute benefit.

— Don't use OR from a common-outcome logistic regression as if it were RR.

Continuing education priority:

— Refresh biostatistics annually — at minimum review one critical appraisal per month with focus on effect measure interpretation.

Step 3 management: the long-term clinical "secondary prevention" here is professional discipline — every patient encounter where you cite evidence should involve a deliberate translation from the published ratio to the patient's absolute risk landscape.

Board pearl: the highest-yield habit is always computing NNT before initiating chronic preventive therapy — particularly relevant for statins, antihypertensives, aspirin, and bisphosphonates.

Follow-Up, Monitoring Parameters, and Patient Counseling

Monitoring effect measure literacy in your practice:

— Periodic chart review: are shared decision-making notes citing absolute risk?

— Are you re-evaluating chronic preventive therapies as patient risk profile changes (e.g., aspirin in primary prevention as patient ages)?

When effect estimates change — be willing to update:

— Guidelines evolve (e.g., aspirin primary prevention 2022 USPSTF reversal for older adults; tight glycemic control after ACCORD).

— Re-counsel patients when foundational evidence changes; document the update.

Patient counseling techniques:

— Use pictographs (100-person icon arrays) for absolute risk — proven to improve patient comprehension over ratios alone.

— Avoid framing effects: present gains and losses symmetrically ("90 of 100 won't have an event without treatment, 93 of 100 won't with treatment" vs "3% absolute reduction").

— Address numeracy explicitly; many patients struggle with percentages.

Rehab / behavioral monitoring:

— For lifestyle interventions, effect sizes are often small (HR 0.85–0.95) but cumulative and synergistic with pharmacotherapy.

— Track adherence; trial HRs reflect the adherence achieved under trial conditions, which is rarely matched in practice.

Surveillance for new evidence:

— Long-term follow-up of RCTs sometimes shows divergence from initial HRs (e.g., delayed cardiovascular benefit, delayed harm).

— Post-marketing surveillance (FAERS, registry data) provides real-world effect estimates that may differ from premarketing trials.

Counseling for risk-modification programs:

— Cardiac rehab post-MI: HR for mortality ~0.74; baseline 10-yr mortality ~25% → ARR ~6%, NNT ~17 — strongly worthwhile, emphasize this.

— Diabetes prevention program: lifestyle vs metformin RR for incident DM ~0.42 vs 0.69 over 3 years — lifestyle wins on relative and absolute.

CCS pearl: in longitudinal CCS cases, scheduled follow-up visits should include explicit re-counseling on absolute risk and NNT as the patient ages or develops new comorbidities — this is rewarded as comprehensive care.

Ethical, Legal, and Patient Safety Considerations

Informed consent and effect measures:

— Legally and ethically, informed consent requires disclosure of material risks and benefits. Quoting only RRR ("50% reduction") without ARR/NNT may meet the letter of consent but violates its spirit and, in some jurisdictions, increasingly its legal standard.

— Concrete example: a patient signs consent for statin therapy after being told it "halves heart attack risk"; the actual ARR over 5 years for low-risk primary prevention is ~1%. If the patient later experiences an adverse effect (rhabdomyolysis, new diabetes) and learns of the inflated framing, both trust and legal defensibility erode.

Transitions of care:

— Discharge summaries should specify the absolute benefit expected from each new chronic medication to support continuity decisions by the receiving clinician.

— Failure to translate evidence into patient-specific terms is a documented patient-safety gap.

Mandatory reporting and population effects:

— Public health agencies use SMRs and rate ratios for surveillance; clinicians contributing data have obligations to accurate reporting (e.g., cancer registries, communicable disease).

Research ethics:

— Equipoise: an RCT is ethical when the effect (HR/RR) is genuinely uncertain. Continued randomization after a DSMB-confirmed efficacy boundary crossing violates equipoise.

— Stopping rules protect participants; understanding why a trial stopped early matters for interpretation and consent in subsequent trials.

Conflicts of interest:

— Industry-sponsored trials more often report favorable RRRs prominently. Recognize and counterbalance with independent appraisal.

Health equity:

— Applying HRs derived from non-representative populations to underrepresented patients can perpetuate disparities. Document the gap.

Patient safety case example:

— A 78-year-old takes six chronic preventive medications, each justified by HRs from trials that excluded patients >75, and now has cumulative anticholinergic burden, hypotension, and falls. Deprescribing review is an ethical obligation grounded in honest effect-measure reassessment for the individual patient.

Board pearl: shared decision-making documented with both relative and absolute risk is the current standard of care and is the highest-yield ethics answer on Step 3 for any evidence-based therapy decision.
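The statin consent example can be restated in natural frequencies. The sketch below is illustrative only: the function name `natural_frequencies` is hypothetical, and the 2% five-year baseline risk is an assumption chosen so that "halves the risk" (RR 0.5) matches the ~1% ARR cited above.

```python
def natural_frequencies(baseline_risk: float, relative_risk: float,
                        per: int = 100, years: int = 5) -> dict:
    """Express the same treatment effect as RRR and as counts per `per` people."""
    without_tx = round(baseline_risk * per, 1)
    with_tx = round(baseline_risk * relative_risk * per, 1)
    fewer = round(without_tx - with_tx, 1)
    sentence = (
        f"Out of {per} people like you, about {without_tx:g} will have an event "
        f"over {years} years without treatment and about {with_tx:g} with "
        f"treatment ({fewer:g} fewer per {per})."
    )
    return {
        "without": without_tx,
        "with": with_tx,
        "fewer": fewer,
        "rrr_pct": round((1 - relative_risk) * 100),  # the headline relative figure
        "sentence": sentence,
    }


if __name__ == "__main__":
    # Assumed inputs: 2% baseline 5-year risk, RR 0.5 ("halves the risk").
    result = natural_frequencies(0.02, 0.5)
    print(f"Relative framing: {result['rrr_pct']}% reduction")
    print(f"Absolute framing: {result['sentence']}")
```

Printing both framings side by side makes the consent problem concrete: the relative figure and the per-100 count describe the same effect but leave very different impressions.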

High-Yield Associations and Rapid-Fire Clinical Facts

RR = 1, OR = 1, HR = 1 → no effect. 95% CI including 1 → not statistically significant at the 0.05 level.

RR < 1 / HR < 1 → protective (treatment reduces outcome).

OR ≈ RR when outcome is rare (<10%); OR overstates RR when outcome is common.

Case-control study → only OR is valid; RR cannot be calculated.

Logistic regression → output is always OR, regardless of design.

Cox regression → output is HR. Kaplan-Meier → survival curves (compared with the log-rank test), not an HR by itself.

Poisson regression → output is IRR (rate ratio).

NNT = 1/ARR, round up. NNH = 1/ARI.

RRR = 1 − RR, but never quote RRR alone.

95% confidence interval excludes 1 ↔ p < 0.05 for ratio measures.

Wide CI → imprecise, often small sample or rare outcome.

Heterogeneity I² > 50% in meta-analysis → caution, explore.

Stopped-early trials tend to overestimate effect size.

Subgroup analyses unreliable unless prespecified with interaction p-value.

Competing risks → use Fine-Gray subdistribution HR.

Non-proportional hazards (curves cross) → a single HR is a misleading average over follow-up; use RMST or time-stratified HRs.

Confounding by indication → sicker patients get treated → spurious harmful HR.

Immortal time bias → spurious protective HR.

Recall bias → inflates OR in case-control studies.

Aspirin primary prevention in older adults: relative reduction in CV events modest, NNH for bleeding low → 2022 USPSTF recommends against initiating aspirin for primary prevention in adults ≥60.

Statin primary prevention (10-yr ASCVD ≥7.5%): NNT ~50–100 over 10 years.

Cardiac rehab post-MI: NNT ~17 for mortality reduction.

Anticoagulation for AF (CHA₂DS₂-VASc ≥2): NNT ~25–35 per year for stroke prevention.

Time to benefit > life expectancy → don't start.

Always translate ratios to absolute terms for patient counseling — this is the single most board-tested principle.

Board pearl: if a Step 3 stem gives both a 2×2 table and a study design, expect to compute and choose between RR and OR — the design dictates which is appropriate.
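The 2×2 calculations listed above can be sketched as a small helper. This is a minimal illustration with made-up counts; the class name `TwoByTwo` and the cell layout (a, b = exposed with/without outcome; c, d = unexposed with/without outcome) are assumptions chosen to match the RR formula used in these notes.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    a: int  # exposed (or treated), outcome present
    b: int  # exposed (or treated), outcome absent
    c: int  # unexposed (or control), outcome present
    d: int  # unexposed (or control), outcome absent

    def risk_ratio(self) -> float:
        # Only meaningful when incidence is observable (cohort study, RCT).
        return (self.a / (self.a + self.b)) / (self.c / (self.c + self.d))

    def odds_ratio(self) -> float:
        # Cross-product ratio; valid for case-control designs too.
        return (self.a * self.d) / (self.b * self.c)

    def arr(self) -> float:
        # Control risk minus treated risk (positive when treatment protects).
        return self.c / (self.c + self.d) - self.a / (self.a + self.b)

    def nnt(self) -> int:
        # NNT = 1/ARR, rounded up.
        return math.ceil(1 / abs(self.arr()))


if __name__ == "__main__":
    common = TwoByTwo(a=30, b=70, c=60, d=40)   # outcome common (30% vs 60%)
    rare = TwoByTwo(a=3, b=997, c=6, d=994)     # outcome rare (0.3% vs 0.6%)
    for label, table in (("common", common), ("rare", rare)):
        print(label, round(table.risk_ratio(), 3), round(table.odds_ratio(), 3))
```

Both hypothetical tables have the same RR, but the OR sits close to the RR only in the rare-outcome table, which is the rare disease assumption in numerical form.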

Board Question Stem Patterns

Pattern 1 — Design-driven measure choice:

— Stem: case-control study of NSAIDs and GI bleed, 2×2 table provided. Question: "What is the most appropriate measure of association?" Answer: odds ratio (RR is invalid because participants were sampled on outcome, so incidence cannot be calculated).

Pattern 2 — 2×2 calculation:

— Stem provides cohort data; asks to compute RR or NNT. Plug into formulas: RR = (a/(a+b)) / (c/(c+d)); NNT = 1/ARR, where ARR is the absolute difference between the two risks.

Pattern 3 — OR vs RR misinterpretation:

— Stem reports OR 2.5 for postoperative delirium (30% outcome). Question: "What is the most important caveat in counseling the patient?" Answer: OR overestimates RR when outcome is common.

Pattern 4 — CI interpretation:

— HR 0.78 (95% CI 0.65–0.93). Question: significance and clinical meaning. Answer: statistically significant (CI excludes 1), 22% relative hazard reduction, must compute ARR using baseline risk for patient counseling.

Pattern 5 — Subgroup trap:

— Stem reports unfavorable overall HR but favorable subgroup HR. Question: "Should you offer therapy to this subgroup patient?" Answer: usually no without prespecified significant interaction.

Pattern 6 — External validity:

— Patient is 82 with CKD; trial enrolled patients 50–70 with normal renal function. Question on applying the HR. Answer: recognize limited generalizability, engage in shared decision-making.

Pattern 7 — Confounding by indication:

— Observational study shows therapy associated with increased mortality (HR 1.4). Question: most likely explanation. Answer: confounding by indication (sicker patients got therapy); recommend RCT before changing practice.

Pattern 8 — Time-to-benefit:

— Frail 88-year-old with 2-year life expectancy; statin benefit emerges at 2.5 years. Question: appropriate management. Answer: do not initiate (time to benefit exceeds life expectancy).

Pattern 9 — Stopped-early trial:

— Trial stopped early for benefit, HR 0.45. Question: caveat. Answer: likely overestimates effect size; await confirmatory data.

Pattern 10 — Shared decision-making framing:

— Patient asks "how much will this help me?" Best response converts RR/HR into absolute terms using natural frequencies.

Board pearl: when in doubt, the answer favors absolute risk + shared decision-making.
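For the delirium stem (OR 2.5, outcome ~30%), the size of the overestimate can be sketched with the commonly cited conversion RR ≈ OR / (1 − P0 + P0 × OR), where P0 is the risk in the unexposed group. This is an approximation, not an exact method, and the function name `or_to_rr` is hypothetical. The stem does not say whose risk the 30% refers to, so treating it as P0 is an assumption made here for illustration.

```python
def or_to_rr(odds_ratio: float, baseline_risk: float) -> float:
    """Approximate the risk ratio implied by an odds ratio.

    baseline_risk is the outcome risk in the unexposed (reference) group.
    """
    if not 0 <= baseline_risk < 1:
        raise ValueError("baseline_risk must be in [0, 1)")
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


if __name__ == "__main__":
    # Assumed reference-group risks: 30% (common outcome) vs 1% (rare outcome).
    for p0 in (0.30, 0.01):
        print(f"P0={p0:.2f}: OR 2.5 corresponds to RR ~{or_to_rr(2.5, p0):.2f}")
```

With a common outcome the implied RR is well below the OR, so telling the patient the risk is "2.5 times higher" would overstate it; with a rare outcome the two are nearly identical.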

One-Line Recap

Choose the effect measure by study design — RR for cohorts/RCTs, OR for case-control or logistic regression, HR for time-to-event analyses — then always translate the ratio into absolute risk and NNT for the individual patient sitting in front of you.

Design dictates measure:

— Cohort/RCT with fixed follow-up → risk ratio + ARR + NNT.

— Case-control or any logistic regression → odds ratio (never call it RR if outcome is common).

— Time-to-event with censoring (Cox model, Kaplan-Meier) → hazard ratio, check proportional hazards and competing risks.

Interpretation discipline:

— Point estimate alone is meaningless; pair with 95% CI, baseline risk, and absolute translation.

— CI excludes 1 → statistically significant; CI tightness reflects precision.

— OR ≈ RR only when outcome < 10%; otherwise OR exaggerates.

Patient-centered translation:

— Convert every relative effect into natural frequencies ("3 fewer events per 100 over 5 years").

— Weigh against NNH, life expectancy, and time to benefit before initiating chronic therapy.

— Document shared decision-making explicitly.

Watch the traps:

— Confounding by indication, immortal time bias, subgroup multiplicity, stopped-early trial overestimation, external validity gaps in elderly/CKD/pregnancy/pediatric patients.

Step 3 mantra: design → measure → CI → absolute risk → patient. Skipping any step is how guideline-concordant care becomes guideline-misapplied care, and how board-correct reasoning becomes a wrong answer.

Board pearl: the single highest-yield habit — and the most-tested cognitive move on Step 3 biostatistics vignettes — is converting any reported ratio into an NNT using the patient's own baseline risk before recommending therapy. Master this and you master the topic.
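The CI → absolute risk step of the mantra can be sketched by carrying the confidence limits through to an NNT range. The example reuses the HR 0.78 (95% CI 0.65–0.93) from the stem patterns; the 20% baseline risk is a hypothetical value, the function name `nnt_range` is made up for illustration, and the HR is treated as an approximate RR, which is itself an assumption.

```python
import math
from typing import Optional


def nnt_range(baseline_risk: float, ratio: float,
              ci_low: float, ci_high: float) -> dict:
    """Check significance of a ratio measure and convert it to an NNT range."""

    def nnt(r: float) -> Optional[int]:
        arr = baseline_risk * (1 - r)  # ARR if the ratio acted like an RR
        # No benefit (ratio >= 1) means no finite NNT.
        return math.ceil(1 / arr) if arr > 0 else None

    return {
        "significant": ci_low > 1 or ci_high < 1,  # 95% CI excludes 1
        "nnt_point": nnt(ratio),
        "nnt_best": nnt(ci_low),    # most favorable end of the CI
        "nnt_worst": nnt(ci_high),  # least favorable end of the CI
    }


if __name__ == "__main__":
    # Assumed baseline risk of 20% over the trial horizon.
    print(nnt_range(0.20, 0.78, 0.65, 0.93))
```

Reporting the range rather than a single NNT keeps the precision of the estimate visible when counseling the patient.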