Biostatistics & Population Health

Causation criteria: Bradford Hill

Clinical Overview and When to Suspect Causation vs Association

Bradford Hill criteria (Sir Austin Bradford Hill, 1965) are a framework of nine viewpoints used to judge whether an observed statistical association between an exposure and an outcome reflects a causal relationship

— Originally articulated in his presidential address to the Royal Society of Medicine on the environment–disease relationship (smoking and lung cancer was the motivating case)

— Not a checklist, not a scoring system, not a statistical test: they are heuristics for inference applied after an association has been demonstrated

When the framework is invoked on Step 3

— Interpreting an epidemiologic study (cohort, case-control) that reports an exposure–disease link

— Public health / occupational medicine vignettes (lead, asbestos, vaping, opioid prescribing patterns)

— Pharmacovigilance signals (drug A → adverse event B)

— Distinguishing correlation from causation when bias, confounding, or chance could explain findings

Prerequisites before applying Hill's viewpoints

— A valid statistical association must first be established

— Chance ruled out (p-value, confidence interval)

— Bias addressed (selection, information, recall)

— Confounding addressed (stratification, multivariable regression, matching)

— Only then is it meaningful to ask: is this causal?

The nine criteria (memorize): Strength, Consistency, Specificity, Temporality, Biological gradient (dose–response), Plausibility, Coherence, Experiment, Analogy

— Mnemonic: "Strong Coffee Should Taste Better, Please Consider Extra Aroma"

Board pearl: Only temporality (exposure must precede outcome) is considered absolutely required for causation. All others strengthen the argument but none alone is necessary or sufficient. A study can demonstrate causation without satisfying every criterion — and satisfying many criteria does not prove causation, only supports the inference.

Presentation Patterns and Key History — How Hill Criteria Appear in Vignettes

Typical Step 3 stem structure

— A study is described (often cohort or case-control), an RR/OR is given, and the examinee is asked which Hill criterion is best illustrated, violated, or missing

— Alternatively, a public health official or clinician must decide whether an exposure–outcome link is "causal enough" to act on

Recognizing each criterion in prose

— Strength: "Smokers had a 20-fold increase in lung cancer" → large RR/OR favors causation; small effects more vulnerable to residual confounding

— Consistency: "Findings replicated across multiple populations, study designs, and investigators" → reproducibility

— Specificity: "Exposure causes one specific disease in a specific population" → weakest criterion (most exposures cause multiple outcomes; smoking causes many cancers)

— Temporality: "Exposure clearly preceded outcome onset" → the only mandatory criterion

— Biological gradient: "Higher dose or longer duration → higher risk" (dose–response)

— Plausibility: Fits known biological mechanism

— Coherence: Does not conflict with known facts about the disease's natural history and biology

— Experiment: Removing exposure reduces incidence (smoking cessation programs ↓ lung cancer); or RCT evidence

— Analogy: Similar exposures cause similar effects (thalidomide → suspicion of other teratogens)

History elements that map to criteria

— Duration and intensity of exposure → biological gradient

— Sequence of exposure and symptom onset → temporality

— Family/community clustering with same exposure → consistency

Key distinction: Specificity vs Consistency — Specificity asks "does this exposure cause one disease?" (often violated and acceptably so). Consistency asks "do multiple studies show the same association?" Students confuse these constantly on exam day.

Physical Exam Findings — Translating to "Exam" of the Evidence Base

Since Bradford Hill is an epidemiologic framework, the "physical exam" analog is the critical appraisal of an association: the structured inspection of study features that either strengthen or weaken causal inference

Inspection of effect size (Strength)

— RR ≥ 2–3 generally considered "strong"; RR 1.1–1.5 considered weak and easily explained by unmeasured confounding

— Hill's smoking–lung cancer example: RR ~9–20 depending on intensity

— Weak associations are not non-causal; they simply require more rigorous control of bias

Palpation of replication (Consistency)

— Check for meta-analyses, systematic reviews, multi-country cohorts

— Heterogeneity (I²) helps quantify inconsistency

— Watch for publication bias: funnel plot asymmetry weakens the consistency argument

Auscultation of dose–response (Biological gradient)

— Graded exposure (pack-years, mSv, mg/day) should yield graded risk

— Threshold effects and non-monotonic curves (J-shape for alcohol/CV mortality) complicate but don't refute causation

— Absence of dose–response is a red flag, especially for chemical or radiation exposures

Percussion of temporality

— In cross-sectional studies, temporality is often unknowable → causal inference is severely limited

— Cohort studies establish temporality best; case-control studies are vulnerable to recall and reverse causation

Vitals of plausibility & coherence

— Plausibility = fits known biology/mechanism

— Coherence = fits known epidemiology and natural history of disease

— Both are limited by current scientific knowledge: early in a discovery, plausibility may be low yet the association still causal (e.g., H. pylori and PUD initially dismissed)

Board pearl: A large, consistent, dose–responsive association with clear temporality is the strongest combination. Examiners reward recognizing all four together as a near-definitive causal signal.

Diagnostic Workup — Establishing the Statistical Association First

Step 1: Confirm a real association exists before applying Hill at all

— Compute measure of effect appropriate to study design:

— Cohort study → relative risk (RR) = incidence in exposed / incidence in unexposed

— Case-control study → odds ratio (OR) as estimate of RR (valid when disease is rare, <10%)

— Cross-sectional study → prevalence ratio or prevalence OR

— RCT → RR or absolute risk reduction (ARR), NNT = 1/ARR

Step 2: Rule out chance

— p-value < 0.05 conventionally

— 95% CI for RR/OR: if it crosses 1.0, association is not statistically significant

— Wide CI = imprecise (small sample, low power)

— Type I error (α) = false positive; Type II (β) = false negative; power = 1 − β, typically ≥ 0.80

Step 3: Rule out bias

— Selection bias (Berkson, healthy worker effect, non-response)

— Information bias (recall, interviewer, misclassification)

— Lead-time / length-time bias in screening studies

Step 4: Rule out confounding

— A confounder is associated with both exposure and outcome, and is not on the causal pathway

— Address by: randomization (design), restriction, matching (design), stratification, multivariable regression, propensity scores (analysis)

— Effect modification (interaction) ≠ confounding: it should be reported, not adjusted away

Step 5: Only now apply Hill criteria to judge causation

Step 3 management: When a stem reports an OR with CI 0.9–2.5, the first move is not to invoke Hill — it is to recognize the association is not statistically significant (CI crosses 1) and therefore causal inference is premature. Examiners trap test-takers who jump straight to causation language.
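The effect measures and the "does the CI cross 1" check from this workup can be sketched in a few lines of Python. This is a minimal illustration, not a validated statistics routine: the 2×2 counts are hypothetical, the function names are made up for this note, and the intervals use the usual large-sample log-scale approximations (Katz for RR, Woolf for OR).

```python
import math

def risk_ratio(a, b, c, d, z=1.96):
    """RR with a log-scale (Katz) confidence interval.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)

def odds_ratio(a, b, c, d, z=1.96):
    """OR with a log-scale (Woolf) confidence interval."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)

def excludes_null(lower, upper, null=1.0):
    """True when the interval lies entirely on one side of the null value."""
    return lower > null or upper < null

# Hypothetical cohort: 30/1000 exposed and 10/1000 unexposed develop disease.
rr, rr_lo, rr_hi = risk_ratio(30, 970, 10, 990)
or_, or_lo, or_hi = odds_ratio(30, 970, 10, 990)
print(f"RR {rr:.2f} (95% CI {rr_lo:.2f}-{rr_hi:.2f})")
print(f"OR {or_:.2f} (95% CI {or_lo:.2f}-{or_hi:.2f})")

# The stem's interval of 0.9-2.5 includes 1, so causal inference is premature.
print("CI 0.9-2.5 excludes 1?", excludes_null(0.9, 2.5))
```

Because the outcome is rare in this made-up cohort (3% and 1%), the OR lands close to the RR, which is the rare-disease condition under which a case-control OR is read as an estimate of RR.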

Diagnostic Workup — Advanced Causal Inference Tools Beyond Hill

Modern epidemiology supplements Hill with formal causal frameworks, increasingly tested on Step 3 biostatistics blocks

Counterfactual framework

— Causation defined as: outcome in the exposed differs from what would have occurred in the same individuals had they been unexposed

— RCTs approximate this through randomization, which balances measured and unmeasured confounders

Directed acyclic graphs (DAGs)

— Visual tool to identify confounders, mediators, and colliders

— Mediator: on causal pathway (don't adjust if estimating total effect)

— Collider: common effect of two variables; adjusting for a collider induces bias (collider-stratification bias)

Mendelian randomization

— Uses genetic variants as instrumental variables (random at meiosis → mimics randomization)

— Strengthens causal inference for exposures like LDL, BMI, alcohol

Hierarchy of evidence (causal strength)

— Systematic review/meta-analysis of RCTs > single RCT > cohort > case-control > cross-sectional > case series > expert opinion

— RCT is the gold standard because randomization handles unmeasured confounding

Bradford Hill's "Experiment" criterion maps directly to this hierarchy: if removing the exposure (cessation campaign, regulation, randomized intervention) reduces disease incidence, causation is strongly supported

Limitations of Hill in modern context

— Specificity often inappropriate (smoking causes many cancers; one cause → many outcomes is the norm)

— Plausibility is knowledge-dependent and can be circular

— No weighting between criteria

Board pearl: Examiners increasingly test the concept that randomized trials are not always feasible or ethical (you cannot randomize people to smoke). In such cases, Hill's framework + DAG-informed observational analysis + Mendelian randomization together provide the best available causal evidence — and public health action proceeds without an RCT (e.g., the 1964 Surgeon General's report).
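Collider-stratification bias is easier to believe once it is seen in numbers. The simulation below is a toy sketch under stated assumptions: two independent standard-normal variables stand in for an exposure and an outcome, their sum stands in for a collider, and "adjusting" is modeled crudely as restricting to subjects with a high collider value. All names and the cutoff of 1.0 are illustrative choices, not taken from any study.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20_000

# Exposure and outcome are generated independently: no causal link either way.
exposure = [rng.gauss(0, 1) for _ in range(n)]
outcome = [rng.gauss(0, 1) for _ in range(n)]

# The collider is a common effect of both variables.
collider = [e + o for e, o in zip(exposure, outcome)]

# Conditioning on the collider: keep only subjects with a high collider value.
kept = [(e, o) for e, o, c in zip(exposure, outcome, collider) if c > 1.0]

r_all = pearson(exposure, outcome)
r_kept = pearson([e for e, _ in kept], [o for _, o in kept])

print(f"whole population:        r = {r_all:+.3f}")
print(f"restricted on collider:  r = {r_kept:+.3f}")
```

In the full population the correlation sits near zero, as it should for independent variables. Within the restricted group a clearly negative correlation appears even though neither variable affects the other, which is the spurious association a DAG warns about before the analysis is run.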

Risk Stratification — Weighting the Nine Criteria in Practice

Tier 1 — Essentially required

— Temporality: Exposure must precede outcome. Without it, no causal claim is possible. Reverse causation (disease causes the "exposure") is the classic alternative explanation

— Example violation: low cholesterol "causes" cancer, but occult cancer may lower cholesterol (reverse causation)

Tier 2 — Strongly supportive when present

— Strength: Large RR/OR harder to explain by residual confounding

— Biological gradient: Dose–response is mechanistically reassuring

— Consistency: Replication across populations, designs, investigators

— Experiment: Intervention reverses outcome (cessation, regulation, RCT)

Tier 3 — Supportive but limited

— Plausibility: Depends on current biological knowledge

— Coherence: Compatible with natural history of disease

— Analogy: Similar exposures cause similar effects

Tier 4 — Often violated, low weight

— Specificity: Most exposures cause multiple outcomes; most outcomes have multiple causes. Violating specificity is the norm, not evidence against causation

Applying to a vignette

— Identify which criteria the study demonstrates, which it fails, and which are unaddressable by the study design

— Cross-sectional studies cannot demonstrate temporality

— Single-study results cannot demonstrate consistency

— Animal models contribute plausibility but not human consistency

Decision threshold for public health action

— Strong, consistent, dose–responsive association with temporality and plausibility → act, even without RCT (smoking, asbestos, lead)

— Weak, inconsistent association → demand more evidence before policy/clinical change

Key distinction: Necessary cause (without it, disease cannot occur — e.g., HIV for AIDS) vs sufficient cause (alone produces disease) vs component cause (one of several factors required together — Rothman's causal pies). Hill's criteria address probabilistic causation in populations, not deterministic individual-level causation.

Pharmacotherapy Analog — Applying Hill to Drug-Adverse Event Signals

Pharmacovigilance is a major real-world application: deciding whether a drug actually causes an adverse event reported in spontaneous reports (FAERS, VAERS), case series, or observational studies

Naranjo Adverse Drug Reaction Probability Scale — a clinical-level Hill analog

— Temporal relationship of event to drug administration

— Improvement on dechallenge (drug stopped → event resolves) → supports causation = Hill's "Experiment"

— Recurrence on rechallenge → strong causal evidence (rarely ethical)

— Dose–response → biological gradient

— Plausibility from pharmacology

— Alternative explanations excluded (confounding by indication)

Confounding by indication — the major bias in drug–outcome observational studies

— Patients prescribed drug X differ systematically from those not prescribed

— Example: antidepressants and suicide, where depression itself is the confounder

— Addressed by active comparator design, propensity score matching, instrumental variables

Channeling bias

— Sicker patients channeled to newer/safer-perceived drugs

— Can create spurious associations in either direction

Hill applied to a drug safety signal — worked example

— SSRI use and GI bleeding

— Temporality (use precedes bleed) ✓

— Strength (OR ~2): moderate

— Consistency (multiple cohorts) ✓

— Dose–response (higher serotonin affinity → higher risk) ✓

— Plausibility (platelets depend on serotonin uptake) ✓

— Experiment (discontinuation reduces risk) ✓

— Conclusion: causal; labeling and clinical guidance reflect this

Step 3 management: When a vignette describes a temporal association between a newly started drug and an adverse event, the immediate step is dechallenge (stop the drug, observe). Rechallenge is generally not pursued for serious reactions (Stevens-Johnson, anaphylaxis, agranulocytosis) for safety reasons — recognizing this ethical boundary is itself testable.

Procedures / Study Designs — Matching Design to Causal Question

Hierarchy of study designs for causal inference (strongest → weakest)

Randomized controlled trial (RCT)

— Randomization balances measured and unmeasured confounders

— Blinding minimizes information bias

— Limits: cost, ethics (can't randomize to harm), generalizability (efficacy vs effectiveness), short follow-up

— Best for establishing Experiment criterion directly

Prospective cohort

— Exposure measured before outcome → satisfies temporality cleanly

— Calculates incidence and RR

— Vulnerable to loss to follow-up, confounding

— Best for common outcomes and rare exposures (Framingham, Nurses' Health Study); inefficient for rare outcomes and for long-latency diseases, which demand decades of follow-up

Retrospective cohort

— Exposure and outcome both already occurred; investigator looks back through records

— Faster, cheaper; risk of incomplete data

Case-control

— Best for rare outcomes (cancers, rare AEs)

— Selects on outcome → cannot calculate incidence, only OR

— Vulnerable to recall bias and selection bias

Cross-sectional

— Snapshot: measures prevalence, not incidence

— Cannot establish temporality → poor for causation

Ecological

— Group-level data → risk of ecological fallacy (group-level association ≠ individual-level association)

— Useful for hypothesis generation only

Quasi-experimental designs

— Interrupted time series, difference-in-differences, regression discontinuity: strengthen causal inference when RCT impossible (policy evaluation)

CCS pearl: When a Step 3 question asks "what is the best study design to determine if exposure X causes disease Y," the answer hierarchy is: RCT > prospective cohort > retrospective cohort > case-control > cross-sectional, modified by feasibility. For a rare disease, even though RCT is theoretically stronger, case-control is the practical answer because RCT/cohort would require impossibly large samples.

Special Populations — Subgroup Analyses, Effect Modification, and Generalizability

Generalizability (external validity) is a critical layer atop Hill criteria

— A causal relationship demonstrated in one population may not transfer

— Hill's Consistency criterion partially addresses this: replication across diverse populations strengthens generalizability

Effect modification (interaction)

— Magnitude of effect differs across subgroups (age, sex, genotype, comorbidity)

— Example: oral contraceptives + smoking → multiplicative VTE risk

— Effect modification should be reported and stratified, not adjusted away

— Contrasts with confounding, which should be controlled

Elderly populations

— Often excluded from RCTs → causal inferences may not apply

— Competing risks (death from other causes) complicate outcome ascertainment

— Confounding by frailty/indication is severe

Renal/hepatic impairment

— Pharmacokinetic differences mean drug–outcome associations may have different dose–response curves in these groups

— Hill's biological gradient must be re-examined per subgroup

Pediatric extrapolation

— Causation established in adults often extrapolated to children, but mechanisms (developmental biology) may differ

— FDA pediatric extrapolation framework formally evaluates whether adult causal evidence applies

Pregnancy

— Almost universally excluded from RCTs → causation for pregnancy outcomes often relies on registries, case-control, and cohort studies

— Teratogenicity assessment leans heavily on Hill (thalidomide: strong, consistent, dose–responsive, temporal, plausible, analogous)

Genetic/ancestry subgroups

— Pharmacogenomics (CYP2C19 and clopidogrel, HLA-B*5701 and abacavir) → effect modification at genotype level

— Single-ancestry studies limit consistency claims

Board pearl: When a study population is narrow (e.g., middle-aged white male veterans), examiners reward identifying that causal conclusions may apply only to similar populations. Generalizing without supporting evidence from diverse cohorts violates Hill's Consistency in spirit and is a common trap in question stems about applying trial results to your specific patient.

Special Populations — Public Health, Environmental, and Occupational Causation

Occupational and environmental epidemiology are the historical home of Hill criteria; Hill himself worked in occupational medicine

Classic causation cases solved by Hill-style reasoning

— Smoking and lung cancer (Doll & Hill, 1950s): strong RR, dose–response with pack-years, consistent across countries, plausible (carcinogens in smoke), reversible with cessation (experiment), analogous to other tobacco cancers

— Asbestos and mesothelioma: near-specific exposure for a near-specific disease (rare instance where specificity holds)

— Lead and pediatric cognitive deficits: dose–response, consistency, plausibility, experiment (lead abatement → improved scores)

— Aspirin and Reye syndrome: temporal, strong, experiment (warning labels → ↓incidence)

— H. pylori and PUD/gastric cancer (Marshall): initially low plausibility but strong other criteria; experiment (eradication cures) clinched causation

Healthy worker effect

— Workers are healthier than general population at baseline → biases occupational cohort studies toward the null

— Use internal comparisons (high-exposure vs low-exposure workers) rather than general-population SMRs

Latency

— Many environmental/occupational exposures have decades-long latency (asbestos → mesothelioma 20–40 yr)

— Studies must have adequate follow-up; short studies may falsely reject causation

Mandatory reporting

— Certain causal links trigger public health reporting (occupational lead, communicable disease, suspected cluster of cancers)

— Clinicians have a role in surveillance

Precautionary principle

— When evidence is suggestive but not conclusive, public health may act to limit exposure (BPA, PFAS): a policy decision that goes beyond strict Hill satisfaction

Step 3 management: A vignette describing a cluster of cancers in a workplace warrants notification of OSHA, NIOSH, and state/local health departments — recognizing the system-level response is testable on Step 3 community medicine items.

Complications — Common Errors in Causal Inference

Reverse causation

— Outcome causes the apparent exposure

— Example: physical inactivity "causes" obesity vs obesity causes inactivity

— Mitigated by: prospective design, lag analyses, Mendelian randomization

Confounding

— Third variable associated with both exposure and outcome

— Classic: coffee and lung cancer (smoking confounds)

— Mitigated by: randomization, restriction, matching, stratification, regression, propensity scores

Selection bias

— Berkson's bias (hospitalized controls), non-response, loss to follow-up, healthy-worker

— Distorts the apparent association in unpredictable directions

Information bias / misclassification

— Differential: misclassification differs by group → biases toward or away from null unpredictably

— Non-differential: equal across groups → typically biases toward the null

Recall bias

— Cases remember exposures differently than controls (case-control studies)

— Mitigated by structured interviews, records-based exposure ascertainment

Ecological fallacy

— Inferring individual-level causation from group-level data

— Famous example: per capita fat intake and breast cancer correlate at country level but not at individual level

Atomistic fallacy

— Opposite of the ecological fallacy: inferring group-level effects from individual-level data alone

Immortal time bias

— Pharmacoepidemiology error: misclassifying time before drug initiation, falsely favoring "treated" group survival

Lead-time and length-time bias

— Screening studies: apparent survival benefit from earlier detection (lead-time) or from preferential detection of slow-growing disease (length-time), not actual mortality reduction

Hill criteria misapplication

— Using criteria as a checklist with scoring (Hill explicitly warned against this)

— Demanding all 9 criteria be met before accepting causation

— Treating specificity as essential (it usually isn't)

Key distinction: Bias vs Confounding — bias is a systematic error in study design or measurement (can't be fixed in analysis); confounding is a true alternative explanation that can be statistically adjusted if measured. Step 3 questions love this distinction.
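The claim that a measured confounder can be handled in analysis can be shown with a stratified estimate. The sketch below uses invented counts for the coffee, smoking, and lung cancer example and the Mantel-Haenszel pooled odds ratio as one common stratification method; the numbers were chosen so that coffee has no effect within either smoking stratum, and the function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a, b = exposed cases / non-cases; c, d = unexposed."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled OR across strata, each weighted by its own size."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical counts per stratum:
# (coffee cases, coffee non-cases, no-coffee cases, no-coffee non-cases)
smokers = (40, 160, 10, 40)      # most smokers drink coffee; 20% have disease
nonsmokers = (5, 95, 10, 190)    # fewer coffee drinkers; 5% have disease

# Collapsing the strata ignores smoking and gives the crude (confounded) table.
crude_table = tuple(s + n for s, n in zip(smokers, nonsmokers))

crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or([smokers, nonsmokers])

print(f"crude OR:              {crude_or:.2f}")
print(f"smokers-only OR:       {odds_ratio(*smokers):.2f}")
print(f"nonsmokers-only OR:    {odds_ratio(*nonsmokers):.2f}")
print(f"Mantel-Haenszel OR:    {adjusted_or:.2f}")
```

The crude OR is about 2, yet the OR is 1 inside each smoking stratum and the pooled estimate is 1: the apparent coffee effect was entirely smoking. This works only because smoking was measured; an unmeasured confounder or a design-level bias offers no strata to adjust on.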

When to Escalate — From Association to Action and Policy

Threshold for individual clinical action

— Strong, consistent causal evidence + favorable risk-benefit + alternatives considered → change practice

— Example: COX-2 inhibitors and CV events (rofecoxib withdrawal 2004)

Threshold for guideline change

— Systematic reviews and meta-analyses synthesizing Hill-satisfied evidence → guideline committees (USPSTF, AHA/ACC, ADA) update recommendations

— GRADE framework formalizes evidence quality and recommendation strength

Threshold for regulatory action

— FDA black-box warnings, drug withdrawals

— EPA exposure limits

— OSHA workplace standards

— Often requires lower causal certainty than clinical practice change because of population-scale stakes

Threshold for public health emergency

— Strong temporal, consistent association during an outbreak may trigger action before full Hill satisfaction (precautionary principle)

— Example: Legionnaires' disease (1976), where the cooling tower link was acted on before full mechanistic confirmation

— Vaping-associated lung injury (EVALI, 2019): vitamin E acetate identified rapidly using case-control and dechallenge logic

Escalation pathway in the clinic

— Adverse event noticed → MedWatch/FAERS report

— Cluster noticed → notify public health department

— Patient harm from systems issue → root cause analysis, patient safety officer

Consultation triggers

— Epidemiologist / biostatistician for study design and analysis

— Public health authority for cluster investigation

— Risk management / legal for potentially preventable harms

CCS pearl: On CCS-style vignettes involving suspected disease clusters or environmental exposures, advancing the clock to notify the local/state health department is virtually always a correct early action — alongside symptomatic management and exposure documentation. The Step 3 examiner expects integrated clinical + public health thinking.

Key Differentials — Same Category (Alternative Causal Frameworks)

Bradford Hill (1965) is the most cited, but several alternative or complementary frameworks exist for causal inference; examiners may test recognition

Rothman's component cause model (causal pies, 1976)

— Each disease has multiple sufficient causes, each composed of component causes

— A component is necessary if it appears in every sufficient cause (e.g., HIV in AIDS)

— Useful for understanding multifactorial disease (CAD, cancer)

— Reframes "the cause" as a set of interacting components, not a single agent

Counterfactual / potential outcomes framework (Rubin, Neyman)

— Causation = difference between observed outcome and counterfactual outcome had exposure differed

— Foundation of modern causal inference, RCTs, propensity scores

— Average treatment effect (ATE) and average treatment effect on the treated (ATT) are formal estimands

Structural causal models / DAGs (Pearl)

— Graphical representation of causal assumptions

— Identifies confounders to adjust, mediators to leave alone, colliders to avoid

— Formalizes when an observational study can estimate a causal effect (back-door criterion)

Koch's postulates (microbiology)

— Historical framework for infectious causation:

— Organism present in all cases

— Isolated and grown in pure culture

— Reproduces disease when inoculated

— Re-isolated from new host

— Limited in modern era (asymptomatic carriers, non-culturable organisms, viruses, multifactorial disease) → updated with molecular Koch's postulates (Falkow)

Henle-Koch postulates updated for viruses (Rivers, 1937)

Surgeon General's 1964 framework — applied Hill-like criteria to declare smoking causal for lung cancer; historically pivotal

Board pearl: Koch's postulates are for single-organism infectious causation; Bradford Hill is for population-level chronic disease and non-infectious exposures; Rothman's pies model multifactorial causation. Matching the framework to the question type is testable.

Key Differentials — Other Category (Statistical vs Causal Reasoning Pitfalls)

Distinguishing causation from non-causal alternatives is the heart of biostatistics on Step 3

Pure chance (random error)

— A statistically non-significant finding (CI crosses null) may reflect insufficient power

— A significant finding may be a Type I error, especially with multiple comparisons

— Bonferroni correction or false discovery rate methods for multiple testing

Confounding (alternative explanation)

— Most important non-causal explanation in observational research

— Residual / unmeasured confounding always possible; randomization is the only design that balances unmeasured confounders

Bias (systematic error)

— Cannot be fixed in analysis; must be prevented by design

Reverse causation

— Always consider before accepting causation

— Particularly relevant in cross-sectional studies and biomarker associations

Mediation vs confounding vs effect modification

— Mediator: exposure → mediator → outcome (don't adjust if estimating total effect; do decompose for indirect effects)

— Confounder: not on pathway; adjust for it

— Effect modifier: changes magnitude of effect across subgroups; report stratified results

Correlation ≠ causation classic traps

— Ice cream sales and drowning (confounder: summer)

— Storks and birth rates (confounder: rural setting)

— Hormone replacement therapy and CHD (observational studies suggested protective; WHI RCT showed harm; confounding by healthy-user bias)

Surrogate vs clinical endpoints

— Surrogate endpoint correlation with clinical outcome does not guarantee that intervention effects on the surrogate translate to clinical benefit

— CAST trial: antiarrhythmics suppressed PVCs (surrogate) but increased mortality (clinical)

Regression to the mean

— Extreme values tend toward the mean on repeat measurement, which can falsely suggest treatment effect

Key distinction: Statistical significance (p<0.05) tells you the observed result would be unlikely if chance alone were at work; it tells you nothing about causation, effect size, or clinical importance. Conversely, a clinically important effect may fail to reach significance in an underpowered study. Step 3 rewards keeping these separate.
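The multiple-comparisons point above comes down to simple arithmetic. The sketch below assumes the tests are independent and that every null hypothesis is true, which is the textbook simplification; the function names and the choice of 20 tests are illustrative.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent null tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests

alpha, n_tests = 0.05, 20  # e.g., 20 subgroup or outcome comparisons in one paper

uncorrected = familywise_error(alpha, n_tests)
per_test = bonferroni_threshold(alpha, n_tests)
corrected = familywise_error(per_test, n_tests)

print(f"familywise error, uncorrected:   {uncorrected:.3f}")
print(f"Bonferroni per-test threshold:   {per_test:.4f}")
print(f"familywise error, corrected:     {corrected:.3f}")
```

With 20 comparisons at p<0.05, the chance of at least one spurious "significant" result is roughly 64%, which is why a single significant subgroup finding is weak evidence of anything, let alone causation. Bonferroni brings the familywise rate back under 5% at the cost of power.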

Secondary Prevention — Translating Established Causation into Practice

Once causation is established via Hill-style reasoning, secondary prevention uses that knowledge for ongoing patient and population care

Levels of prevention

— Primordial: prevent risk factor development (built environment, food policy)

— Primary: prevent disease in those with risk factors (statins for hyperlipidemia, vaccines)

— Secondary: detect early disease (screening: mammography, colonoscopy)

— Tertiary: limit disability from established disease (cardiac rehab post-MI)

— Quaternary: prevent overmedicalization and iatrogenic harm

USPSTF grading and causation

— Grade A/B: high/moderate certainty of net benefit → offer/provide

— Grade C: small net benefit → individualize

— Grade D: no benefit or net harm → discourage

— Grade I: insufficient evidence

— Underlying logic incorporates causal strength of exposure → outcome and intervention → outcome reduction

Tobacco cessation — paradigmatic application

— Causation established (Doll & Hill cohort work, 1950s; Surgeon General's report, 1964)

— Cessation reverses risk over years (Experiment criterion satisfied)

— Step 3 management: ask about tobacco use at every visit, advise to quit, assess readiness, assist (varenicline, bupropion, NRT, counseling), arrange follow-up (5 As)

Statins for ASCVD prevention

— Causation of LDL → ASCVD supported by RCTs, Mendelian randomization, dose–response (LDL lowering → linear event reduction)

— Risk-based prescribing (10-yr ASCVD risk ≥7.5%: intermediate 7.5% to <20%, high ≥20%)

Behavioral counseling

— Diet, exercise, alcohol, sun protection: all grounded in causal epidemiology

Step 3 management: On secondary prevention vignettes, the highest-yield interventions reflect causally validated exposures: BP control, lipid management, glycemic control, tobacco cessation, antiplatelet therapy post-ASCVD event, vaccinations, weight management. Order these systematically.

Follow-Up, Monitoring, and Ongoing Surveillance of Causal Evidence

Causation is not static: evidence accumulates and conclusions may shift

Post-marketing surveillance (Phase IV)

— Rare AEs only detectable after widespread use (1 in 10,000)

— FDA MedWatch, FAERS, sentinel networks

— Hill criteria reapplied as signals emerge

Living systematic reviews and meta-analyses

— Continuously updated as new trials publish

— Cochrane reviews, GRADE updates

Re-examination of established causal claims

— HRT and CHD: observational studies suggested benefit, WHI RCT overturned → demonstrates limits of consistency criterion when underlying bias (healthy user) is shared across observational studies

— Saturated fat and CVD: ongoing re-evaluation with refined dietary epidemiology methods

Monitoring parameters for established causal interventions

— Statins: lipid panel, LFTs if symptomatic, CK if symptomatic

— Anticoagulants: INR (warfarin), renal function (DOACs), bleeding assessment

— Diabetes meds: A1c q3 months until at goal, then q6 months

— Antihypertensives: BP, K+, Cr (ACEi/ARB, diuretics)

Counseling principles

— Shared decision-making: present causal evidence quality (RCT-derived vs observational), NNT/NNH, patient values

— Number needed to treat (NNT) translates causal effect into patient-level utility

— Absolute risk reduction is more meaningful than relative risk reduction for patients

Health literacy

— Frame risks in absolute terms and natural frequencies (5 of 1000 vs 0.5%)

— Avoid causal overstatement from weak observational data ("eggs cause heart disease")

Board pearl: NNT = 1 / ARR and NNH = 1 / ARI (absolute risk increase). When the question gives RR and event rates, calculate ARR yourself rather than relying on relative measures. Step 3 frequently tests NNT calculation as the practical endpoint of causal evidence translated into care.
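The "calculate ARR yourself" step can be written out as follows. This is a sketch with hypothetical rates: a stem that reports a control event rate of 10% and an RR of 0.80, plus an invented adverse-event rate for the NNH. The helper names are illustrative, and the result is rounded up to a whole patient by the usual convention.

```python
import math

def absolute_difference(rate_a, rate_b):
    """Absolute risk difference between two event rates (always non-negative)."""
    return abs(rate_a - rate_b)

def number_needed(rate_a, rate_b):
    """NNT or NNH: 1 / absolute risk difference, rounded up to a whole patient."""
    diff = absolute_difference(rate_a, rate_b)
    # round() first so floating-point noise cannot push an exact 50 up to 51
    return math.ceil(round(1 / diff, 6))

# Benefit: stem gives control event rate and relative risk (hypothetical values).
control_rate = 0.10
relative_risk = 0.80
treated_rate = control_rate * relative_risk   # 8% event rate on treatment

arr = absolute_difference(control_rate, treated_rate)
rrr = 1 - relative_risk
nnt = number_needed(control_rate, treated_rate)

# Harm: adverse event in 1.5% of treated vs 1.0% of controls (hypothetical).
nnh = number_needed(0.015, 0.010)

print(f"RRR = {rrr:.0%}, ARR = {arr:.1%}, NNT = {nnt}")
print(f"NNH = {nnh}")
```

A 20% relative reduction sounds large, but on a 10% baseline it is a 2-point absolute reduction and an NNT of 50. The same RR on a 1% baseline would give an NNT of 500, which is why the baseline event rate has to be pulled from the stem before answering.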

Ethical, Legal, and Patient Safety Considerations

Ethical issues in causal research

Equipoise

— Clinical equipoise (genuine uncertainty about which arm is better) is required to ethically conduct an RCT

— Loss of equipoise (interim analyses showing strong benefit or harm) triggers DSMB to consider stopping

Informed consent for research

— Disclosure of risks, benefits, alternatives, right to withdraw

— Special protections for vulnerable populations (children, prisoners, pregnant patients, cognitively impaired)

— Tuskegee, Henrietta Lacks, Willowbrook: historical violations driving modern IRB oversight

Mandatory reporting linked to causation

— Suspected occupational disease (varies by state: silicosis, asbestosis)

— Communicable diseases per state list

— Suspected child/elder/intimate partner abuse (causation of injury)

— Adverse vaccine events → VAERS

— Adverse drug events → FAERS / MedWatch (voluntary for clinicians, mandatory for manufacturers)

Legal causation vs scientific causation

— Daubert standard governs admissibility of expert scientific testimony in U.S. federal court

— Courts increasingly use Hill-like criteria to assess causation in toxic tort cases (asbestos, talc, glyphosate)

— Legal "more likely than not" (preponderance) is a lower bar than scientific consensus

Disclosure of weak causal evidence to patients

— Overstating causation harms autonomy (unnecessary anxiety, avoidance)

— Understating causation harms beneficence (preventable disease)

— Frame evidence quality honestly

Transition-of-care risk

— Adverse events disproportionately occur at care transitions (hospital → home, primary → specialist)

— Documenting suspected drug–event causation in transfer summaries prevents recurrent harm; failure to document a probable adverse drug reaction at discharge is a classic Step 3 patient safety vignette

Step 3 management: When a patient develops a serious reaction (e.g., angioedema with ACEi), document the drug, reaction, and causal assessment in the allergy/adverse reaction list — not just the chart note — to prevent rechallenge by future providers. This is a high-yield patient safety action item.

High-Yield Associations and Rapid-Fire Clinical Facts

Memorize the nine Hill criteria: Strength, Consistency, Specificity, Temporality, Biological gradient (dose–response), Plausibility, Coherence, Experiment, Analogy

Only Temporality is universally required for causal inference

Specificity is the weakest criterion — most exposures cause multiple diseases

Strength + Consistency + Dose–response + Temporality = most compelling combination

Hill criteria were articulated in 1965 by Sir Austin Bradford Hill, building on the Doll & Hill smoking studies of the 1950s

Causation prerequisites: rule out chance, bias, confounding before invoking Hill

Confounder must be: (1) associated with exposure, (2) associated with outcome, (3) not on the causal pathway

Effect modification is a biological reality to report (give stratum-specific estimates), not a statistical nuisance to adjust away

Randomization controls confounding from both measured and unmeasured variables — its key advantage

Case-control → OR; cohort → RR, incidence; cross-sectional → prevalence

OR approximates RR when the outcome is rare (rule of thumb: <10% of the study population); with common outcomes the OR overstates the RR

NNT = 1/ARR; lower NNT = more efficient intervention

Number needed to harm (NNH) = 1/ARI; higher NNH = safer

Reverse causation is the classic alternative explanation when temporality is unclear; Mendelian randomization mitigates it

DAGs: don't adjust for mediators (blocks part of the causal effect) or colliders (opens a spurious path)

Ecological fallacy: group-level association ≠ individual-level association

Hill ≠ checklist — Hill explicitly warned against rigid application

Surgeon General's 1964 report is the historical landmark applying Hill-style reasoning to smoking

Koch's postulates for single infectious agents; Hill for chronic/non-infectious; Rothman for multifactorial

Doll, Hill, and Peto — the British smoking epidemiology trio worth recognizing in question stems

Board pearl: If a question lists a large RR (≥3), dose–response, replication across countries, plausibility, and reversibility with exposure removal — the answer regarding causation is almost always "causal relationship strongly supported", even without RCT evidence.
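The rare-outcome rule for OR and RR listed among the facts above can be checked numerically. The sketch below uses hypothetical 2x2 counts (not data from any study), chosen so that the RR is 2.0 in both scenarios while the outcome frequency differs.

```python
def risk_ratio(a, b, c, d):
    """RR from a 2x2 table.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR from the same 2x2 table (cross-product ratio)."""
    return (a * d) / (b * c)


# Hypothetical rare outcome: 2% risk in exposed vs 1% in unexposed.
rare = (20, 980, 10, 990)
# Hypothetical common outcome: 60% risk in exposed vs 30% in unexposed.
common = (600, 400, 300, 700)

for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
# rare: RR = 2.00, OR = 2.02
# common: RR = 2.00, OR = 3.50
```

With the rare outcome the OR is nearly identical to the RR. With the common outcome the OR drifts well away from it, so reading a case-control OR as a risk ratio is only safe when the disease is uncommon.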

Board Question Stem Patterns

— Stem: "Patients exposed to chemical X had a 4-fold higher rate of disease Y; this pattern was observed in cohorts from three countries"

— Answer: Consistency (replication across cohorts), not strength (the 4-fold increase)

— Trap: choosing strength because RR is mentioned — focus on the replication clause

— Stem: "In a cross-sectional survey, depressed adults reported higher alcohol use than non-depressed adults"

— Missing: Temporality — cross-sectional design cannot determine sequence

— Possibility of reverse causation (depression → drinking, or drinking → depression)

— Stem: "Researcher wants to determine if drug X causes rare hepatic failure"

— Answer: Case-control (rare outcome); RCT not feasible/ethical

— Stem: "Coffee drinkers had higher MI rates"

— Confounder: smoking (associated with both coffee consumption and MI, not on causal pathway)

— Stem: "Cases of lung cancer recalled occupational asbestos exposure more thoroughly than controls"

— Answer: Recall bias (information bias), not confounding

— OR 1.8, 95% CI 0.9–3.5 → interval includes 1, so not statistically significant; causal claim premature

— Given event rates of 5% (treatment) and 10% (control): ARR = 5%, NNT = 1/0.05 = 20

— Novel infectious pathogen → Koch/molecular Koch

— Environmental exposure and chronic disease → Hill

— Strong, consistent, dose–responsive, temporal, plausible, but no RCT → public health action appropriate (smoking analogy)
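The coffee, smoking, and MI stem above can be made concrete with a stratified analysis. The counts below are hypothetical and deliberately constructed so that smoking explains the whole crude association. The function names are illustrative, and the pooled estimate uses the standard Mantel-Haenszel risk ratio.

```python
# Hypothetical counts per smoking stratum:
# (MI among coffee drinkers, total coffee drinkers,
#  MI among non-drinkers,    total non-drinkers)
strata = {
    "smokers": (80, 800, 20, 200),
    "non_smokers": (4, 200, 16, 800),
}


def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def crude_risk_ratio(tables):
    """Collapse all strata into one table, ignoring the confounder."""
    totals = [sum(column) for column in zip(*tables)]
    return risk_ratio(*totals)


def mantel_haenszel_rr(tables):
    """Pooled RR adjusted for the stratifying variable."""
    numerator = denominator = 0.0
    for cases_exp, n_exp, cases_unexp, n_unexp in tables:
        n_total = n_exp + n_unexp
        numerator += cases_exp * n_unexp / n_total
        denominator += cases_unexp * n_exp / n_total
    return numerator / denominator


tables = list(strata.values())
print(f"crude RR = {crude_risk_ratio(tables):.2f}")         # crude RR = 2.33
for name, table in strata.items():
    print(f"{name}: RR = {risk_ratio(*table):.2f}")         # 1.00 in each stratum
print(f"adjusted RR (M-H) = {mantel_haenszel_rr(tables):.2f}")  # 1.00
```

In this constructed example the crude RR of about 2.3 disappears within each smoking stratum. That is the signature of confounding: smoking is associated with coffee drinking and with MI, and it does not sit on a causal pathway from coffee to MI.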

Pattern 1 — Identify the criterion illustrated

Pattern 2 — Identify the violated/missing criterion

Pattern 3 — Best study design for a causal question

Pattern 4 — Most likely confounder

Pattern 5 — Distinguishing bias from confounding

Pattern 6 — Interpreting CI

Pattern 7 — NNT calculation

Pattern 8 — Selecting the framework

Pattern 9 — When to act despite imperfect evidence
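For Pattern 6, the decision rule is simply whether the 95% CI for a ratio measure contains 1. The sketch below assumes the common log-based (Woolf) approximation for an odds ratio CI. The 2x2 counts are hypothetical and are not the data behind the stem above.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI (Woolf log method).

    a = cases exposed,    b = cases unexposed
    c = controls exposed, d = controls unexposed
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


def includes_null(lower, upper, null_value=1.0):
    """True when the interval contains the null value (1 for OR/RR)."""
    return lower <= null_value <= upper


# The interval quoted in the stem: 0.9 to 3.5 contains 1.
print(includes_null(0.9, 3.5))  # True -> not statistically significant

# Hypothetical small case-control table.
estimate, lower, upper = odds_ratio_ci(18, 82, 11, 89)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print("significant" if not includes_null(lower, upper) else "not significant")
```

For difference measures such as ARR the null value is 0 rather than 1, which is why `null_value` is left as a parameter.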

CCS pearl: Watch for vignettes where the right answer is to order a public health notification (cluster of unusual cancers in an industrial town) rather than additional individual workup. Step 3 rewards integration of population-level reasoning into clinical decisions.

One-Line Recap

Bradford Hill's nine criteria — strength, consistency, specificity, temporality, biological gradient, plausibility, coherence, experiment, and analogy — are heuristics (not a checklist) used to judge whether an established statistical association reflects true causation, after chance, bias, and confounding have already been excluded, with temporality being the only universally required element.

Apply only after a statistically valid association is demonstrated and chance, bias, and confounding have been addressed — Hill is the final inferential step, not the first

Temporality is mandatory; strength, consistency, biological gradient, and experiment carry the most inferential weight; specificity is frequently and acceptably violated because most exposures cause multiple outcomes

Match the framework to the question: Koch/molecular Koch for single infectious agents, Bradford Hill for chronic/environmental/non-infectious exposures, Rothman's component-cause pies for multifactorial disease, counterfactual/DAG frameworks for modern formal causal inference

Translate causation into action: established causal links drive USPSTF recommendations, FDA labeling, OSHA standards, and clinical guidelines — and at the bedside, into screening, prevention, pharmacotherapy, and counseling decisions quantified by NNT, NNH, and absolute risk reduction

Board pearl: When the stem describes a large, consistent, dose–responsive, temporally clear, biologically plausible association whose reversal follows exposure removal — call it causal even without RCT evidence; this mirrors the historical Surgeon General's reasoning on tobacco and remains the modern public health standard for non-randomizable harmful exposures.