
Eduovisual

Biostatistics & Population Health

Causation criteria: Bradford Hill

Clinical Overview and When to Suspect Causation vs Association

— Originally articulated in his presidential address to the Royal Society of Medicine on the environment–disease relationship (smoking and lung cancer was the motivating case)

— Not a checklist, not a scoring system, not a statistical test — they are heuristics for inference applied after an association has been demonstrated

— Interpreting an epidemiologic study (cohort, case-control) that reports an exposure–disease link

— Public health / occupational medicine vignettes (lead, asbestos, vaping, opioid prescribing patterns)

— Pharmacovigilance signals (drug A → adverse event B)

— Distinguishing correlation from causation when bias, confounding, or chance could explain findings

— A valid statistical association must first be established

Chance ruled out (p-value, confidence interval)

Bias addressed (selection, information, recall)

Confounding addressed (stratification, multivariable regression, matching)

— Only then is it meaningful to ask: is this causal?

— Mnemonic: "Strong Coffee Should Taste Better, Please Consider Extra Aroma"

Bradford Hill criteria (Sir Austin Bradford Hill, 1965) are a framework of nine viewpoints used to judge whether an observed statistical association between an exposure and an outcome reflects a causal relationship
When the framework is invoked on Step 3
Prerequisites before applying Hill's viewpoints
The nine criteria (memorize): Strength, Consistency, Specificity, Temporality, Biological gradient (dose–response), Plausibility, Coherence, Experiment, Analogy
Board pearl: Only temporality (exposure must precede outcome) is considered absolutely required for causation. All others strengthen the argument but none alone is necessary or sufficient. A study can demonstrate causation without satisfying every criterion — and satisfying many criteria does not prove causation, only supports the inference.
Presentation Patterns and Key History — How Hill Criteria Appear in Vignettes

— A study is described (often cohort or case-control), an RR/OR is given, and the examinee is asked which Hill criterion is best illustrated, violated, or missing

— Alternatively, a public health official or clinician must decide whether an exposure–outcome link is "causal enough" to act on

Strength: "Smokers had a 20-fold increase in lung cancer" → large RR/OR favors causation; small effects more vulnerable to residual confounding

Consistency: "Findings replicated across multiple populations, study designs, and investigators" → reproducibility

Specificity: "Exposure causes one specific disease in a specific population" → weakest criterion (most exposures cause multiple outcomes; smoking causes many cancers)

Temporality: "Exposure clearly preceded outcome onset" → the only mandatory criterion

Biological gradient: "Higher dose or longer duration → higher risk" (dose–response)

Plausibility: Fits known biological mechanism

Coherence: Does not conflict with known facts about the disease's natural history and biology

Experiment: Removing exposure reduces incidence (smoking cessation programs ↓ lung cancer); or RCT evidence

Analogy: Similar exposures cause similar effects (thalidomide → suspicion of other teratogens)

— Duration and intensity of exposure → biological gradient

— Sequence of exposure and symptom onset → temporality

— Family/community clustering with same exposure → consistency

Typical Step 3 stem structure
Recognizing each criterion in prose
History elements that map to criteria
Key distinction: Specificity vs Consistency. Specificity asks "does this exposure cause one disease?" (often violated and acceptably so). Consistency asks "do multiple studies show the same association?" Students confuse these constantly on exam day.
Physical Exam Findings — Translating to "Exam" of the Evidence Base

— RR ≥ 2–3 generally considered "strong"; RR 1.1–1.5 considered weak and easily explained by unmeasured confounding

— Hill's smoking–lung cancer example: RR ~9–20 depending on intensity

— Weak associations are not non-causal — they simply require more rigorous control of bias

— Check for meta-analyses, systematic reviews, multi-country cohorts

— Heterogeneity (I²) helps quantify inconsistency

— Watch for publication bias — funnel plot asymmetry weakens the consistency argument
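As a rough illustration of how I² quantifies inconsistency, the sketch below computes Cochran's Q and I² from study-level log relative risks and their standard errors using inverse-variance weights. All effect sizes and standard errors are hypothetical, and the function name is an assumption made for this example.

```python
def i_squared(effects, std_errors):
    """Return I^2 (percent) from study effects (e.g., log RR) and their standard errors."""
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= df:
        return 0.0  # no heterogeneity beyond what chance would produce
    return 100.0 * (q - df) / q


# Hypothetical log relative risks from three studies, each with SE = 0.1
consistent_i2 = i_squared([0.69, 0.70, 0.71], [0.1, 0.1, 0.1])
inconsistent_i2 = i_squared([0.10, 0.70, 1.30], [0.1, 0.1, 0.1])

print(f"I^2, similar studies:    {consistent_i2:.1f}%")
print(f"I^2, discordant studies: {inconsistent_i2:.1f}%")
```

A high I² means the studies disagree more than sampling error would explain, which weakens the consistency argument even when the pooled estimate looks impressive.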

— Graded exposure (pack-years, mSv, mg/day) should yield graded risk

Threshold effects and non-monotonic curves (J-shape for alcohol/CV mortality) complicate but don't refute causation

— Absence of dose–response is a red flag, especially for chemical or radiation exposures
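A minimal sketch of checking for a biological gradient: compute the risk ratio in each exposure category against the unexposed group and see whether risk rises with dose. The pack-year categories and counts are hypothetical, and the monotonicity check is only a descriptive screen, not a formal trend test.

```python
def rate_ratios(cases, persons):
    """Risk in each exposure category divided by risk in the first (reference) category."""
    baseline = cases[0] / persons[0]
    return [(c / n) / baseline for c, n in zip(cases, persons)]


def is_monotonic_increasing(values):
    """True when each value is at least as large as the one before it."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


# Hypothetical cohort: categories are 0, 1-19, 20-39, and 40+ pack-years
cases = [10, 40, 90, 200]
persons = [10000, 10000, 10000, 10000]

gradient = rate_ratios(cases, persons)
graded = is_monotonic_increasing(gradient)

# A J-shaped pattern (risk dips before rising) fails the simple check
j_shaped = [1.0, 0.8, 1.2, 1.9]
j_is_monotonic = is_monotonic_increasing(j_shaped)

print([round(rr, 1) for rr in gradient], graded, j_is_monotonic)
```

The J-shaped series shows why a failed monotonicity check complicates the dose-response argument without refuting causation by itself.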

— In cross-sectional studies, temporality is often unknowable → causal inference is severely limited

Cohort studies establish temporality best; case-control studies are vulnerable to recall bias and reverse causation

— Plausibility = fits known biology/mechanism

— Coherence = fits known epidemiology and natural history of disease

— Both are limited by current scientific knowledge — early in a discovery, plausibility may be low yet the association still causal (e.g., H. pylori and PUD initially dismissed)

Since Bradford Hill is an epidemiologic framework, the "physical exam" analog is the critical appraisal of an association — the structured inspection of study features that either strengthen or weaken causal inference
Inspection of effect size (Strength)
Palpation of replication (Consistency)
Auscultation of dose–response (Biological gradient)
Percussion of temporality
Vitals of plausibility & coherence
Board pearl: A large, consistent, dose–responsive association with clear temporality is the strongest combination. Examiners reward recognizing all four together as a near-definitive causal signal.
Diagnostic Workup — Establishing the Statistical Association First

— Compute measure of effect appropriate to study design:

Cohort study: relative risk (RR) = incidence in exposed / incidence in unexposed

Case-control study: odds ratio (OR) as estimate of RR (valid when disease is rare, <10%)

Cross-sectional study: prevalence ratio or prevalence OR

RCT: RR or absolute risk reduction (ARR), NNT = 1/ARR
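The measures above can be sketched from a 2x2 table. The example below uses hypothetical counts to show why the OR approximates the RR only when the outcome is rare; the cell labels a, b, c, d are a convention assumed here (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).

```python
def risk_ratio(a, b, c, d):
    """RR = incidence in exposed / incidence in unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR = cross-product ratio (a*d) / (b*c)."""
    return (a * d) / (b * c)


# Hypothetical rare outcome: 3% of exposed vs 1% of unexposed develop disease
rare = (30, 970, 10, 990)
rr_rare = risk_ratio(*rare)      # 3.0
or_rare = odds_ratio(*rare)      # about 3.06, close to the RR

# Hypothetical common outcome: 60% of exposed vs 20% of unexposed
common = (600, 400, 200, 800)
rr_common = risk_ratio(*common)  # 3.0
or_common = odds_ratio(*common)  # 6.0, which overstates the RR

print(round(rr_rare, 2), round(or_rare, 2), round(rr_common, 2), round(or_common, 2))
```

With the same RR of 3, the OR stays near 3 when the outcome is rare but doubles to 6 when the outcome is common.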

p-value < 0.05 conventionally

95% CI for RR/OR — if it crosses 1.0, association is not statistically significant

— Wide CI = imprecise (small sample, low power)

Type I error (α) = false positive; Type II (β) = false negative; power = 1 − β, typically ≥ 0.80
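To make the "CI crosses 1.0" rule concrete, here is a sketch that computes a 95% CI for the RR using the common log-scale normal approximation. The counts are hypothetical, and the 1.96 multiplier assumes a two-sided 95% interval.

```python
import math


def rr_confidence_interval(a, b, c, d, z=1.96):
    """RR and approximate CI from a 2x2 table.

    a, b = exposed cases / non-cases; c, d = unexposed cases / non-cases.
    """
    rr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(RR)
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Same RR of 3 in a larger and a smaller hypothetical cohort
rr_large, lower_large, upper_large = rr_confidence_interval(30, 970, 10, 990)
rr_small, lower_small, upper_small = rr_confidence_interval(6, 194, 2, 198)

for label, lo, hi in [("n=2000", lower_large, upper_large), ("n=400", lower_small, upper_small)]:
    verdict = "excludes 1" if lo > 1 or hi < 1 else "crosses 1"
    print(f"{label}: RR 3.0, 95% CI {lo:.2f} to {hi:.2f} ({verdict})")
```

Both cohorts have the same point estimate, but the smaller one yields a wide interval that includes 1.0, so chance has not been ruled out.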

Selection bias (Berkson, healthy worker effect, non-response)

Information bias (recall, interviewer, misclassification)

Lead-time / length-time bias in screening studies

— A confounder is associated with both exposure and outcome, and is not on the causal pathway

— Address by: randomization (design), restriction, matching (design), stratification, multivariable regression, propensity scores (analysis)

Effect modification (interaction) ≠ confounding — it should be reported, not adjusted away
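Stratification can be illustrated with a Mantel-Haenszel pooled odds ratio. The sketch below uses hypothetical counts in which coffee drinking appears linked to lung cancer in the crude table but not within smoking strata; the tuple layout (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases) is an assumption of this example.

```python
def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled OR across strata, weighting each stratum by its size."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical data: exposure = coffee, outcome = lung cancer
smokers = (80, 320, 20, 80)      # most smokers drink coffee; risk is 20% either way
nonsmokers = (2, 98, 8, 392)     # few nonsmokers drink coffee; risk is 2% either way

# Collapsing the strata produces the crude (unadjusted) table
crude_table = tuple(s + n for s, n in zip(smokers, nonsmokers))

crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or([smokers, nonsmokers])

print(f"crude OR = {crude_or:.2f}, smoking-adjusted OR = {adjusted_or:.2f}")
```

A crude OR well above 1 that collapses to 1 after stratification is the signature of confounding. Stratum-specific ORs that differ from each other would instead suggest effect modification, which should be reported rather than pooled away.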

Step 1: Confirm a real association exists before applying Hill at all
Step 2: Rule out chance
Step 3: Rule out bias
Step 4: Rule out confounding
Step 5: Only now apply Hill criteria to judge causation
Step 3 management: When a stem reports an OR with CI 0.9–2.5, the first move is not to invoke Hill — it is to recognize the association is not statistically significant (CI crosses 1) and therefore causal inference is premature. Examiners trap test-takers who jump straight to causation language.
Diagnostic Workup — Advanced Causal Inference Tools Beyond Hill

— Causation defined as: outcome in the exposed differs from what would have occurred in the same individuals had they been unexposed

— RCTs approximate this through randomization, which balances measured and unmeasured confounders

— Visual tool to identify confounders, mediators, and colliders

Mediator: on causal pathway (don't adjust if estimating total effect)

Collider: common effect of two variables — adjusting for a collider induces bias (collider-stratification bias)
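Collider-stratification bias can be demonstrated with a small simulation. In this sketch the exposure and outcome are generated independently, and a collider is defined as their sum; the variable names, sample size, seed, and selection threshold are all arbitrary assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient computed from first principles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20000
exposure = [rng.gauss(0, 1) for _ in range(n)]
outcome = [rng.gauss(0, 1) for _ in range(n)]   # generated independently of exposure
collider = [x + y for x, y in zip(exposure, outcome)]  # common effect of both

# "Adjusting" by restricting to people with a high collider value
selected = [(x, y) for x, y, c in zip(exposure, outcome, collider) if c > 1.0]

r_all = pearson(exposure, outcome)
r_selected = pearson([x for x, _ in selected], [y for _, y in selected])

print(f"correlation in everyone: {r_all:.3f}")
print(f"correlation after selecting on the collider: {r_selected:.3f}")
```

The variables are unrelated in the full sample, yet a clear negative correlation appears once the analysis conditions on their common effect. The same mechanism underlies Berkson's bias with hospitalized controls.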

— Uses genetic variants as instrumental variables (random at meiosis → mimics randomization)

— Strengthens causal inference for exposures like LDL, BMI, alcohol
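One simple Mendelian randomization estimator is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The sketch below uses hypothetical effect sizes and assumes the usual instrumental-variable conditions hold (the variant affects the outcome only through the exposure and shares no confounders with it).

```python
def wald_ratio(beta_variant_outcome, beta_variant_exposure):
    """Estimated causal effect of the exposure on the outcome per unit of exposure."""
    if beta_variant_exposure == 0:
        raise ValueError("The variant must be associated with the exposure.")
    return beta_variant_outcome / beta_variant_exposure


# Hypothetical per-allele effects:
#   each allele raises the exposure (e.g., LDL) by 0.25 units
#   each allele raises the log-odds of the outcome by 0.10
causal_estimate = wald_ratio(beta_variant_outcome=0.10, beta_variant_exposure=0.25)

print(f"estimated log-odds change per unit of exposure: {causal_estimate:.2f}")
```

Because alleles are assigned at meiosis, the numerator and denominator are less likely to share the lifestyle confounders that affect conventional observational estimates.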

— Systematic review/meta-analysis of RCTs > single RCT > cohort > case-control > cross-sectional > case series > expert opinion

— RCT is the gold standard because randomization handles unmeasured confounding

— Specificity often inappropriate (smoking causes many cancers; one cause → many outcomes is the norm)

— Plausibility is knowledge-dependent and can be circular

— No weighting between criteria

Modern epidemiology supplements Hill with formal causal frameworks — increasingly tested on Step 3 biostatistics blocks
Counterfactual framework
Directed acyclic graphs (DAGs)
Mendelian randomization
Hierarchy of evidence (causal strength)
Bradford Hill's "Experiment" criterion maps directly to this hierarchy — if removing the exposure (cessation campaign, regulation, randomized intervention) reduces disease incidence, causation is strongly supported
Limitations of Hill in modern context
Board pearl: Examiners increasingly test the concept that randomized trials are not always feasible or ethical (you cannot randomize people to smoke). In such cases, Hill's framework + DAG-informed observational analysis + Mendelian randomization together provide the best available causal evidence — and public health action proceeds without an RCT (e.g., the 1964 Surgeon General's report).
Risk Stratification — Weighting the Nine Criteria in Practice

Temporality: Exposure must precede outcome. Without it, no causal claim is possible. Reverse causation (disease causes the "exposure") is the classic alternative explanation

— Example violation: low cholesterol "causes" cancer — but occult cancer may lower cholesterol (reverse causation)

Strength: Large RR/OR harder to explain by residual confounding

Biological gradient: Dose–response is mechanistically reassuring

Consistency: Replication across populations, designs, investigators

Experiment: Intervention reverses outcome (cessation, regulation, RCT)

Plausibility: Depends on current biological knowledge

Coherence: Compatible with natural history of disease

Analogy: Similar exposures cause similar effects

Specificity: Most exposures cause multiple outcomes; most outcomes have multiple causes — violating specificity is the norm, not evidence against causation

— Identify which criteria the study demonstrates, which it fails, and which are unaddressable by the study design

— Cross-sectional studies cannot demonstrate temporality

— Single-study results cannot demonstrate consistency

— Animal models contribute plausibility but not human consistency

— Strong, consistent, dose–responsive association with temporality and plausibility → act, even without RCT (smoking, asbestos, lead)

— Weak, inconsistent association → demand more evidence before policy/clinical change

Tier 1 — Essentially required
Tier 2 — Strongly supportive when present
Tier 3 — Supportive but limited
Tier 4 — Often violated, low weight
Applying to a vignette
Decision threshold for public health action
Key distinction: Necessary cause (without it, disease cannot occur — e.g., HIV for AIDS) vs sufficient cause (alone produces disease) vs component cause (one of several factors required together — Rothman's causal pies). Hill's criteria address probabilistic causation in populations, not deterministic individual-level causation.
Pharmacotherapy Analog — Applying Hill to Drug-Adverse Event Signals

— Temporal relationship of event to drug administration

— Improvement on dechallenge (drug stopped → event resolves) → supports causation = Hill's "Experiment"

— Recurrence on rechallenge → strong causal evidence (rarely ethical)

— Dose–response → biological gradient

— Plausibility from pharmacology

— Alternative explanations excluded (confounding by indication)

— Patients prescribed drug X differ systematically from those not prescribed

— Example: antidepressants and suicide — depression itself is the confounder

— Addressed by active comparator design, propensity score matching, instrumental variables

— Sicker patients channeled to newer/safer-perceived drugs

— Can create spurious associations in either direction

— SSRI use and GI bleeding

— Temporality (use precedes bleed) ✓

— Strength (OR ~2) — moderate

— Consistency (multiple cohorts) ✓

— Dose–response (higher serotonin affinity → higher risk) ✓

— Plausibility (platelets depend on serotonin uptake) ✓

— Experiment (discontinuation reduces risk) ✓

— Conclusion: causal — labeling and clinical guidance reflect this

Pharmacovigilance is a major real-world application: deciding whether a drug actually causes an adverse event reported in spontaneous reports (FAERS, VAERS), case series, or observational studies
Naranjo Adverse Drug Reaction Probability Scale — a clinical-level Hill analog
Confounding by indication — the major bias in drug–outcome observational studies
Channeling bias
Hill applied to a drug safety signal — worked example
Step 3 management: When a vignette describes a temporal association between a newly started drug and an adverse event, the immediate step is dechallenge (stop the drug, observe). Rechallenge is generally not pursued for serious reactions (Stevens-Johnson, anaphylaxis, agranulocytosis) for safety reasons — recognizing this ethical boundary is itself testable.
Procedures / Study Designs — Matching Design to Causal Question

— Randomization balances measured and unmeasured confounders

— Blinding minimizes information bias

Limits: cost, ethics (can't randomize to harm), generalizability (efficacy vs effectiveness), short follow-up

— Best for establishing Experiment criterion directly

— Exposure measured before outcome → satisfies temporality cleanly

— Calculates incidence and RR

— Vulnerable to loss to follow-up, confounding

— Best for rare exposures and common outcomes; long-latency diseases are feasible only with prolonged follow-up (Framingham, Nurses' Health Study)

— Exposure and outcome both already occurred; investigator looks back through records

— Faster, cheaper; risk of incomplete data

— Best for rare outcomes (cancers, rare AEs)

— Selects on outcome → cannot calculate incidence, only OR

— Vulnerable to recall bias and selection bias

— Snapshot — measures prevalence, not incidence

— Cannot establish temporality → poor for causation

— Group-level data → risk of ecological fallacy (group-level association ≠ individual-level association)

— Useful for hypothesis generation only

— Interrupted time series, difference-in-differences, regression discontinuity — strengthen causal inference when RCT impossible (policy evaluation)
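Of the quasi-experimental designs named above, difference-in-differences is the easiest to show as arithmetic: the change in the group exposed to a policy minus the change in a comparison group over the same period. The rates below are hypothetical, and the estimate is only interpretable under the assumption that both groups would have followed parallel trends without the policy.

```python
def difference_in_differences(pre_treated, post_treated, pre_control, post_control):
    """Policy effect = change in the treated group minus change in the control group."""
    change_treated = post_treated - pre_treated
    change_control = post_control - pre_control
    return change_treated - change_control


# Hypothetical admissions per 100,000 before and after a smoke-free law
effect = difference_in_differences(
    pre_treated=250.0, post_treated=200.0,   # region that passed the law
    pre_control=240.0, post_control=230.0,   # comparison region without the law
)

print(f"estimated policy effect: {effect:+.0f} per 100,000")
```

The comparison region absorbs the background trend (a drop of 10), so only the extra drop of 40 is attributed to the policy.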

Hierarchy of study designs for causal inference (strongest → weakest)
Randomized controlled trial (RCT)
Prospective cohort
Retrospective cohort
Case-control
Cross-sectional
Ecological
Quasi-experimental designs
CCS pearl: When a Step 3 question asks "what is the best study design to determine if exposure X causes disease Y," the answer hierarchy is: RCT > prospective cohort > retrospective cohort > case-control > cross-sectional, modified by feasibility. For a rare disease, even though RCT is theoretically stronger, case-control is the practical answer because RCT/cohort would require impossibly large samples.
Special Populations — Subgroup Analyses, Effect Modification, and Generalizability

— A causal relationship demonstrated in one population may not transfer

— Hill's Consistency criterion partially addresses this — replication across diverse populations strengthens generalizability

— Magnitude of effect differs across subgroups (age, sex, genotype, comorbidity)

— Example: oral contraceptives + smoking → multiplicative VTE risk

— Effect modification should be reported and stratified, not adjusted away

— Contrasts with confounding, which should be controlled

— Often excluded from RCTs → causal inferences may not apply

— Competing risks (death from other causes) complicate outcome ascertainment

Confounding by frailty/indication is severe

— Pharmacokinetic differences mean drug–outcome associations may have different dose–response curves in these groups

— Hill's biological gradient must be re-examined per subgroup

— Causation established in adults often extrapolated to children — but mechanisms (developmental biology) may differ

— FDA pediatric extrapolation framework formally evaluates whether adult causal evidence applies

— Almost universally excluded from RCTs → causation for pregnancy outcomes often relies on registries, case-control, and cohort studies

— Teratogenicity assessment leans heavily on Hill (thalidomide: strong, consistent, dose–responsive, temporal, plausible, analogous)

— Pharmacogenomics (CYP2C19 and clopidogrel, HLA-B*5701 and abacavir) → effect modification at genotype level

— Single-ancestry studies limit consistency claims

Generalizability (external validity) is a critical layer atop Hill criteria
Effect modification (interaction)
Elderly populations
Renal/hepatic impairment
Pediatric extrapolation
Pregnancy
Genetic/ancestry subgroups
Board pearl: When a study population is narrow (e.g., middle-aged white male veterans), examiners reward identifying that causal conclusions may apply only to similar populations. Generalizing without supporting evidence from diverse cohorts violates Hill's Consistency in spirit and is a common trap in question stems about applying trial results to your specific patient.
Special Populations — Public Health, Environmental, and Occupational Causation

Smoking and lung cancer (Doll & Hill, 1950s): strong RR, dose–response with pack-years, consistent across countries, plausible (carcinogens in smoke), reversible with cessation (experiment), analogous to other tobacco cancers

Asbestos and mesothelioma: near-specific exposure for a near-specific disease (rare instance where specificity holds)

Lead and pediatric cognitive deficits: dose–response, consistency, plausibility, experiment (lead abatement → improved scores)

Aspirin and Reye syndrome: temporal, strong, experiment (warning labels → ↓incidence)

H. pylori and PUD/gastric cancer (Marshall): initially low plausibility but strong other criteria; experiment (eradication cures) clinched causation

— Workers are healthier than general population at baseline → biases occupational cohort studies toward the null

— Use internal comparisons (high-exposure vs low-exposure workers) rather than general-population SMRs

— Many environmental/occupational exposures have decades-long latency (asbestos → mesothelioma 20–40 yr)

— Studies must have adequate follow-up; short studies may falsely reject causation

— Certain causal links trigger public health reporting (occupational lead, communicable disease, suspected cluster of cancers)

— Clinicians have a role in surveillance

— When evidence is suggestive but not conclusive, public health may act to limit exposure (BPA, PFAS) — a policy decision that goes beyond strict Hill satisfaction

Occupational and environmental epidemiology are the historical home of Hill criteria — Hill himself worked in occupational medicine
Classic causation cases solved by Hill-style reasoning
Healthy worker effect
Latency
Mandatory reporting
Precautionary principle
Step 3 management: A vignette describing a cluster of cancers in a workplace warrants notification of OSHA, NIOSH, and state/local health departments — recognizing the system-level response is testable on Step 3 community medicine items.
Complications — Common Errors in Causal Inference

— Outcome causes the apparent exposure

— Example: physical inactivity "causes" obesity vs obesity causes inactivity

— Mitigated by: prospective design, lag analyses, Mendelian randomization

— Third variable associated with both exposure and outcome

— Classic: coffee and lung cancer (smoking confounds)

— Mitigated by: randomization, restriction, matching, stratification, regression, propensity scores

— Berkson's bias (hospitalized controls), non-response, loss to follow-up, healthy-worker

— Distorts the apparent association in unpredictable directions

Differential: misclassification differs by group → biases toward or away from null unpredictably

Non-differential: equal across groups → typically biases toward the null
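The toward-the-null effect of non-differential misclassification can be checked with a worked example. This sketch applies the same imperfect exposure measurement (assumed sensitivity 0.80 and specificity 0.90) to hypothetical cases and controls, then compares the true and observed odds ratios.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected counts classified as exposed / unexposed given imperfect measurement."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    observed_unexposed = (exposed + unexposed) - observed_exposed
    return observed_exposed, observed_unexposed


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Hypothetical true exposure status in a case-control study
cases = (400, 600)      # exposed, unexposed
controls = (200, 800)

true_or = odds_ratio(*cases, *controls)

# The same error rates apply to cases and controls (non-differential)
observed_cases = misclassify(*cases, sensitivity=0.80, specificity=0.90)
observed_controls = misclassify(*controls, sensitivity=0.80, specificity=0.90)
observed_or = odds_ratio(*observed_cases, *observed_controls)

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

The observed OR is still above 1 but smaller than the true value. If the error rates differed between cases and controls, the direction of bias could not be predicted.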

— Cases remember exposures differently than controls (case-control studies)

— Mitigated by structured interviews, records-based exposure ascertainment

— Inferring individual-level causation from group-level data

— Famous example: per capita fat intake and breast cancer correlate at country level but not at individual level

— Opposite error: inferring population effects from individual-level data alone

— Pharmacoepidemiology error: misclassifying time before drug initiation, falsely favoring "treated" group survival

— Screening studies — apparent survival benefit from earlier detection, not actual mortality reduction

— Using criteria as a checklist with scoring (Hill explicitly warned against this)

— Demanding all 9 criteria be met before accepting causation

— Treating specificity as essential (it usually isn't)

Reverse causation
Confounding
Selection bias
Information bias / misclassification
Recall bias
Ecological fallacy
Atomistic fallacy
Immortal time bias
Lead-time and length-time bias
Hill criteria misapplication
Key distinction: Bias vs Confounding. Bias is a systematic error in study design or measurement (can't be fixed in analysis); confounding is a true alternative explanation that can be statistically adjusted if measured. Step 3 questions love this distinction.
When to Escalate — From Association to Action and Policy

— Strong, consistent causal evidence + favorable risk-benefit + alternatives considered → change practice

— Example: COX-2 inhibitors and CV events (rofecoxib withdrawal 2004)

— Systematic reviews and meta-analyses synthesizing Hill-satisfied evidence → guideline committees (USPSTF, AHA/ACC, ADA) update recommendations

GRADE framework formalizes evidence quality and recommendation strength

— FDA black-box warnings, drug withdrawals

— EPA exposure limits

— OSHA workplace standards

— Often requires lower causal certainty than clinical practice change because of population-scale stakes

— Strong temporal, consistent association during an outbreak may trigger action before full Hill satisfaction (precautionary principle)

— Example: Legionnaires' disease (1976) — cooling tower link acted on before full mechanistic confirmation

— Vaping-associated lung injury (EVALI, 2019) — vitamin E acetate identified rapidly using case-control and dechallenge logic

— Adverse event noticed → MedWatch/FAERS report

— Cluster noticed → notify public health department

— Patient harm from systems issue → root cause analysis, patient safety officer

— Epidemiologist / biostatistician for study design and analysis

— Public health authority for cluster investigation

— Risk management / legal for potentially preventable harms

Threshold for individual clinical action
Threshold for guideline change
Threshold for regulatory action
Threshold for public health emergency
Escalation pathway in the clinic
Consultation triggers
CCS pearl: On CCS-style vignettes involving suspected disease clusters or environmental exposures, advancing the clock to notify the local/state health department is virtually always a correct early action — alongside symptomatic management and exposure documentation. The Step 3 examiner expects integrated clinical + public health thinking.
Key Differentials — Same Category (Alternative Causal Frameworks)

— Each disease has multiple sufficient causes, each composed of component causes

— A component is necessary if it appears in every sufficient cause (e.g., HIV in AIDS)

— Useful for understanding multifactorial disease (CAD, cancer)

— Reframes "the cause" as a set of interacting components, not a single agent
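The causal pie logic can be sketched with sets: each sufficient cause is a set of component causes, and a necessary cause is any component present in every pie. The component names below are placeholders invented for illustration, apart from the HIV example already given in the text.

```python
# Each "pie" is one sufficient cause: all of its components must be present together.
# The cofactor names are hypothetical placeholders.
sufficient_causes = [
    frozenset({"HIV", "cofactor_A"}),
    frozenset({"HIV", "cofactor_B", "cofactor_C"}),
]


def necessary_causes(pies):
    """Components that appear in every sufficient cause."""
    return set.intersection(*(set(pie) for pie in pies))


def disease_occurs(components_present, pies):
    """Disease occurs when at least one pie is fully completed."""
    return any(pie <= components_present for pie in pies)


necessary = necessary_causes(sufficient_causes)
completed = disease_occurs({"HIV", "cofactor_A"}, sufficient_causes)
missing_necessary = disease_occurs({"cofactor_A", "cofactor_B", "cofactor_C"}, sufficient_causes)
necessary_alone = disease_occurs({"HIV"}, sufficient_causes)

print(necessary, completed, missing_necessary, necessary_alone)
```

In this toy model the necessary component is required for every pie yet does not complete any pie without its partners, which is the distinction between necessary and sufficient causes.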

— Causation = difference between observed outcome and counterfactual outcome had exposure differed

— Foundation of modern causal inference, RCTs, propensity scores

Average treatment effect (ATE) and average treatment effect on the treated (ATT) are formal estimands

— Graphical representation of causal assumptions

— Identifies confounders to adjust, mediators to leave alone, colliders to avoid

— Formalizes when an observational study can estimate a causal effect (back-door criterion)

— Historical framework for infectious causation:

— Organism present in all cases

— Isolated and grown in pure culture

— Reproduces disease when inoculated

— Re-isolated from new host

— Limited in modern era (asymptomatic carriers, non-culturable organisms, viruses, multifactorial disease) → updated with molecular Koch's postulates (Falkow)

Bradford Hill (1965) is the most cited, but several alternative or complementary frameworks exist for causal inference — examiners may test recognition
Rothman's component cause model (causal pies, 1976)
Counterfactual / potential outcomes framework (Rubin, Neyman)
Structural causal models / DAGs (Pearl)
Koch's postulates (microbiology)
Henle-Koch postulates updated for viruses (Rivers, 1937)
Surgeon General's 1964 framework — applied Hill-like criteria to declare smoking causal for lung cancer; historically pivotal
Board pearl: Koch's postulates are for single-organism infectious causation; Bradford Hill is for population-level chronic disease and non-infectious exposures; Rothman's pies model multifactorial causation. Matching the framework to the question type is testable.
Key Differentials — Other Category (Statistical vs Causal Reasoning Pitfalls)

— A statistically non-significant finding (CI crosses null) may reflect insufficient power

— A significant finding may be a Type I error, especially with multiple comparisons

Bonferroni correction or false discovery rate methods for multiple testing
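The multiple-comparisons problem is easy to quantify. Assuming independent tests, the sketch below computes the probability of at least one false positive across m tests and the Bonferroni-adjusted per-test threshold; the choice of 20 tests is arbitrary.

```python
def familywise_error_rate(alpha, num_tests):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(alpha, num_tests):
    """Per-test significance threshold that keeps the family-wise rate near alpha."""
    return alpha / num_tests


alpha = 0.05
num_tests = 20  # e.g., twenty subgroup or secondary-outcome comparisons

uncorrected_fwer = familywise_error_rate(alpha, num_tests)
per_test_alpha = bonferroni_threshold(alpha, num_tests)
corrected_fwer = familywise_error_rate(per_test_alpha, num_tests)

print(f"chance of >=1 false positive without correction: {uncorrected_fwer:.2f}")
print(f"Bonferroni per-test threshold: {per_test_alpha:.4f}")
print(f"family-wise error rate after correction: {corrected_fwer:.3f}")
```

With 20 uncorrected tests, a spurious "significant" finding is more likely than not, which is why an isolated subgroup result should not be read as causal evidence.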

— Most important non-causal explanation in observational research

— Residual / unmeasured confounding always possible; randomization is the only design that balances unmeasured confounders

— Cannot be fixed in analysis; must be prevented by design

— Always consider before accepting causation

— Particularly relevant in cross-sectional studies and biomarker associations

Mediator: exposure → mediator → outcome (don't adjust if estimating total effect; do decompose for indirect effects)

Confounder: not on pathway; adjust for it

Effect modifier: changes magnitude of effect across subgroups; report stratified results

— Ice cream sales and drowning (confounder: summer)

— Storks and birth rates (rural areas)

— Hormone replacement therapy and CHD (observational studies suggested protective; WHI RCT showed harm — confounding by healthy-user bias)

— Surrogate endpoint correlation with clinical outcome does not guarantee that intervention effects on the surrogate translate to clinical benefit

— CAST trial: antiarrhythmics suppressed PVCs (surrogate) but increased mortality (clinical)

— Extreme values tend toward the mean on repeat measurement — can falsely suggest treatment effect

Distinguishing causation from non-causal alternatives is the heart of biostatistics on Step 3
Pure chance (random error)
Confounding (alternative explanation)
Bias (systematic error)
Reverse causation
Mediation vs confounding vs effect modification
Correlation ≠ causation classic traps
Surrogate vs clinical endpoints
Regression to the mean
Key distinction: Statistical significance (p<0.05) means results at least this extreme would be unlikely if there were no true association; it tells you nothing about causation, effect size, or clinical importance. Conversely, a clinically important effect may fail to reach significance in an underpowered study. Step 3 rewards keeping these separate.
Secondary Prevention — Translating Established Causation into Practice

Primordial: prevent risk factor development (built environment, food policy)

Primary: prevent disease in those with risk factors (statins for hyperlipidemia, vaccines)

Secondary: detect early disease (screening — mammography, colonoscopy)

Tertiary: limit disability from established disease (cardiac rehab post-MI)

Quaternary: prevent overmedicalization and iatrogenic harm

— Grade A/B: high/moderate certainty of net benefit → offer/provide

— Grade C: small net benefit → individualize

— Grade D: no benefit or net harm → discourage

— Grade I: insufficient evidence

— Underlying logic incorporates causal strength of exposure → outcome and intervention → outcome reduction

— Causation established (Doll & Hill, 1950s; Surgeon General's report, 1964)

— Cessation reverses risk over years (Experiment criterion satisfied)

— Step 3 management: ask about tobacco use at every visit, advise to quit, assess readiness, assist (varenicline, bupropion, NRT, counseling), arrange follow-up (5 As)

— Causation of LDL → ASCVD supported by RCTs, Mendelian randomization, dose–response (LDL lowering → linear event reduction)

— Risk-based prescribing (10-yr ASCVD risk: intermediate 7.5% to <20%, high ≥20%)

— Diet, exercise, alcohol, sun protection — all grounded in causal epidemiology

Once causation is established via Hill-style reasoning, secondary prevention uses that knowledge for ongoing patient and population care
Levels of prevention
USPSTF grading and causation
Tobacco cessation — paradigmatic application
Statins for ASCVD prevention
Behavioral counseling
Step 3 management: On secondary prevention vignettes, the highest-yield interventions reflect causally validated exposures: BP control, lipid management, glycemic control, tobacco cessation, antiplatelet therapy post-ASCVD event, vaccinations, weight management. Order these systematically.
Follow-Up, Monitoring, and Ongoing Surveillance of Causal Evidence

— Rare AEs only detectable after widespread use (1 in 10,000)

— FDA MedWatch, FAERS, sentinel networks

— Hill criteria reapplied as signals emerge

— Continuously updated as new trials publish

— Cochrane reviews, GRADE updates

— HRT and CHD: observational studies suggested benefit, WHI RCT overturned → demonstrates limits of consistency criterion when underlying bias (healthy user) is shared across observational studies

— Saturated fat and CVD: ongoing re-evaluation with refined dietary epidemiology methods

— Statins: lipid panel, LFTs if symptomatic, CK if symptomatic

— Anticoagulants: INR (warfarin), renal function (DOACs), bleeding assessment

— Diabetes meds: A1c q3 months until at goal, then q6 months

— Antihypertensives: BP, K+, Cr (ACEi/ARB, diuretics)

— Shared decision-making: present causal evidence quality (RCT-derived vs observational), NNT/NNH, patient values

Number needed to treat (NNT) translates causal effect into patient-level utility

Absolute risk reduction is more meaningful than relative risk reduction for patients

— Frame risks in absolute terms and natural frequencies (5 of 1000 vs 0.5%)

— Avoid causal overstatement from weak observational data ("eggs cause heart disease")

Causation is not static — evidence accumulates and conclusions may shift
Post-marketing surveillance (Phase IV)
Living systematic reviews and meta-analyses
Re-examination of established causal claims
Monitoring parameters for established causal interventions
Counseling principles
Health literacy
Board pearl: NNT = 1 / ARR and NNH = 1 / ARI (absolute risk increase). When the question gives RR and event rates, calculate ARR yourself rather than relying on relative measures. Step 3 frequently tests NNT calculation as the practical endpoint of causal evidence translated into care.
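A short sketch of the calculation described in the pearl: derive the absolute difference from a relative risk and the control event rate, then invert it. The event rates are hypothetical, and rounding up to a whole patient is a common convention assumed here.

```python
import math


def absolute_risk_difference(control_event_rate, relative_risk):
    """Control rate minus treated rate; positive = benefit (ARR), negative = harm (ARI)."""
    treated_event_rate = control_event_rate * relative_risk
    return control_event_rate - treated_event_rate


def number_needed(absolute_difference):
    """NNT or NNH = 1 / |absolute difference|, rounded up to a whole patient."""
    # round() removes floating-point noise before ceil() rounds up
    return math.ceil(round(1.0 / abs(absolute_difference), 9))


# The same relative risk of 0.8 applied to two hypothetical baseline risks
nnt_high_risk = number_needed(absolute_risk_difference(0.10, 0.8))  # ARR = 0.02
nnt_low_risk = number_needed(absolute_risk_difference(0.01, 0.8))   # ARR = 0.002

# A harm: relative risk 1.5 on a 2% baseline gives ARI = 0.01
nnh = number_needed(absolute_risk_difference(0.02, 1.5))

print(nnt_high_risk, nnt_low_risk, nnh)
```

An identical 20% relative risk reduction yields an NNT of 50 in the high-risk group and 500 in the low-risk group, which is why the absolute measure has to be calculated from the event rates.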
Ethical, Legal, and Patient Safety Considerations

— Clinical equipoise (genuine uncertainty about which arm is better) is required to ethically conduct an RCT

— Loss of equipoise (interim analyses showing strong benefit or harm) triggers DSMB to consider stopping

— Disclosure of risks, benefits, alternatives, right to withdraw

— Special protections for vulnerable populations (children, prisoners, pregnant patients, cognitively impaired)

— Tuskegee, Henrietta Lacks, Willowbrook — historical violations driving modern IRB oversight

— Suspected occupational disease (varies by state — silicosis, asbestosis)

— Communicable diseases per state list

— Suspected child/elder/intimate partner abuse (causation of injury)

— Adverse vaccine events → VAERS

— Adverse drug events → FAERS / MedWatch (voluntary for clinicians, mandatory for manufacturers)

Daubert standard governs admissibility of expert scientific testimony in U.S. federal court

— Courts increasingly use Hill-like criteria to assess causation in toxic tort cases (asbestos, talc, glyphosate)

— Legal "more likely than not" (preponderance) is a lower bar than scientific consensus

— Overstating causation harms autonomy (unnecessary anxiety, avoidance)

— Understating causation harms beneficence (preventable disease)

— Frame evidence quality honestly

— Adverse events disproportionately occur at care transitions (hospital → home, primary → specialist)

— Documenting suspected drug–event causation in transfer summaries prevents recurrent harm — failure to document a probable adverse drug reaction at discharge is a classic Step 3 patient safety vignette

Ethical issues in causal research
Equipoise
Informed consent for research
Mandatory reporting linked to causation
Legal causation vs scientific causation
Disclosure of weak causal evidence to patients
Transition-of-care risk
Step 3 management: When a patient develops a serious reaction (e.g., angioedema with ACEi), document the drug, reaction, and causal assessment in the allergy/adverse reaction list — not just the chart note — to prevent rechallenge by future providers. This is a high-yield patient safety action item.
High-Yield Associations and Rapid-Fire Clinical Facts
Memorize the nine Hill criteria: Strength, Consistency, Specificity, Temporality, Biological gradient (dose–response), Plausibility, Coherence, Experiment, Analogy
Only Temporality is universally required for causal inference
Specificity is the weakest criterion — most exposures cause multiple diseases
Strength + Consistency + Dose–response + Temporality = most compelling combination
Hill criteria were articulated in 1965 by Sir Austin Bradford Hill, building on the Doll & Hill smoking studies of the 1950s
Causation prerequisites: rule out chance, bias, confounding before invoking Hill
Confounder must be: (1) associated with exposure, (2) associated with outcome, (3) not on the causal pathway
Effect modification is a biological reality to report (stratum-specific estimates), not a statistical nuisance to adjust away
Randomization controls confounding from both measured and unmeasured variables — its key advantage
Case-control → OR; cohort → RR, incidence; cross-sectional → prevalence
OR approximates RR when the outcome is rare (rule of thumb: <10% of the study population develops it)
NNT = 1/ARR; lower NNT = more efficient intervention
Number needed to harm (NNH) = 1/ARI; higher NNH = safer
Reverse causation is the classic threat to temporality; Mendelian randomization can mitigate it because genotype is fixed before the outcome develops
DAGs: don't adjust for mediators or colliders
Ecological fallacy: group-level association ≠ individual-level association
Hill ≠ checklist — Hill explicitly warned against rigid application
Surgeon General's 1964 report is the historical landmark applying Hill-style reasoning to smoking
Koch's postulates for single infectious agents; Hill for chronic/non-infectious; Rothman for multifactorial
Doll, Hill, and Peto — the British smoking epidemiology trio worth recognizing in question stems
Board pearl: If a question lists a large RR (≥3), dose–response, replication across countries, plausibility, and reversibility with exposure removal — the answer regarding causation is almost always "causal relationship strongly supported", even without RCT evidence.
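The rare-outcome rule for OR versus RR listed above can be checked numerically. The sketch below assumes the standard 2x2 layout (a = exposed with outcome, b = exposed without, c = unexposed with outcome, d = unexposed without); the function names and both sets of counts are hypothetical, chosen only so that the true RR is 2.0 in each table.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """RR = risk in exposed / risk in unexposed (cohort-style data)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = cross-product ratio ad / bc."""
    return (a * d) / (b * c)


# Hypothetical counts, illustration only.
rare = (20, 980, 10, 990)       # outcome risk 2% vs 1%
common = (400, 600, 200, 800)   # outcome risk 40% vs 20%

for label, table in (("rare outcome", rare), ("common outcome", common)):
    print(f"{label}: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
```

With the rare outcome the OR stays close to the RR; with the common outcome the OR moves further from 1 than the RR does, so reading it as a risk ratio would overstate the association.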
Board Question Stem Patterns

— Stem: "Patients exposed to chemical X had a 4-fold higher rate of disease Y; this pattern was observed in cohorts from three countries"

— Answer: Consistency (replication across independent cohorts), not strength (the 4-fold increase itself)

— Trap: choosing strength because RR is mentioned — focus on the replication clause

— Stem: "In a cross-sectional survey, depressed adults reported higher alcohol use than non-depressed adults"

— Missing: Temporality — cross-sectional design cannot determine sequence

— Possibility of reverse causation (depression → drinking, or drinking → depression)

— Stem: "Researcher wants to determine if drug X causes rare hepatic failure"

— Answer: Case-control (rare outcome); RCT not feasible/ethical

— Stem: "Coffee drinkers had higher MI rates"

— Confounder: smoking (associated with both coffee consumption and MI, not on causal pathway)

— Stem: "Cases of lung cancer recalled occupational asbestos exposure more thoroughly than controls"

— Answer: Recall bias (information bias), not confounding

— OR 1.8, 95% CI 0.9–3.5 → interval includes the null value of 1, so not statistically significant; causal claim premature

— Given event rates of 5% (treatment) and 10% (control): ARR = 10% − 5% = 5%, NNT = 1/0.05 = 20

— Novel infectious pathogen → Koch / molecular Koch postulates

— Chronic disease from an environmental exposure → Hill

— Strong, consistent, dose–responsive, temporal, plausible, but no RCT → public health action appropriate (smoking analogy)

Pattern 1 — Identify the criterion illustrated
Pattern 2 — Identify the violated/missing criterion
Pattern 3 — Best study design for a causal question
Pattern 4 — Most likely confounder
Pattern 5 — Distinguishing bias from confounding
Pattern 6 — Interpreting CI
Pattern 7 — NNT calculation
Pattern 8 — Selecting the framework
Pattern 9 — When to act despite imperfect evidence
CCS pearl: Watch for vignettes where the right answer is to order a public health notification (cluster of unusual cancers in an industrial town) rather than additional individual workup. Step 3 rewards integration of population-level reasoning into clinical decisions.
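For Pattern 6, the decision rule is simply whether the confidence interval for a ratio measure contains 1. A minimal Python sketch follows; the function names are hypothetical, the interval construction uses the log-based (Woolf) approximation, which is one common method and is swapped in here for illustration, and the 2x2 counts are invented.

```python
import math


def ci_excludes_null(lower: float, upper: float, null_value: float = 1.0) -> bool:
    """True when the interval does not contain the null (1.0 for OR and RR)."""
    return not (lower <= null_value <= upper)


def woolf_or_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Approximate 95% CI for an odds ratio via the log (Woolf) method.

    a, b = exposed with / without outcome; c, d = unexposed with / without.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Interval quoted in Pattern 6: OR 1.8, 95% CI 0.9 to 3.5
print(ci_excludes_null(0.9, 3.5))

# Hypothetical small study: the point estimate is about 2, yet the interval spans 1
low, high = woolf_or_ci(20, 980, 10, 990)
print(f"95% CI {low:.2f} to {high:.2f}; excludes null: {ci_excludes_null(low, high)}")
```

A wide interval that spans 1 means chance has not been ruled out, so the Hill viewpoints should not yet be invoked for that association.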
One-Line Recap

Bradford Hill's nine criteria — strength, consistency, specificity, temporality, biological gradient, plausibility, coherence, experiment, and analogy — are heuristics (not a checklist) used to judge whether an established statistical association reflects true causation, after chance, bias, and confounding have already been excluded, with temporality being the only universally required element.

Apply only after a statistically valid association is demonstrated and chance, bias, and confounding have been addressed — Hill is the final inferential step, not the first
Temporality is mandatory; strength, consistency, biological gradient, and experiment carry the most inferential weight; specificity is frequently and acceptably violated because most exposures cause multiple outcomes
Match the framework to the question: Koch/molecular Koch for single infectious agents, Bradford Hill for chronic/environmental/non-infectious exposures, Rothman's component-cause pies for multifactorial disease, counterfactual/DAG frameworks for modern formal causal inference
Translate causation into action: established causal links drive USPSTF recommendations, FDA labeling, OSHA standards, and clinical guidelines — and at the bedside, into screening, prevention, pharmacotherapy, and counseling decisions quantified by NNT, NNH, and absolute risk reduction
Board pearl: When the stem describes a large, consistent, dose–responsive, temporally clear, biologically plausible association whose reversal follows exposure removal — call it causal even without RCT evidence; this mirrors the historical Surgeon General's reasoning on tobacco and remains the modern public health standard for non-randomizable harmful exposures.