Biostatistics & Population Health
Confounding and effect modification
— Classic example: coffee drinking appears to cause lung cancer, but smokers drink more coffee and smoking causes lung cancer → smoking confounds the coffee–cancer link.
— Confounding produces a biased single estimate that misleads the clinician or policymaker.
— Example: aspirin reduces MI risk more in men than in women → sex modifies the aspirin effect.
— Effect modification is not bias — it is information to be reported, not adjusted away.
— Observational study (cohort, case-control, cross-sectional) reports an association between exposure and outcome.
— Crude (unadjusted) RR/OR differs meaningfully from adjusted (stratified or multivariable) RR/OR.
— A plausible third variable is linked to both exposure and outcome.
— Stratum-specific estimates differ from each other (e.g., RR = 1.2 in women, RR = 3.4 in men).
— Subgroup analyses in an RCT show heterogeneity of treatment effect.

— "In a cohort study, alcohol use was associated with esophageal cancer (RR 3.5). After adjustment for tobacco use, RR fell to 1.2."
— Diagnosis: tobacco confounded the alcohol–cancer association.
— "Among non-smokers, OR for OCP and MI = 1.1. Among smokers, OR = 8.7."
— Diagnosis: smoking is an effect modifier of the OCP–MI relationship (synergistic interaction).
— "Drug X reduced stroke in patients <65 (HR 0.6) but not in patients ≥65 (HR 1.0)."
— Diagnosis: age modifies treatment effect; report separately.
— Watch for stems that look like confounding but are actually selection bias (e.g., Berkson's) or reverse causation. The hallmark of confounding is a pre-existing third variable, not a sampling artifact.
— Words like "after controlling for…," "stratified by…," "adjusted analysis revealed…" → confounding framework.
— Words like "the effect was stronger in…," "differed by…," "interaction p = 0.01" → effect modification.
— Age, sex, smoking, SES, BMI, comorbidities, baseline disease severity, healthcare access.

— Calculate crude RR (cohort) or OR (case-control).
— RR = [a/(a+b)] / [c/(c+d)]; OR = (ad)/(bc).
— Record this number — it is the "vital sign" you will compare against.
— Calculate stratum-specific RR or OR in each stratum.
— If stratum-specific estimates are similar to each other but differ from the crude estimate → confounding.
– Example: crude OR 3.0; stratum 1 OR 1.2; stratum 2 OR 1.3 → confounded; adjusted (Mantel-Haenszel) OR ≈ 1.2.
— If stratum-specific estimates differ meaningfully from each other → effect modification.
– Example: stratum 1 OR 1.1; stratum 2 OR 8.0 → modification; do not pool; report each stratum.
— If stratum-specific estimates are similar to each other AND to the crude estimate → neither (the third variable is irrelevant or independent).
— A >10% change between crude and adjusted estimates is the conventional rule of thumb for declaring confounding present; it is a convention, not a statistical test.
— Heterogeneity of stratum-specific estimates is formally tested with Breslow-Day or an interaction term p-value.
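
The crude formulas and the three-way decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not a validated tool: the function names are made up for this example, the cohort counts are invented, and `hetero_ratio` is a hypothetical stand-in for "differ meaningfully" (a real analysis would use Breslow-Day or an interaction p-value instead).

```python
def risk_ratio(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)] for a cohort 2x2 table."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR = (a*d)/(b*c), the cross-product ratio."""
    return (a * d) / (b * c)


def classify(crude, strata, adjusted=None,
             change_cutoff=0.10, hetero_ratio=1.5):
    """Apply the three-way rule: modification, confounding, or neither.

    hetero_ratio is a hypothetical threshold chosen for illustration only.
    """
    # Strata that disagree with each other: report separately, do not pool.
    if max(strata) / min(strata) > hetero_ratio:
        return "effect modification"
    if adjusted is None:
        raise ValueError("homogeneous strata need an adjusted estimate")
    # Strata agree; compare the pooled adjusted estimate with the crude one.
    if abs(crude - adjusted) / adjusted > change_cutoff:
        return "confounding"
    return "neither"


# Invented cohort counts: 30/100 exposed and 10/100 unexposed get the outcome.
print(round(risk_ratio(30, 70, 10, 90), 2))  # crude RR
print(round(odds_ratio(30, 70, 10, 90), 2))  # crude OR

# Estimates taken from the worked examples in these notes.
print(classify(3.0, [1.2, 1.3], adjusted=1.2))
print(classify(2.0, [1.1, 8.5]))
print(classify(1.2, [1.2, 1.25], adjusted=1.22))
```

The three `classify` calls return "confounding", "effect modification", and "neither" respectively. Heterogeneity is checked first on purpose, mirroring the rule that a pooled estimate is not meaningful once strata disagree.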

— Compare distribution of potential confounders between exposed and unexposed groups (cohort) or between cases and controls (case-control).
— Significant imbalance signals a candidate confounder. Example: smokers overrepresented among coffee drinkers.
— (1) Associated with the exposure in the source population.
— (2) Independent risk factor for the outcome among the unexposed.
— (3) Not in the causal pathway (not a mediator).
— A variable failing any criterion is not a confounder; adjusting for it does nothing to remove confounding and, if it is a mediator or collider, introduces bias.
— Generate 2x2 tables within each stratum of the candidate variable.
— Compute stratum-specific RR/OR.
— Compute Mantel-Haenszel pooled adjusted estimate.
— Compare crude vs adjusted: % change = (crude − adjusted)/adjusted × 100.
— >10% shift → confounding confirmed.
— Apply Breslow-Day or Cochran's Q for homogeneity of stratum estimates.
— Or include an interaction term (exposure × modifier) in a regression model; significant p (commonly <0.05–0.10) supports modification.
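
As a rough sketch of the stratify, pool, and compare steps above, the snippet below computes a crude OR, a Mantel-Haenszel pooled OR, and the percent change between them. The counts are invented to mimic the coffee and smoking pattern, and the helper names are hypothetical.

```python
def crude_or(strata):
    """Collapse all strata into one 2x2 table and take (a*d)/(b*c)."""
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled OR = sum(a_i*d_i/n_i) / sum(b_i*c_i/n_i) over strata i."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def percent_change(crude, adjusted):
    """% change = (crude - adjusted) / adjusted * 100."""
    return (crude - adjusted) / adjusted * 100


# Invented counts per stratum, ordered as
# (exposed cases, exposed controls, unexposed cases, unexposed controls).
smokers = (90, 60, 25, 20)       # stratum OR = 1.20
non_smokers = (10, 40, 30, 130)  # stratum OR = 1.08
strata = [smokers, non_smokers]

crude = crude_or(strata)
adjusted = mantel_haenszel_or(strata)
shift = percent_change(crude, adjusted)
print(round(crude, 2), round(adjusted, 2), round(shift))
```

With these counts the crude OR is about 2.73 while the pooled OR is about 1.15, a shift far beyond 10%, which is the confounding pattern. Pooling is only appropriate here because the two stratum ORs are close to each other.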

— Logistic regression output: aOR with 95% CI. Step 3 asks you to interpret these.
— Cox regression assumes proportional hazards over follow-up time.
— Compare model with and without the candidate confounder — if the exposure coefficient shifts >10%, confounding is present.
— Likelihood ratio test for nested models.
— Add a product (interaction) term (exposure × modifier) to the model.
— Significant interaction term → modification present → report stratum-specific estimates, not a single pooled estimate.
— Randomization (RCT) — gold standard; balances measured and unmeasured confounders on average.
— Restriction — limit study to one stratum (e.g., only non-smokers).
— Matching — in case-control, match cases to controls on the confounder.
— Stratification (Mantel-Haenszel).
— Multivariable regression.
— Propensity score methods (matching, weighting, stratification) — useful when many confounders exist.
— Instrumental variable analysis — addresses unmeasured confounding (e.g., Mendelian randomization).
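
The product-term test (exposure × modifier) described above can be sketched without a regression library in the special case of one binary exposure and one binary modifier. In that saturated model the interaction coefficient equals the difference of the two stratum log ORs, and its Wald standard error is the square root of the summed reciprocal cell counts. The counts below are invented to resemble the OCP, smoking, and MI pattern; `interaction_wald` is a hypothetical helper name.

```python
import math


def interaction_wald(table_mod0, table_mod1):
    """Wald test for exposure x modifier interaction from two 2x2 tables.

    Each table is (a, b, c, d) with OR = (a*d)/(b*c).
    Returns (ratio of stratum ORs, z statistic, two-sided p-value).
    """
    a0, b0, c0, d0 = table_mod0
    a1, b1, c1, d1 = table_mod1
    log_or0 = math.log((a0 * d0) / (b0 * c0))
    log_or1 = math.log((a1 * d1) / (b1 * c1))
    beta3 = log_or1 - log_or0  # interaction on the log-odds scale
    se = math.sqrt(sum(1.0 / x for x in (*table_mod0, *table_mod1)))
    z = beta3 / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(beta3), z, p


# Invented counts: (exposed cases, exposed controls,
#                   unexposed cases, unexposed controls).
non_smokers = (22, 200, 20, 200)  # stratum OR = 1.1
smokers = (40, 50, 10, 100)       # stratum OR = 8.0

ratio, z, p = interaction_wald(non_smokers, smokers)
print(round(ratio, 2), round(z, 2), p < 0.05)
```

A ratio of stratum ORs far from 1 with a small p-value supports effect modification on the multiplicative scale, so the stratum-specific ORs would be reported rather than a pooled value. With sparse cells or more covariates, a fitted regression model is the appropriate tool.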

— Mandatory adjustment or stratification. Reporting unadjusted estimates would be misleading.
— Example: smoking in any alcohol–cancer or coffee–cancer analysis.
— Adjust; report both crude and adjusted for transparency.
— Generally do not adjust; conserves degrees of freedom and avoids overfitting in small datasets.
— Step 1: Confirm it meets confounder criteria (associated with exposure, independent outcome risk factor, not a mediator).
— Step 2: Test for effect modification first (interaction term or Breslow-Day).
— Step 3a: If modification present → stop; report stratum-specific estimates. Do not pool.
— Step 3b: If no modification → compute pooled adjusted estimate (Mantel-Haenszel or regression).
— Step 4: Compare crude vs adjusted to quantify confounding.
— If variable Z lies on the causal pathway from exposure to outcome (e.g., LDL between statin and MI), adjusting for it biases the total effect toward the null ("overadjustment bias").
— Use DAGs to avoid.
— A collider is a variable caused by both the exposure and the outcome (or their causes). Adjusting for it creates a spurious association.
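
A small simulation can make the collider point concrete. In the sketch below, exposure and outcome are generated independently, so their true OR is 1; a third variable is then made more likely when either is present. All probabilities are invented for illustration, and the seed is fixed only to keep the run reproducible.

```python
import random


def odds_ratio(pairs):
    """OR of x versus y from a list of (x, y) boolean pairs."""
    a = sum(1 for x, y in pairs if x and y)
    b = sum(1 for x, y in pairs if x and not y)
    c = sum(1 for x, y in pairs if not x and y)
    d = sum(1 for x, y in pairs if not x and not y)
    return (a * d) / (b * c)


rng = random.Random(0)
people = []
for _ in range(100_000):
    exposure = rng.random() < 0.3
    outcome = rng.random() < 0.3   # independent of exposure by design
    # Collider: caused by exposure and by outcome (assumed probabilities).
    p_collider = 0.9 if (exposure or outcome) else 0.1
    collider = rng.random() < p_collider
    people.append((exposure, outcome, collider))

or_everyone = odds_ratio([(x, y) for x, y, c in people])
or_given_collider = odds_ratio([(x, y) for x, y, c in people if c])
print(round(or_everyone, 2), round(or_given_collider, 2))
```

In the full sample the OR sits near 1, as it should. Restricting to (or adjusting for) the collider produces an OR well below 1, an association created purely by conditioning.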

— Balances known and unknown confounders.
— Limitations: cost, ethics, external validity, non-adherence.
— On Step 3, randomization failure (small trial, broken allocation) reopens confounding risk.
— Study only one stratum (e.g., never-smokers) to eliminate smoking as confounder.
— Limitation: harms generalizability; reduces sample size.
— In case-control studies, match each case to one or more controls on the confounder.
— Requires matched analysis (McNemar test for 1:1, conditional logistic regression for variable ratios).
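— Pitfall: the matched variable can no longer be studied as an exposure. Separately, matching on a variable tied to the exposure but not the outcome, or on a mediator, is overmatching: it costs efficiency or biases toward the null.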
— Calculate stratum-specific estimates, then pool with M-H weighting.
— Best when 1–2 confounders, categorical, with enough cases per stratum.
— Logistic, Cox, or linear regression handling many confounders simultaneously.
— Limited by sample size (rule of 10 events per variable) and model assumptions.
— Estimate probability of exposure given covariates; then match, weight, or stratify on this score.
— Approximates randomization on measured covariates.
— Use a variable associated with exposure but not with outcome (except via exposure) to estimate effect, bypassing unmeasured confounding.

— Formula: aOR_MH = Σ(aᵢdᵢ/nᵢ) / Σ(bᵢcᵢ/nᵢ), where each stratum i has its own 2x2 with cells a, b, c, d and stratum total n.
— Yields a single weighted estimate across strata.
— Mantel-Haenszel RR: similar weighted approach using risks per stratum, RR_MH = Σ[aᵢ(cᵢ+dᵢ)/nᵢ] / Σ[cᵢ(aᵢ+bᵢ)/nᵢ].
— Crude OR (coffee → lung cancer) = 3.0.
— Stratify by smoking:
– Smokers: OR = 1.1
– Non-smokers: OR = 1.2
— Stratum estimates similar (homogeneous) → no effect modification.
— M-H adjusted OR ≈ 1.15 → confounded; smoking explains the apparent association.
— Crude OR = 2.0.
— Stratified:
– Non-smokers: OR = 1.1
– Smokers: OR = 8.5
— Stratum estimates differ markedly → effect modification.
— Do not pool; report each stratum separately. Clinically: counsel smokers more strongly against OCPs.
— Model: log(odds) = β₀ + β₁(Exposure) + β₂(Modifier) + β₃(Exposure × Modifier).
— Significant β₃ → effect modification on the multiplicative scale.
— Additive (public health relevance): does combined exposure exceed sum of individual effects?
— Multiplicative (regression default): does combined effect exceed product?
— Interaction can be present on the additive scale without being present on the multiplicative scale (and vice versa); Step 3 generally tests the multiplicative scale via OR/RR.
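
The additive versus multiplicative distinction above can be checked numerically. The sketch below uses the relative excess risk due to interaction (RERI) as the additive measure, which is one common choice and not named in these notes, and the ratio RR11 / (RR10 × RR01) as the multiplicative measure. The three relative risks are invented, each taken relative to the doubly unexposed group.

```python
def reri(rr10, rr01, rr11):
    """Additive scale: RERI = RR11 - RR10 - RR01 + 1 (0 means no interaction)."""
    return rr11 - rr10 - rr01 + 1.0


def multiplicative_ratio(rr10, rr01, rr11):
    """Multiplicative scale: RR11 / (RR10 * RR01) (1 means no interaction)."""
    return rr11 / (rr10 * rr01)


# Invented relative risks versus the doubly unexposed group:
rr_exposure_only = 2.0
rr_modifier_only = 3.0
rr_both = 5.0

additive = reri(rr_exposure_only, rr_modifier_only, rr_both)
multiplicative = multiplicative_ratio(rr_exposure_only, rr_modifier_only, rr_both)
print(additive, round(multiplicative, 2))
```

Here RERI is 1.0 (the joint effect exceeds the sum of the separate effects) while the multiplicative ratio is about 0.83 (the joint effect falls short of the product). The same data therefore show positive interaction on one scale and none, or even negative, on the other.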

— High burden of comorbidity confounding: studies of any exposure in older adults must contend with age-correlated diseases (CKD, CHF, dementia, polypharmacy).
— Frailty is often an unmeasured confounder: frail patients are simultaneously less likely to receive an aggressive therapy AND more likely to die, so treated patients look healthier than they are (confounding by frailty or contraindication, a close relative of the healthy-user effect).
— Age frequently behaves as an effect modifier: many drugs (e.g., antihypertensives, anticoagulants) have age-dependent risk-benefit profiles.
— Sicker patients receive a drug; they have worse outcomes from underlying disease, not the drug.
— Example: opioid use appears to increase mortality in cancer — but pain severity (a marker of advanced disease) confounds.
— Adjusted analyses often fail to fully correct this; propensity scores and active comparator designs help mitigate.
— Often act as effect modifiers for drug-outcome relationships because of altered pharmacokinetics.
— Stratified reporting is preferred to a single pooled estimate.
— Patients who initiate or adhere to preventive therapies (statins, screening) systematically differ in unmeasured health behaviors.
— Pure confounding that no measured covariate fully captures.

— Often excluded from RCTs → most data observational → confounding by indication is rampant.
— Example: SSRI use in pregnancy and birth defects — depression severity itself is a confounder.
— Maternal age, parity, prior pregnancy complications are recurring confounders.
— Age is almost always an effect modifier (drug pharmacokinetics, developmental physiology, immune maturity).
— Socioeconomic status, parental education, and breastfeeding are major confounders in pediatric observational research.
— Frequent effect modifier (e.g., aspirin for primary prevention — MI reduction in men, stroke reduction in women).
— Many drugs metabolized differently by sex (CYP differences) → consider modification.
— Often a proxy variable for SES, healthcare access, structural racism, and genetic ancestry — adjusting for race without unpacking these is problematic.
— Step 3 increasingly tests recognition that race-as-confounder analyses can obscure rather than reveal causal structure.
— One of the most pervasive confounders in epidemiology — correlated with diet, smoking, healthcare access, environmental exposures, education.
— Almost always must be considered in observational analyses of health outcomes.
— CYP2C19 status modifies clopidogrel effect.
— HLA-B*5701 modifies abacavir hypersensitivity risk.
— These are textbook effect modifications — report stratum-specific estimates.

— Coffee → pancreatic cancer (1981 study; later attributed to confounding by smoking and to biased selection of controls).
— HRT → reduced CHD (observational studies; reversed by WHI RCT — confounding by healthy user effect).
— Beta-carotene → lung cancer prevention (observational benefit; harm in RCT — confounding by diet/lifestyle).
— A true exposure-outcome link can be masked by a negative confounder.
— Example: a beneficial exposure correlated with a harmful one may appear neutral until adjustment.
— HRT recommendations in the 1990s based on confounded observational data led to widespread prescription, later reversed at significant clinical and financial cost.
— Reporting a pooled estimate when the effect is strongly heterogeneous misleads clinicians about subgroup risks and benefits.
— Example: pooling OCP–MI risk across smokers and non-smokers obscures the very high risk in smokers.
— Tamoxifen benefit is restricted to ER-positive breast cancer — ER status modifies effect; ignoring this would expose ER-negative patients to risk without benefit.
— Adjusting for a mediator biases toward the null and may make a real harmful exposure appear safe.
— Adjusting for a collider induces spurious associations (e.g., "obesity paradox" partly explained by collider bias when adjusting for disease status in studies of obesity and mortality).

— Confounders are unmeasured or unmeasurable in the dataset (e.g., frailty, health behaviors, genetic predisposition).
— Sample size is insufficient for the number of covariates needed (violates the "10 events per variable" rule, risking overfitting).
— Stratum-specific estimates differ markedly (effect modification) but the authors report only a pooled estimate.
— A mediator or collider is included in adjustment models (overadjustment or collider bias).
— Observational study with major unmeasured confounders contradicts a well-conducted RCT.
— Authors fail to test for interaction before pooling.
— Adjustment model includes variables on the causal pathway.
— Level 1 evidence (RCT/meta-analysis of RCTs) → confounding minimal; trust the effect estimate.
— Level 2–3 evidence (cohort, case-control) → confounding plausible; require evidence of robust adjustment and sensitivity analyses.
— Sensitivity analysis — what would an unmeasured confounder need to look like to nullify the result? (E-value, Rosenbaum bounds.)
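
For the E-value mentioned above, a minimal sketch follows. It assumes the commonly cited formula for a risk ratio, E = RR + sqrt(RR × (RR − 1)), with protective estimates inverted first. It does not handle confidence limits or the approximations needed for odds ratios with common outcomes, and `e_value` is an illustrative name.

```python
import math


def e_value(rr):
    """E-value for a point estimate on the risk-ratio scale.

    Interpretation: the minimum strength of association (as a risk ratio)
    that an unmeasured confounder would need with both exposure and
    outcome to fully explain away the observed estimate.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective estimates are inverted first
    return rr + math.sqrt(rr * (rr - 1.0))


# RR 0.5 echoes the "HRT reduced CHD by 50%" stem in these notes;
# RR 1.2 echoes the adjusted alcohol and esophageal cancer estimate.
print(round(e_value(0.5), 2))
print(round(e_value(1.2), 2))
```

An estimate near the null yields an E-value close to 1, meaning a weak unmeasured confounder could erase it. Larger E-values indicate that only a strong confounder could do so.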

— Distortion arising from how subjects are selected into the study or retained in follow-up.
— Subtypes: Berkson's bias (hospital-based controls overrepresent comorbidities), healthy worker effect, loss to follow-up differential by exposure or outcome.
— Distinction from confounding: selection bias is built into the sampling; confounding is a property of the population relationships among variables.
— Recall bias — cases remember exposures differently than controls (classic in case-control studies).
— Interviewer bias — knowledge of exposure or disease status influences data collection.
— Misclassification: Non-differential (random) misclassification of a binary exposure or outcome usually biases toward the null; differential misclassification can bias in either direction.
— Apparent survival benefit from earlier detection rather than true mortality reduction (lead-time).
— Screening preferentially detects slow-growing/indolent disease (length-time).
— Outcome precedes and causes the exposure rather than vice versa (especially in cross-sectional studies).
— Example: depression appears to cause low physical activity; in reality, immobility from disease may cause depression.
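— Immortal time bias (pharmacoepidemiology pitfall): follow-up time between cohort entry and treatment initiation, during which treated patients by definition could not have had the event, is misclassified as treated time, biasing toward apparent treatment benefit.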
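
The claim that non-differential misclassification pulls estimates toward the null can be illustrated with expected counts. The sketch below applies the same exposure sensitivity and specificity to cases and controls in an invented case-control table; all numbers are assumptions for illustration.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure classification."""
    obs_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    obs_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return obs_exposed, obs_unexposed


def odds_ratio(a, b, c, d):
    """OR = (a*d)/(b*c) with a, c = cases and b, d = controls."""
    return (a * d) / (b * c)


# Invented true counts.
cases_exp, cases_unexp = 200, 300
ctrl_exp, ctrl_unexp = 100, 400

true_or = odds_ratio(cases_exp, ctrl_exp, cases_unexp, ctrl_unexp)

# Same error rates in cases and controls, i.e. non-differential.
sens, spec = 0.8, 0.9
a, c = misclassify(cases_exp, cases_unexp, sens, spec)
b, d = misclassify(ctrl_exp, ctrl_unexp, sens, spec)
observed_or = odds_ratio(a, b, c, d)

print(round(true_or, 2), round(observed_or, 2))
```

The true OR of about 2.67 shrinks to about 1.94 once exposure is measured with equal error in both groups. If the error rates differed between cases and controls (as with recall bias), the observed OR could move in either direction.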

— Wide confidence intervals or non-significant p-values reflect imprecision, not bias.
— Distinguished by sample size considerations; addressed by larger studies, not adjustment.
— Confounding ≠ chance — confounding produces a systematic distortion that persists with larger samples.
— A variable on the causal pathway between exposure and outcome (e.g., statin → LDL reduction → MI reduction; LDL is a mediator).
— Adjusting for a mediator answers a different question (direct effect) rather than the total effect.
— Often mistaken for a confounder by trainees.
— A variable caused by both exposure and outcome (or their causes). Adjusting for it opens a spurious non-causal path between them (this is not a back-door path, which is what a confounder creates).
— Example: in a study of smoking and depression in hospitalized patients, hospitalization itself is a collider — restricting to hospitalized patients can create artifactual associations.
— Potential outcomes (Rubin) — counterfactual reasoning about what would happen under different exposures.
— DAGs (Pearl) — graphical encoding of causal assumptions, identifying confounders to adjust for and colliders/mediators to leave alone.
— Inferring individual-level associations from group-level data. Distinct from confounding but often co-occurs.
— Direction of association reverses after stratification — an extreme form of confounding where stratum-specific and pooled estimates point opposite ways.
— Classic illustration: UC Berkeley admissions data, where overall sex bias reversed when stratified by department.
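
The reversal described above is easy to reproduce with arithmetic. The sketch below uses invented counts in which treatment A is given mostly to severe patients: A has lower mortality than B inside each severity stratum but higher mortality overall.

```python
# Invented counts: (deaths, patients) by severity stratum and treatment.
data = {
    "mild":   {"A": (5, 100),   "B": (40, 400)},
    "severe": {"A": (120, 400), "B": (40, 100)},
}


def mortality(deaths, patients):
    return deaths / patients


def overall(treatment):
    """Crude mortality for one treatment, ignoring severity."""
    deaths = sum(data[s][treatment][0] for s in data)
    patients = sum(data[s][treatment][1] for s in data)
    return deaths / patients


for stratum, arms in data.items():
    rate_a = mortality(*arms["A"])
    rate_b = mortality(*arms["B"])
    print(stratum, rate_a, rate_b)  # A is lower within each stratum

overall_a = overall("A")
overall_b = overall("B")
print("overall", overall_a, overall_b)  # A is higher when strata are pooled
```

Within strata, A's mortality is 5% versus 10% (mild) and 30% versus 40% (severe), yet the crude comparison is 25% versus 16%. The reversal comes entirely from the unequal severity mix between treatment groups, which is why the stratified comparison is the one to trust here.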

— Randomization is the definitive preventive measure for both measured and unmeasured confounding.
— Restriction to a narrow stratum (homogeneous population).
— Matching in case-control designs on key confounders, with appropriate matched analysis.
— Pre-specify confounders and effect modifiers based on prior literature and DAGs — avoid data-dredging.
— Measure plausible confounders comprehensively and accurately (standardized instruments, biomarkers when feasible).
— Collect time-varying covariates if exposure-confounder feedback is plausible.
— Pre-specified analysis plan including confounders and interaction tests.
— Use multiple methods (stratification + regression + propensity score) as sensitivity analyses.
— Report both crude and adjusted estimates transparently.
— STROBE guidelines for observational studies — require explicit reporting of confounders, adjustment methods, and effect modification testing.
— Disclose unmeasured confounders as limitations.
— Compute E-value: how strong an unmeasured confounder would need to be to nullify the observed effect — informs robustness.
— Meta-analyses of observational studies must address heterogeneity that may reflect effect modification.
— Subgroup analyses and meta-regression help identify modifiers across studies.

— Vary modeling assumptions (e.g., different sets of confounders).
— Use alternative methods (regression vs propensity score).
— Apply E-value or quantitative bias analysis for unmeasured confounding.
— Test multiple definitions of exposure and outcome.
— Choose an outcome unrelated to the exposure that should show no effect; if it does, residual confounding is present.
— Example: in pharmacoepi study of statin–dementia link, use an outcome like accidental injury as negative control.
— Compare new drug users to users of an alternative drug for the same indication (rather than non-users) to minimize confounding by indication.
— A standard rigor marker in modern pharmacoepidemiology.
— Similar logic — outcomes that should not be affected by the exposure serve as bias detectors.
— Always check: was the study an RCT or observational?
— If observational: which confounders were measured, which were not?
— Were interactions tested before pooling?
— Were sensitivity analyses performed?
— Does the effect size survive plausible unmeasured confounding (E-value)?
— When discussing observational findings (e.g., "studies suggest moderate alcohol reduces heart disease"), counsel patients that such findings may reflect confounding, not causation, and clinical recommendations require RCT evidence.

— HRT post-WHI debacle — observational evidence drove widespread prescription based on confounded cardiovascular benefit; subsequent RCT showed harm.
— Beta-carotene/vitamin E supplementation similarly reversed.
— Patient safety mandates skepticism of observational claims for interventions (vs etiologic exposures).
— Equity demands that subgroups (women, minorities, elderly) be analyzed and reported, not pooled away.
— Conversely, post-hoc data-dredging for subgroup effects (the "Texas sharpshooter") can manufacture spurious modifications and mislead clinical practice.
— Pre-specification is the ethical compromise.
— Subjects in observational studies must understand that their data contribute to inference that may be confounded; results should be communicated with appropriate uncertainty.
— A patient is discharged on a new medication based on a recent observational study showing benefit. Six months later, an RCT shows no benefit and potential harm. The clinician must reassess and potentially deprescribe, communicating clearly with the patient about why guidance changed — this is a real consequence of acting on confounded data and a tested communication competency.
— Cancer registries, vaccine adverse event reports, and pharmacovigilance databases generate observational data prone to confounding (especially confounding by indication and selection bias). Interpretation must be cautious.
— Adjusting for race without unpacking SES, structural factors, and access can perpetuate bias and obscure remediable causes of disparity — an ethical issue in modern epidemiology.

— Associated with exposure.
— Independent risk factor for outcome.
— Not on the causal pathway (not a mediator).
— Coffee & lung cancer (smoking confounder).
— OCP & MI (smoking effect modifier).
— HRT & CHD (observational vs WHI RCT; healthy user confounding).
— Birth order & Down syndrome (maternal age confounder).
— Vitamin/antioxidant supplements (observational benefit reversed by RCTs).

— "Crude RR was 3.0; after adjustment for age and smoking, adjusted RR was 1.1. Which best explains this?"
— Answer: confounding by the adjusted variables.
— "Among smokers, OR = 8.0; among non-smokers, OR = 1.1. Which describes the role of smoking?"
— Answer: effect modifier (interaction). Do not pool.
— "Investigators adjusted for LDL cholesterol when studying statin use and MI. The adjusted estimate showed no effect."
— Answer: overadjustment for a mediator; LDL is on the causal pathway.
— "An observational study showed HRT reduced CHD by 50%, but a subsequent RCT showed no benefit. Most likely explanation?"
— Answer: residual/unmeasured confounding (healthy user effect) in the observational study.
— "Overall, treatment A had higher mortality than B; within each disease severity stratum, treatment A had lower mortality. Explain."
— Answer: confounding by disease severity; sicker patients were preferentially assigned to A.
— "Which study design best controls for unmeasured confounders?"
— Answer: randomized controlled trial.
— Answer: multivariable regression or propensity score methods.
— "Which of the following is required for variable X to be a confounder?"
— Answer choice combining all three confounder criteria.
— "The interaction term (exposure × sex) had p = 0.002. What does this mean?"
— Answer: effect of exposure differs by sex → effect modification.

Confounding distorts a single estimate and must be fixed by design or adjustment; effect modification reveals true heterogeneity across subgroups and must be reported, not pooled.

