Biostatistics & Population Health
Crude vs adjusted analysis and confounder adjustment
— Associated with the exposure (in the source population)
— An independent risk factor for the outcome (not just through the exposure)
— Not on the causal pathway between exposure and outcome (that would be a mediator, not a confounder)
— Observational study (cohort, case-control, cross-sectional) reports a strong crude association
— Crude and adjusted estimates differ by ≥10% ("10% rule" — change-in-estimate criterion)
— The exposure groups differ at baseline in age, sex, comorbidity, smoking, SES, severity of illness
— Classic vignette: "Coffee drinkers had higher MI rates (crude RR 1.8), but after adjusting for smoking, RR fell to 1.0"
— Positive confounding — crude estimate is biased away from the null (overstates effect)
— Negative confounding — crude estimate is biased toward (or past) the null (understates or reverses effect)
Board pearl: If the crude OR is 2.5 and the age-adjusted OR is 1.1, age was a positive confounder — the apparent effect was largely explained by age differences between groups, not by the exposure itself. Always demand adjusted estimates before counseling a patient based on observational data.

— "A cohort study found X was associated with Y (RR 2.0). After adjustment for Z, RR was 1.0. What best explains this?"
— "Investigators report drug A reduces mortality in a retrospective chart review. Which threat to validity is most concerning?"
— "Patients on statins had lower dementia rates than non-users in an observational study." → healthy user bias / confounding by indication
— Observational design (no randomization mentioned)
— Baseline tables showing imbalance (e.g., treated group older, sicker, more comorbidities)
— Mention of a covariate that is biologically linked to both exposure and outcome
— Effect estimate changes substantially after multivariable modeling
— Coffee → CAD, confounded by smoking
— Alcohol → lung cancer, confounded by smoking
— HRT → CHD (observational benefit), confounded by SES / healthy-user behavior (WHI RCT reversed this)
— Birth order → Down syndrome, confounded by maternal age
— Yellow fingers → lung cancer, confounded by smoking
— NSAID use → GI bleed risk in elderly, confounded by age and comorbidity
— Was randomization performed? (If yes, confounding is minimized at baseline)
— Were baseline characteristics balanced or statistically compared?
— What variables entered the multivariable model, and why?
— Were unmeasured confounders acknowledged in limitations?
Key distinction: Confounding is a systematic error from a third variable; selection bias arises from how subjects entered the study; information bias arises from how data were measured. Step 3 stems often offer all three as distractors — anchor on the mechanism described.

• Confounding has no "physical exam," but the analogous skill is inspecting the study's baseline characteristics table (Table 1).
• What to scan for on Table 1:
— Age, sex, race/ethnicity distributions across exposure groups
— Comorbidity burden (Charlson index, diabetes, CKD, HF)
— Medication use that overlaps the outcome pathway
— Behavioral factors: smoking, alcohol, exercise, BMI
— Socioeconomic markers: income, education, insurance
— Severity-of-illness markers (eGFR, EF, NYHA class)
• Red flags suggesting residual or unmeasured confounding:
— Baseline differences with p < 0.05 across multiple variables
— Treated group dramatically healthier (think: statin observational studies and the healthy adherer effect)
— Treated group dramatically sicker (think: ICU interventions given to the most ill, i.e., confounding by indication)
— Important known risk factor for the outcome is not listed or not adjusted for
• Quantitative "exam" maneuvers:
— Apply the 10% change-in-estimate rule: |crude − adjusted| / crude × 100; ≥10% suggests meaningful confounding
— Compute a standardized mean difference (SMD): values >0.1 between groups suggest residual imbalance even when p-values look "non-significant"
— Look for an E-value in modern papers: it quantifies how strong an unmeasured confounder would need to be to nullify the result
• Hemodynamic analogy: Just as you assess perfusion before treating shock, assess covariate balance before trusting an effect estimate.
Board pearl: A p-value >0.05 on a baseline characteristic does not mean groups are balanced; it may reflect low power. Use SMD or clinical judgment, not p-values, to judge baseline comparability. This is a favorite Step 3 trap.
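Worked sketch: the change-in-estimate rule and the SMD are both simple arithmetic. The Python below is a minimal illustration, not a validated tool. The function names are invented for this example, the RR values are borrowed from the coffee/smoking vignette, and the age means and SDs are hypothetical. The SMD uses the common pooled-SD form for a continuous covariate.

```python
import math


def percent_change_in_estimate(crude, adjusted):
    """Percent change from crude to adjusted estimate, relative to the crude value."""
    return abs(crude - adjusted) / crude * 100


def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """SMD for a continuous covariate, using the pooled standard deviation."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / pooled_sd


# Crude RR 1.8 and smoking-adjusted RR 1.0 (values from the coffee vignette).
change = percent_change_in_estimate(crude=1.8, adjusted=1.0)

# Hypothetical Table 1 row: mean age 68 (SD 10) in treated vs 65 (SD 10) in untreated.
smd_age = standardized_mean_difference(mean_a=68.0, sd_a=10.0, mean_b=65.0, sd_b=10.0)

print(f"Change in estimate: {change:.1f}% (10% or more suggests confounding)")
print(f"SMD for age: {smd_age:.2f} (above 0.1 suggests imbalance)")
```

Here the estimate changes by about 44% and the age SMD is 0.3, so both checks flag a problem even though a 3-year age gap could easily be "non-significant" in a small study.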

— Is it associated with the exposure?
— Is it an independent risk factor for the outcome?
— Is it on the causal pathway? (If yes → it's a mediator, do not adjust as a confounder)
— Stratify by the suspected confounder (e.g., smokers vs non-smokers)
— Compute stratum-specific effect estimates
— If stratum estimates are similar to each other but different from the crude, that variable is a confounder
— If stratum estimates differ from each other, that variable is an effect modifier (interaction), not just a confounder — and you should report stratum-specific results, not a single adjusted estimate
— Logistic regression for binary outcomes (yields adjusted OR)
— Cox proportional hazards for time-to-event (adjusted HR)
— Linear regression for continuous outcomes (adjusted β)
— Include biologically plausible confounders identified a priori (preferred over stepwise selection)
— Propensity score matching, weighting, or stratification
— Instrumental variable analysis when strong unmeasured confounding is suspected
— E-value reporting
Key distinction: Confounder → adjust. Mediator → do not adjust (adjusting blocks the very pathway you're trying to measure and biases the estimate toward the null). Effect modifier → stratify and report separately. Misclassifying a mediator as a confounder is a high-yield Step 3 error: e.g., adjusting for LDL when studying statins and MI removes the drug's true effect because LDL reduction is the mechanism.
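Worked sketch: the stratification steps can be traced with a small calculation. The counts below are hypothetical, chosen so that coffee has no effect on MI within either smoking stratum; they do not reproduce the RR of 1.8 quoted in the vignette. The function names are illustrative, and the pooled estimate uses the standard Mantel-Haenszel risk ratio formula.

```python
# Hypothetical cohort counts (illustrative only), stratified by smoking status.
# Each tuple: (exposed_cases, exposed_total, unexposed_cases, unexposed_total)
strata = {
    "smokers": (80, 800, 20, 200),
    "nonsmokers": (4, 200, 16, 800),
}


def risk_ratio(a, n1, c, n0):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / n1) / (c / n0)


def crude_risk_ratio(strata):
    """Collapse all strata into one 2x2 table, ignoring the stratifying variable."""
    a, n1, c, n0 = (sum(column) for column in zip(*strata.values()))
    return risk_ratio(a, n1, c, n0)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio across strata."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
    return numerator / denominator


crude = crude_risk_ratio(strata)
stratum_rrs = {name: risk_ratio(*counts) for name, counts in strata.items()}
adjusted = mantel_haenszel_rr(strata)

print(f"Crude RR: {crude:.2f}")
for name, rr in stratum_rrs.items():
    print(f"RR among {name}: {rr:.2f}")
print(f"Mantel-Haenszel adjusted RR: {adjusted:.2f}")
```

The stratum-specific RRs agree with each other (both 1.0) but differ from the crude RR (about 2.3), which is the signature of confounding. Had the two stratum RRs differed from each other, the pooled estimate would hide effect modification and the strata should be reported separately.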

— Randomization — gold standard; balances measured and unmeasured confounders on average. RCT is the only design that does this.
— Restriction — enroll only one stratum (e.g., only nonsmokers) — eliminates that confounder but limits generalizability
— Matching (case-control or cohort) — pair subjects on confounders (age, sex); requires matched analysis (conditional logistic regression, McNemar's test)
— Stratification / Mantel-Haenszel pooled estimate
— Multivariable regression (most common)
— Standardization (direct or indirect — used in epidemiology for rates across populations of differing age structure)
— Propensity score methods — model the probability of treatment given covariates, then match/weight/stratify on it; useful when outcomes are rare but covariates are many
— Inverse probability of treatment weighting (IPTW) — creates a pseudo-population balanced on measured confounders
— Instrumental variables — leverage a variable affecting exposure but not outcome directly (e.g., distance to hospital as IV for treatment receipt)
— Difference-in-differences / regression discontinuity — quasi-experimental designs
— Unmeasured confounders
— Imprecisely measured confounders (measurement error in the covariate)
— Misspecified functional form (e.g., adjusting for age as a binary instead of continuous)
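Board pearl: Randomization is the only method that reliably controls for unmeasured confounding. Adjustment techniques that condition on covariates (regression, propensity scores, IPTW, machine learning), no matter how sophisticated, can only address measured confounders; quasi-experimental approaches such as instrumental variables can address unmeasured confounding only under strong, largely untestable assumptions. This is why an RCT trumps a propensity-matched cohort on the evidence hierarchy.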
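Worked sketch: inverse probability of treatment weighting can be illustrated with one measured binary confounder (illness severity). All counts are hypothetical and the function names are invented for this example. The propensity score is estimated as the observed proportion treated within each severity stratum, which is a simplification of the regression-based propensity models used in practice. The weighting corrects only for severity; an unmeasured confounder would remain.

```python
# Hypothetical counts (illustrative only): (n_patients, deaths) by severity and arm.
# Sicker patients are far more likely to be treated (confounding by indication).
data = {
    "severe": {"treated": (800, 240), "untreated": (200, 60)},
    "mild": {"treated": (200, 10), "untreated": (800, 40)},
}


def crude_risk(data, arm):
    """Unweighted risk of death in one arm, pooling across severity strata."""
    n = sum(stratum[arm][0] for stratum in data.values())
    deaths = sum(stratum[arm][1] for stratum in data.values())
    return deaths / n


def iptw_risk(data, arm):
    """Risk of death in the weighted pseudo-population.

    Each patient is weighted by 1 / P(receiving their own arm | severity),
    estimated here as stratum_total / n_in_arm.
    """
    weighted_n = 0.0
    weighted_deaths = 0.0
    for stratum in data.values():
        stratum_total = sum(n for n, _ in stratum.values())
        n, deaths = stratum[arm]
        weight = stratum_total / n
        weighted_n += n * weight
        weighted_deaths += deaths * weight
    return weighted_deaths / weighted_n


crude_rr = crude_risk(data, "treated") / crude_risk(data, "untreated")
iptw_rr = iptw_risk(data, "treated") / iptw_risk(data, "untreated")

print(f"Crude RR: {crude_rr:.2f}")
print(f"IPTW RR:  {iptw_rr:.2f}")
```

With these numbers the crude RR of 2.5 falls to 1.0 after weighting, because within each severity stratum treated and untreated patients die at the same rate.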

— 1. Chance (random error) — addressed by p-values, CIs, sample size
— 2. Bias (systematic error) — selection bias, information/measurement bias
— 3. Confounding — third-variable distortion
— 4. Reverse causation — outcome actually preceded/caused exposure
— 5. External validity / generalizability
— Confounding = nuisance to remove; adjust for it, report a single adjusted estimate
— Effect modification (interaction) = real biological/clinical phenomenon; report stratum-specific estimates; do not "average it out"
— Example: Aspirin reduces MI more in men than women → sex is an effect modifier, not a confounder
— Draw a DAG (directed acyclic graph). Arrow from exposure to candidate variable to outcome = mediator. Arrow from candidate variable to both exposure and outcome = confounder.
— Confounding: groups differ in a third variable within the study population
— Selection bias: the study population itself was assembled in a way that distorts the association (e.g., Berkson's bias in hospital-based case-control)
— Ask whether the design was randomized
— Inspect Table 1 for balance
— Compare crude vs adjusted estimates
— Evaluate whether key biological confounders were included
— Consider residual/unmeasured confounding before changing practice
Step 3 management: If a vignette presents observational data alone — even with adjustment — and asks whether to recommend an intervention, the safest answer is usually to await RCT data or discuss uncertainty in shared decision-making, not to act on observational adjusted estimates.

— Randomization is the unrivaled first-line tool — balances all known and unknown confounders in expectation
— Blocked or stratified randomization improves balance in small trials by forcing equal allocation within strata (e.g., age, site)
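— Allocation concealment prevents selection bias at enrollment by hiding the upcoming assignment from those recruiting patients (it protects the randomization itself; blinding protects what happens afterward)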
— Multivariable regression is the workhorse — logistic, Cox, or linear depending on outcome
— Pre-specify confounders based on subject-matter knowledge and DAGs, not on p-value-driven stepwise selection (which biases SEs and inflates type I error)
— Include the "minimally sufficient adjustment set" — the smallest group of variables that blocks all backdoor paths
— Propensity score methods when treatment is non-random and many covariates exist
— Propensity score matching gives easily checked covariate balance but discards unmatched patients; IPTW retains the full sample
— Doubly robust estimators combine outcome regression with propensity weighting and remain consistent if either model is correctly specified
— Overadjustment — adjusting for mediators or colliders introduces bias rather than removing it
— Collider bias — conditioning on a variable affected by both exposure and outcome opens a spurious path
— Sparse-data bias — too many covariates relative to events (rule of thumb: ≥10 events per variable in logistic/Cox models)
— Model misspecification — wrong functional form, missing interactions
Board pearl: Adjusting for a collider is worse than not adjusting at all. Classic example: in a hospital-based study, adjusting for "hospitalization" when studying two diseases that both cause admission creates a spurious negative association (Berkson's paradox). Always draw the DAG before choosing covariates.
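Worked sketch: Berkson's paradox can be shown with expected counts alone. The population sizes and admission probabilities below are hypothetical assumptions. Diseases A and B are constructed to be independent in the community, and each raises the chance of admission. Restricting to hospitalized patients (conditioning on the collider) then creates an inverse association.

```python
# Hypothetical community of 100,000 (illustrative only): diseases A and B are
# independent, each with 10% prevalence.
population = {
    ("A", "B"): 1_000,
    ("A", "no B"): 9_000,
    ("no A", "B"): 9_000,
    ("no A", "no B"): 81_000,
}

# Assumed probability of hospital admission for each disease combination.
p_admit = {
    ("A", "B"): 0.36,
    ("A", "no B"): 0.20,
    ("no A", "B"): 0.20,
    ("no A", "no B"): 0.02,
}


def odds_ratio(counts):
    """Odds ratio for the association between disease A and disease B."""
    concordant = counts[("A", "B")] * counts[("no A", "no B")]
    discordant = counts[("A", "no B")] * counts[("no A", "B")]
    return concordant / discordant


# Expected number of hospitalized patients in each cell.
hospitalized = {cell: population[cell] * p_admit[cell] for cell in population}

or_population = odds_ratio(population)
or_hospital = odds_ratio(hospitalized)

print(f"OR for A and B in the community: {or_population:.2f}")
print(f"OR for A and B among inpatients: {or_hospital:.2f}")
```

The community OR is 1.0 (no association) while the inpatient OR is about 0.18, a spurious "protective" relationship produced entirely by selecting on admission.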

— Each row = one covariate
— Adjusted estimate (OR, HR, β) shown with 95% CI and p-value
— Interpretation: "Holding all other variables constant, a one-unit change in X is associated with [estimate] change in outcome"
— Coefficients in log-odds; exponentiate to get OR
— OR >1 = increased odds; <1 = decreased odds; CI crossing 1 = not significant
— Exponentiated coefficient = HR
— Assumes hazards are proportional over time (test with Schoenfeld residuals)
— HR is not a risk ratio — it's an instantaneous rate ratio
— β = expected change in continuous outcome per unit increase in predictor, adjusted for others
— R² describes variance explained; doesn't validate causal inference
— Add interaction term (exposure × modifier) to the model
— Significant interaction term (p<0.05 or pre-specified threshold) → report stratified estimates
— STROBE for observational studies, CONSORT for RCTs
— Always report both crude and adjusted estimates so readers can assess the magnitude of confounding
— Provide CIs, not just p-values
Key distinction: A statistically significant adjusted OR of 1.05 in a huge dataset may be clinically meaningless, while a non-significant adjusted OR of 2.0 in a small study may still be clinically important — always interpret effect size and CI width, not just p-values. Step 3 loves to give you a tiny but "significant" finding and ask whether to change management (the answer is usually no).
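Worked sketch: converting a logistic regression coefficient to an OR with a 95% CI makes the "significant but trivial" versus "non-significant but large" contrast concrete. The coefficients and standard errors below are hypothetical, the function name is invented, and the CI uses the usual normal approximation on the log-odds scale.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical model outputs (illustrative only).
small_study = odds_ratio_with_ci(beta=math.log(2.0), se=0.40)    # wide CI
huge_dataset = odds_ratio_with_ci(beta=math.log(1.05), se=0.01)  # narrow CI

for label, (or_, lower, upper) in [("Small study", small_study),
                                   ("Huge dataset", huge_dataset)]:
    significant = lower > 1 or upper < 1
    print(f"{label}: OR {or_:.2f} (95% CI {lower:.2f} to {upper:.2f}), "
          f"CI excludes 1: {significant}")
```

The small study's OR of 2.0 has a CI that crosses 1, while the huge dataset's OR of 1.05 has a CI that excludes 1; the second is statistically significant yet far less likely to matter clinically.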

— Patients prescribed a drug are systematically different from those not prescribed it
— E.g., antipsychotics in dementia: treated patients are sicker → higher mortality observed even if drug is neutral
— E.g., ICU vasopressors: treated patients are in shock → "vasopressors associated with death" is confounded by severity
— Patients who adhere to preventive therapies (statins, HRT, vitamins) are healthier overall — exercise, diet, screening, SES
— Observational benefits often vanish in RCTs (WHI for HRT; vitamin E for cardiovascular events)
— Underrecognized; not captured by age or comorbidity counts alone
— Drives both treatment decisions and outcomes
— Use frailty indices (Fried, Rockwood) when available
— Reduced eGFR or elevated bilirubin both predict drug exposure (dose-reduced or avoided) and adverse outcomes
— Studies of nephrotoxic drugs (contrast, NSAIDs, aminoglycosides) must adjust for baseline renal function and competing risks
— Death from other causes can preclude the outcome of interest
— Treating competing deaths as censored (standard Kaplan-Meier approach) overestimates cumulative incidence; use cumulative incidence functions or Fine-Gray subdistribution hazard models
Step 3 management: When a vignette describes an observational study of elderly patients showing a drug "increases mortality," strongly suspect confounding by indication. The correct answer often emphasizes that an RCT is needed before changing practice — particularly relevant when stems ask about deprescribing antipsychotics, anticoagulants, or statins in nursing home residents.

— RCTs in pregnancy are ethically constrained → most data are observational
— Indication bias: pregnant women on antiepileptics, antidepressants, or antihypertensives differ from untreated peers in disease severity
— Example: SSRIs and birth defects — early observational signals were heavily confounded by maternal depression severity; sibling-controlled designs largely attenuated risk
— Live-birth bias: restricting analysis to live births can introduce collider bias when exposure affects fetal loss
— Growth, development, and maturation are time-varying confounders
— Socioeconomic and parental factors confound nearly every behavioral or environmental exposure
— Vaccine safety studies must address healthy-vaccinee bias (parents who vaccinate also seek more care)
— Often a proxy for unmeasured social determinants (access, structural racism, income, environment)
— Adjusting "for race" without addressing the underlying mechanisms can mask, not explain, disparities
— Increasingly, journals require justification for inclusion of race as a covariate
— RCT generalizability suffers when women, minorities, elderly, or pregnant patients are excluded
— Adjusted estimates from such trials may not apply to excluded groups — a confounder-adjacent generalizability issue
Board pearl: Sibling-controlled designs are a powerful tool in pregnancy and pediatric pharmacoepidemiology — they implicitly adjust for shared family-level confounders (genetics, SES, parenting). If a stem mentions a sibling-controlled study showing attenuated risk, the original association was likely confounded by family-level factors.

— Adjusting for a mediator removes the very effect you want to measure
— Example: studying smoking → lung cancer, adjusting for "chronic cough" or "lung function"; both are downstream of smoking, so adjusting for them strips out part of smoking's effect
— Result: attenuated or null estimate, false reassurance
— A collider is a variable caused by both exposure and outcome
— Conditioning on it (by stratifying, adjusting, or restricting) opens a non-causal path
— Example: in hospitalized patients, smoking appears protective against COVID severity — because both smoking and severe COVID independently increase hospitalization probability
— Table 2 fallacy: interpreting every coefficient in a multivariable model as if each represents a causal effect
— Only the adjusted coefficient for the exposure of interest is interpretable causally; the covariates were chosen to deconfound the exposure, so their own coefficients may still be confounded or may capture only part of their effect
— Always present in observational studies; quantify with E-value
— Increases with measurement error in covariates
— Too many covariates relative to events inflates effect estimates
— Rule of thumb: ≥10 outcome events per covariate
— When confounders change over time and are also affected by prior treatment
— Requires marginal structural models or g-methods; standard regression is biased
Key distinction: Underadjustment → confounding remains; overadjustment → introduces new bias. The correct adjustment set is neither maximal nor minimal — it is the set that blocks all confounding paths without opening collider or mediator paths. A causal DAG is the only principled way to choose it.
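Worked sketch: the confounder/mediator/collider rules can be written as a lookup over the arrows of a DAG. This is a deliberate simplification that inspects direct arrows only; it does not trace longer backdoor paths the way dedicated causal-inference software does. The function name, edge lists, and variable names are illustrative assumptions.

```python
def classify_role(edges, exposure, outcome, variable):
    """Classify a variable by its direct arrows to and from exposure and outcome.

    edges is a set of (cause, effect) pairs. Only direct arrows are checked,
    so this is a teaching aid, not a full backdoor-path analysis.
    """
    causes_exposure = (variable, exposure) in edges
    causes_outcome = (variable, outcome) in edges
    caused_by_exposure = (exposure, variable) in edges
    caused_by_outcome = (outcome, variable) in edges

    if causes_exposure and causes_outcome:
        return "confounder: adjust"
    if caused_by_exposure and causes_outcome:
        return "mediator: do not adjust"
    if caused_by_exposure and caused_by_outcome:
        return "collider: do not adjust"
    return "unclassified"


# Three small DAGs drawn from examples in these notes.
coffee_dag = {("smoking", "coffee"), ("smoking", "MI"), ("coffee", "MI")}
statin_dag = {("statin", "LDL"), ("LDL", "MI")}
covid_dag = {("smoking", "hospitalization"), ("severe COVID", "hospitalization")}

smoking_role = classify_role(coffee_dag, "coffee", "MI", "smoking")
ldl_role = classify_role(statin_dag, "statin", "MI", "LDL")
hospital_role = classify_role(covid_dag, "smoking", "severe COVID", "hospitalization")

print("Smoking:", smoking_role)
print("LDL:", ldl_role)
print("Hospitalization:", hospital_role)
```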

— Substantial residual confounding suspected (low E-value, unmeasured confounders)
— Confounding by indication cannot be ruled out
— Treatment decisions involve significant cost, risk, or population-level impact
— Conflicting observational results across studies
— Mechanism is biologically plausible but effect size is small to moderate (high signal-to-noise demands)
— Effect size is enormous (e.g., smoking → lung cancer RR ~20); no plausible confounder could explain it
— RCT is unethical (randomizing harmful exposures) or infeasible (rare outcomes, long latency)
— Bradford Hill criteria are strongly met: strength, consistency, specificity, temporality, biological gradient, plausibility, coherence, experiment, analogy
CCS pearl: On a CCS-style question asking how to act on a guideline derived from observational data: hold a shared decision-making conversation with the patient, document the uncertainty, and avoid committing to therapy as if RCT evidence existed. Premature adoption of observational findings (vitamin E, HRT, beta-carotene) has historically caused population-level harm.

— Confounding: a third variable distorts the exposure-outcome association within the study sample
— Selection bias: the way subjects were recruited or retained distorts the association
— Example: studying mortality only among hospital survivors (survivor bias) is selection, not confounding
— Information bias: systematic error in measuring exposure, outcome, or covariates
— Recall bias — cases remember exposures differently from controls (case-control studies)
— Interviewer bias — knowledge of group assignment influences data collection
— Misclassification — non-differential (toward null) vs differential (any direction)
— Confounding: groups differ in a third variable; adjust and report a single estimate
— Effect modification: the true effect differs across strata; report stratified estimates
— A variable can be both a confounder and an effect modifier
— Reverse causation: the outcome causes the exposure (e.g., low cholesterol → cancer may reflect undiagnosed cancer lowering cholesterol)
— Confounding involves a third variable; reverse causation involves the temporal direction between exposure and outcome
— Chance: random variation; addressed by p-values and CIs
— Confounding: systematic; not reduced by larger samples (in fact, large samples make confounded estimates more precise but no less biased)
Board pearl: Increasing sample size shrinks the CI around the wrong answer in a confounded study — it does not fix bias. This counterintuitive point is heavily tested: "The study has 50,000 patients and a tight CI — should we trust the conclusion?" If the design is observational and confounding is unaddressed, the answer is no.
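Worked sketch: the "tight CI around the wrong answer" point can be checked numerically. The counts are hypothetical and are assumed to come from a confounded crude analysis in which the true effect is null. The function name is invented, and the CI uses the standard large-sample formula for the standard error of log(RR).

```python
import math


def risk_ratio_ci(a, n1, c, n0, z=1.96):
    """Risk ratio with a 95% CI from cohort counts.

    a / n1 = events / total in the exposed; c / n0 = events / total in the unexposed.
    """
    rr = (a / n1) / (c / n0)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Same hypothetical confounded risks at two sample sizes.
small = risk_ratio_ci(a=84, n1=1_000, c=36, n0=1_000)        # 2,000 patients
large = risk_ratio_ci(a=2_100, n1=25_000, c=900, n0=25_000)  # 50,000 patients

for label, (rr, lower, upper) in [("n = 2,000", small), ("n = 50,000", large)]:
    print(f"{label}: RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Both analyses return the same biased RR of about 2.3; the larger study simply reports it with a narrower interval.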

— Simpson's paradox: direction of association reverses when data are aggregated vs stratified
— Classic example: UC Berkeley admissions appeared to favor men overall but favored women within most departments, driven by women applying to more competitive departments
— Department was a confounder; aggregating data masked it
— Lead-time bias: apparent survival improvement from earlier diagnosis without true mortality benefit
— Common in screening studies
— Length-time bias: screening preferentially detects slow-growing (more indolent) disease, inflating apparent survival
— Immortal time bias: a period during follow-up when the outcome cannot occur is misclassified as "exposed" time
— Common in pharmacoepidemiology when "ever-users" of a drug are compared to "never-users"; the time before the first prescription is immortal time
— Healthy worker effect: employed populations are healthier than the general population
— Occupational studies often need internal comparison groups
— Publication bias: positive studies are more likely to be published, inflating meta-analytic estimates
— Detected by funnel plot asymmetry or Egger's test
— Ecological fallacy: inferring individual-level associations from group-level data
— Hawthorne effect: subjects change behavior because they know they are being observed
Key distinction: Many of these are not strictly confounding but are frequently offered as distractors alongside confounding in Step 3 stems. Anchor on the mechanism described in the vignette: a third variable → confounding; how subjects were selected → selection bias; how data were collected → information bias.
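Worked sketch: the reversal between aggregated and stratified data can be reproduced with a two-hospital comparison. All counts are hypothetical and the function name is invented. Hospital A treats mostly severe cases, so its overall mortality looks worse even though it does better within each severity stratum.

```python
# Hypothetical counts (illustrative only): (patients, deaths) by hospital and severity.
hospitals = {
    "A": {"severe": (800, 160), "mild": (200, 4)},
    "B": {"severe": (200, 50), "mild": (800, 24)},
}


def overall_mortality(strata):
    """Mortality after collapsing severity strata into one group."""
    patients = sum(n for n, _ in strata.values())
    deaths = sum(d for _, d in strata.values())
    return deaths / patients


overall = {name: overall_mortality(strata) for name, strata in hospitals.items()}
by_severity = {
    name: {severity: deaths / patients
           for severity, (patients, deaths) in strata.items()}
    for name, strata in hospitals.items()
}

for name in hospitals:
    print(f"Hospital {name}: overall {overall[name]:.1%}, "
          f"severe {by_severity[name]['severe']:.1%}, "
          f"mild {by_severity[name]['mild']:.1%}")
```

Hospital A has higher overall mortality (16.4% vs 7.4%) but lower mortality among both severe (20% vs 25%) and mild (2% vs 3%) patients, so case mix, not quality of care, drives the crude difference.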

— STROBE — observational studies (cohort, case-control, cross-sectional)
— CONSORT — randomized trials
— PRISMA — systematic reviews and meta-analyses
— TRIPOD — prediction model studies
— STARD — diagnostic accuracy studies
— Pre-specified list of confounders with rationale (DAG-based ideal)
— Both crude and adjusted estimates with 95% CIs
— Sensitivity analyses (alternative model specifications, propensity scores, E-values)
— Explicit acknowledgment of unmeasured confounding
— Stratified estimates when effect modification is suspected
— Default skepticism toward observational claims of treatment benefit
— Default trust (with appropriate context) in well-conducted RCTs
— Awareness that guidelines synthesize evidence quality (GRADE framework: high → moderate → low → very low certainty)
— Use of point-of-care tools (UpToDate, USPSTF, society guidelines) that incorporate evidence grading
— Counsel patients in proportion to evidence strength
— "Studies suggest" vs "trials have shown" — language matters
— Periodic critical appraisal training
— Journal clubs structured around study validity, not just results
Step 3 management: When a guideline cites observational evidence (GRADE low/very low), the clinician's long-term plan should emphasize shared decision-making and individualized risk-benefit assessment, not categorical adoption. Step 3 stems reward physicians who recognize evidence quality gradients rather than treating all "published findings" as equivalent.

— Subscribe to systematic review updates (Cochrane, AHRQ)
— Watch for trial sequential analyses that update meta-analytic conclusions
— Reassess practice when major RCTs overturn observational findings (HRT, vitamin E, beta-carotene, intensive glucose control in critically ill patients)
— Was the study randomized? If not, what adjustment methods were used?
— Were both crude and adjusted estimates reported? How much did they differ?
— Was an E-value or sensitivity analysis reported?
— Were known biological confounders included?
— Was effect modification explored?
— Always ask: "What would the unmeasured confounders need to look like to nullify this finding?"
— Treat baseline tables as physical exams of a study
— Use DAGs before choosing covariates in any analysis you conduct
— 10% change-in-estimate rule for confounding
— SMD >0.1 = meaningful imbalance
— ≥10 events per variable in regression
— E-value interpretation: an E-value of 2 means an unmeasured confounder associated with both the exposure and the outcome by an RR of at least 2 each could explain away the result; weaker confounding could not
— Read methods before results
— Read Table 1 before Table 2
— Trust effect sizes and CIs, not p-values alone
Board pearl: When a journal article reports only adjusted estimates and not crude estimates, this omission itself is a red flag — without the comparison, readers cannot judge the magnitude of confounding, and reviewers should request both. STROBE explicitly requires reporting both.
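Worked sketch: the E-value for a risk ratio has a closed form, E = RR + sqrt(RR × (RR − 1)), with protective estimates (RR < 1) inverted first. The function name is invented for this example, and the input values are illustrative. Applying the formula to ORs or HRs requires additional approximations that are not shown here.

```python
import math


def e_value(rr):
    """E-value for an observed risk ratio (point estimate)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # invert protective estimates so the formula applies
    return rr + math.sqrt(rr * (rr - 1))


for observed in (1.8, 2.0, 0.5, 1.0):
    print(f"Observed RR {observed}: E-value {e_value(observed):.2f}")
```

An observed RR of 1.8 gives an E-value of 3.0: an unmeasured confounder would need an RR of about 3 with both exposure and outcome to fully explain the association. An RR of 1.0 gives an E-value of 1.0, meaning no confounding is needed to explain a null result.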

— Premature adoption of findings not confirmed by RCTs has caused population harm: HRT for primary CHD prevention (WHI reversed it, with increased breast cancer, stroke, VTE), beta-carotene for lung cancer prevention (CARET trial showed increased cancer in smokers), antiarrhythmic drugs post-MI (adopted largely on surrogate-endpoint reasoning; CAST trial showed increased mortality)
— Physicians have an ethical obligation to communicate evidence quality during informed consent
— Recommending an intervention based solely on observational data without disclosing the evidence grade may violate informed consent principles
— Patients should know whether a recommendation rests on RCT data or weaker evidence
— Equipoise is required to ethically randomize patients in an RCT
— When equipoise is lost (one arm clearly superior), trials must be stopped (DSMB role)
— Observational research still requires IRB approval, informed consent (or waiver with justification), and HIPAA-compliant data handling
— Adjusting "for race" without addressing structural determinants can entrench disparities
— Race-based clinical algorithms (eGFR, PFTs, ASCVD risk) are under active revision because they conflated social and biological variables
— When evidence is observational and uncertain, document the reasoning and shared decision-making in the discharge summary or handoff so the next clinician understands why a therapy was started or withheld — this prevents downstream errors and supports continuity
— Researchers must report conflicts of interest and pre-register trials (ClinicalTrials.gov) — failure to do so undermines both internal and external validity
— QI initiatives often use observational pre/post designs vulnerable to confounding by secular trends — interrupted time series or stepped-wedge designs strengthen inference
Step 3 management: When initiating or deprescribing therapy based on weak evidence, document the evidence grade, the shared decision-making conversation, and the monitoring plan — this protects both the patient and the clinician medico-legally.

Board pearl: The single most tested concept is the distinction between confounder (adjust), mediator (don't adjust), collider (don't adjust), and effect modifier (stratify). Master this fourfold distinction and most biostat vignettes on confounding become straightforward.

— Stem: "A cohort study found drinking coffee was associated with MI (RR 1.8). After adjusting for smoking, RR = 1.0. What best explains this?"
— Answer: Confounding by smoking
— Distractors: effect modification, selection bias, recall bias, chance
— Stem: "Observational data suggest postmenopausal HRT prevents CHD, but an RCT found increased CHD events. What explains the discrepancy?"
— Answer: Healthy user / confounding by SES and lifestyle
— Stem: "Among ICU patients, those receiving vasopressors had higher mortality."
— Answer: Confounding by indication / severity of illness
— Stem: "In a study of statins and MI, adjustment for LDL cholesterol attenuated the apparent benefit. Why?"
— Answer: LDL is a mediator; adjusting for it removes the true effect
— Stem: "Aspirin reduced MI by 30% in men but 5% in women (p-interaction = 0.01)."
— Answer: Effect modification by sex; report stratified estimates
— Stem: "Hospital A had higher overall mortality than Hospital B, but lower mortality in every severity stratum."
— Answer: Confounding by case mix / Simpson's paradox
— Stem: "Treated patients were older, had more diabetes, and worse renal function. Crude analysis showed treatment associated with mortality."
— Answer: Baseline confounding; need multivariable adjustment
— Stem: "Why is randomization the most effective method to control confounding?"
— Answer: Balances measured and unmeasured confounders in expectation
Step 3 management: When two answer choices both seem plausible (e.g., "selection bias" and "confounding"), re-read the stem for the mechanism — is the issue a third variable (confounding) or how subjects were enrolled (selection)? The mechanism determines the answer, not the magnitude of the bias.

Confounding distorts a crude association via a third variable that is linked to both exposure and outcome but not on the causal pathway, and is best prevented by randomization or controlled through stratification, multivariable regression, or propensity methods — recognizing that adjustment can only address measured confounders.
Board pearl: Bigger sample sizes shrink confidence intervals but never fix confounding — precision is not accuracy. When in doubt on a Step 3 stem about observational evidence, the safest answer favors awaiting RCT data, pursuing shared decision-making, or explicitly acknowledging residual confounding rather than acting as if adjustment fully removed bias.

