Biostatistics & Population Health
Cohort study design and interpretation
— Prospective cohort: exposure ascertained now, outcomes accrued in the future (e.g., Framingham, Nurses' Health Study).
— Retrospective (historical) cohort: investigator uses existing records to define exposure in the past, then follows forward to a present-day outcome (e.g., EHR-based study of metformin users vs nonusers from 2010 → 2023 stroke incidence).
— Establishes temporality (exposure precedes outcome) → supports causal inference more strongly than case-control designs.
— Can study multiple outcomes from a single exposure.
— Good for common outcomes and rare exposures (e.g., occupational asbestos cohort).
— Inefficient and expensive for rare outcomes (would need huge cohort or long follow-up).
— Vulnerable to loss to follow-up, confounding, and the healthy-worker effect.
— Cannot prove causation alone — still observational.
Board pearl: If the question gives you incidence in two groups and asks for the most appropriate measure of association, the answer is relative risk (RR) — the signature statistic of cohort studies. Odds ratios belong to case-control. Recognizing the design first prevents the most common arithmetic trap on biostatistics items.

— "Researchers enrolled 5,000 smokers and 5,000 nonsmokers and followed them for 10 years…"
— "Incidence of myocardial infarction was measured per 1,000 person-years."
— "Using insurance claims from 2015, investigators identified patients prescribed drug X and compared them to matched controls without the prescription, tracking hospitalizations through 2022."
— Defined exposure groups at baseline (exposed vs unexposed) — this is the cohort's anchor.
— Disease-free at entry — outcome has not yet occurred; if subjects already have the outcome, it is not a true incident cohort.
— Follow-up duration with a clear time horizon.
— Counts of new (incident) cases in each group.
— Case-control: starts with disease status (cases vs controls), looks backward for exposure → gives odds ratio.
— Cross-sectional: measures exposure and outcome simultaneously → gives prevalence, not incidence.
— RCT: investigator assigns exposure (intervention) rather than observing it; randomization eliminates confounding by indication.
— If the protocol was written before outcomes occurred → prospective.
— If a database is mined where outcomes already exist in the record but exposure was defined before outcome in calendar time → retrospective cohort (still forward in logic).
Key distinction: "Retrospective cohort" ≠ "case-control." Retrospective cohort still groups by exposure and measures incidence; case-control groups by outcome and measures odds. Misclassifying the design is the single highest-yield error on Step 3 biostats vignettes and changes which statistic is even computable.

— Rows = exposure status (exposed / unexposed) — fixed by design.
— Columns = outcome (disease / no disease) — measured during follow-up.
— Cells: a = exposed-diseased, b = exposed-healthy, c = unexposed-diseased, d = unexposed-healthy.
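— Risk (cumulative incidence) in exposed: a / (a+b)
— Risk (cumulative incidence) in unexposed: c / (c+d)
— Relative risk: RR = [a / (a+b)] / [c / (c+d)]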
— Open (dynamic) cohort: members enter and leave (e.g., a county registry). Use person-time.
— Closed (fixed) cohort: enrollment closes at baseline; everyone followed from t=0 (e.g., a birth cohort).
— Investigators may match unexposed to exposed on age, sex, comorbidity to reduce confounding, not to enable an odds ratio.
— Matched cohorts still yield RR/HR, not OR.
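A minimal Python sketch of the 2×2 arithmetic, using the cell names a–d as defined for the table and hypothetical counts (not from any real study). It computes the risk in each exposure row and the RR. For open cohorts, where members contribute unequal follow-up, it computes an incidence rate ratio from person-time instead. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class CohortTable:
    """Cohort 2x2 table: rows = exposure, columns = outcome."""
    a: int  # exposed, developed disease
    b: int  # exposed, stayed disease-free
    c: int  # unexposed, developed disease
    d: int  # unexposed, stayed disease-free

    def risk_exposed(self) -> float:
        return self.a / (self.a + self.b)

    def risk_unexposed(self) -> float:
        return self.c / (self.c + self.d)

    def relative_risk(self) -> float:
        # Risk ratio; (a*d)/(b*c) would be the odds ratio instead.
        return self.risk_exposed() / self.risk_unexposed()


def incidence_rate_ratio(cases_exp: int, person_years_exp: float,
                         cases_unexp: int, person_years_unexp: float) -> float:
    """Rate ratio for open cohorts, where members contribute unequal follow-up."""
    rate_exp = cases_exp / person_years_exp
    rate_unexp = cases_unexp / person_years_unexp
    return rate_exp / rate_unexp


# Hypothetical closed cohort: 5,000 exposed and 5,000 unexposed, 10-year follow-up.
table = CohortTable(a=150, b=4850, c=50, d=4950)
print(round(table.risk_exposed(), 3), round(table.risk_unexposed(), 3))  # 0.03 0.01
print(round(table.relative_risk(), 2))  # 3.0

# Hypothetical open cohort: 30 events in 12,000 person-years vs 20 in 16,000.
irr = incidence_rate_ratio(30, 12_000, 20, 16_000)
print(round(irr, 2))  # 2.0
```

If everyone were followed for the same time, the rate ratio and the risk ratio would tell a similar story. Person-time matters only when follow-up differs between groups.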
Step 3 management: When interpreting a cohort result clinically (e.g., counseling a patient on statin benefit), translate RR into absolute risk reduction and number needed to treat before discussing with the patient — RR alone overstates impact when baseline risk is low. This translation is a recurring Step 3 communication-skills item.

— RR = 1 → no association.
— RR > 1 → exposure increases risk (e.g., smoking and lung cancer RR ≈ 20).
— RR < 1 → exposure is protective (e.g., statin and MI).
— RR is the primary effect measure of cohort studies.
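— Attributable risk (risk difference; absolute risk reduction when the exposure is protective) = risk in exposed − risk in unexposed; same units as risk (per person over a stated period).
— The risk difference drives clinical decisions because it accounts for baseline risk (NNT = 1 / ARR).
— Attributable risk percent = (RR − 1) / RR: among exposed people who developed disease, the proportion attributable to the exposure.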
— If 95% CI for RR/HR crosses 1.0, the result is not statistically significant at α = 0.05.
— Narrow CI → precise estimate (usually larger sample).
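The measures in this section can be computed directly from the 2×2 cells. The sketch below returns the RR, the risk difference, the attributable risk percent (RR − 1)/RR, and a 95% CI for the RR. It assumes the standard large-sample (log-scale) method, with SE(ln RR) = √(1/a − 1/(a+b) + 1/c − 1/(c+d)). The counts are hypothetical, and the function name is illustrative only.

```python
import math


def risk_measures(a: int, b: int, c: int, d: int, z: float = 1.96) -> dict:
    """Cohort effect measures from 2x2 cells (a, b exposed; c, d unexposed)."""
    risk_exp = a / (a + b)
    risk_unexp = c / (c + d)
    rr = risk_exp / risk_unexp
    # Large-sample standard error of ln(RR); the CI is built on the log scale.
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return {
        "RR": rr,
        "RR_95CI": (lower, upper),
        # Significant at alpha = 0.05 only if the CI excludes 1.0.
        "significant": not (lower <= 1.0 <= upper),
        "risk_difference": risk_exp - risk_unexp,  # attributable risk
        "AR_percent": (rr - 1) / rr * 100,         # share of exposed cases due to exposure
    }


m = risk_measures(150, 4850, 50, 4950)
print(round(m["RR"], 2), [round(x, 2) for x in m["RR_95CI"]])  # 3.0 [2.18, 4.12]
print(round(m["risk_difference"], 3), round(m["AR_percent"], 1))  # 0.02 66.7

small = risk_measures(6, 94, 4, 96)  # RR 1.5 in a small sample
print(small["significant"])  # False: the wide CI crosses 1.0
```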
Board pearl: A Step 3 stem may report "HR 0.78 (95% CI 0.65–0.94) for major adverse cardiovascular events with SGLT2 inhibitor use." Recognize this as a cohort/RCT-derived adjusted estimate, statistically significant (CI excludes 1), and translate to roughly a 22% relative risk reduction — then ask for absolute numbers before counseling.

— Design-stage controls: restriction, matching, randomization (not available in observational cohorts).
— Analysis-stage controls: stratification, multivariable regression, propensity score matching/weighting, instrumental variables.
— Kaplan–Meier curve: plots cumulative survival/event-free probability over time; step-downs at each event. Censoring marks denote loss to follow-up or study end.
— Log-rank test: compares two or more KM curves (p-value for whether curves differ).
— Cox proportional hazards model: yields adjusted HRs and assumes the hazard ratio is constant over time (proportional hazards). Crossing Kaplan–Meier curves suggest the assumption is violated.
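To show what the Kaplan–Meier curve and log-rank test actually compute, here is a pure-Python sketch on tiny hypothetical data. It implements the product-limit estimator, which steps down only at event times and keeps censored subjects in the risk set until they leave. It also implements the two-group log-rank statistic, which sums observed-minus-expected events in group 1 with the hypergeometric variance. The function names are illustrative. In practice, a tested library (for example, lifelines) would be preferable.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(t)) at each distinct event time.

    times: follow-up per subject; events: 1 = event, 0 = censored.
    A subject censored at time t is still counted at risk for events at t.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1 - deaths[t] / at_risk  # step down only at event times
        curve.append((t, surv))
    return curve


def log_rank(times1, events1, times2, events2):
    """Two-group log-rank test: (chi-square with 1 df, p-value)."""
    subjects = ([(t, e, 1) for t, e in zip(times1, events1)]
                + [(t, e, 2) for t, e in zip(times2, events2)])
    event_times = sorted({t for t, e, _ in subjects if e == 1})
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in subjects if u >= t)
        n1 = sum(1 for u, _, g in subjects if u >= t and g == 1)
        d = sum(1 for u, e, _ in subjects if u == t and e == 1)
        d1 = sum(1 for u, e, g in subjects if u == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:  # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
print(curve)
chi2, p = log_rank([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(round(chi2, 2), round(p, 4))
```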
Key distinction: Adjustment can address measured confounders only. Residual confounding from unmeasured variables (e.g., frailty, socioeconomic status) is the Achilles heel of observational cohorts — which is exactly why RCTs remain the gold standard when feasible.

— Healthy-worker effect: employed cohorts appear healthier than the general population because the unwell exit the workforce. Mitigate by comparing within-industry exposure gradients.
— Loss to follow-up bias: problematic if dropout is differential between groups (>20% loss is a red flag; >10% warrants sensitivity analysis).
— Misclassification of exposure: if nondifferential (random), biases RR toward the null (1.0); if differential, can bias either direction.
— Recall bias: less of an issue in prospective cohorts (exposure recorded before outcome) — a major advantage over case-control.
— Surveillance/detection bias: exposed group monitored more closely → more outcomes found (e.g., HRT users get more pelvic exams, more incidental cancers detected).
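The claim that nondifferential exposure misclassification pulls the RR toward 1.0 can be checked with expected counts. The sketch below assumes a binary exposure. It applies the same sensitivity and specificity to cases and non-cases, which is what makes the misclassification nondifferential. It further assumes sensitivity + specificity > 1 and uses no sampling noise. All inputs are hypothetical, and the function name is illustrative.

```python
def observed_rr(risk_exposed, risk_unexposed, n_exposed, n_unexposed,
                sensitivity, specificity):
    """RR after nondifferential exposure misclassification (expected counts only)."""
    # True cases and non-cases in each true exposure group.
    cases_e = n_exposed * risk_exposed
    cases_u = n_unexposed * risk_unexposed
    non_e = n_exposed - cases_e
    non_u = n_unexposed - cases_u
    # Classified as exposed = truly exposed detected + truly unexposed mislabeled.
    cls_exp_cases = sensitivity * cases_e + (1 - specificity) * cases_u
    cls_exp_non = sensitivity * non_e + (1 - specificity) * non_u
    cls_unexp_cases = (1 - sensitivity) * cases_e + specificity * cases_u
    cls_unexp_non = (1 - sensitivity) * non_e + specificity * non_u
    risk_e = cls_exp_cases / (cls_exp_cases + cls_exp_non)
    risk_u = cls_unexp_cases / (cls_unexp_cases + cls_unexp_non)
    return risk_e / risk_u


print(observed_rr(0.03, 0.01, 5000, 5000, 1.0, 1.0))  # perfect measurement: true RR 3.0
harmful = observed_rr(0.03, 0.01, 5000, 5000, 0.8, 0.9)
protective = observed_rr(0.01, 0.02, 5000, 5000, 0.8, 0.9)
print(round(harmful, 2), round(protective, 2))  # both pulled toward 1.0
```

A true RR of 3.0 shrinks to about 2.0, and a true RR of 0.5 rises toward 1.0. Both move toward the null. Differential misclassification, such as recall that depends on case status, has no such guarantee.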
Step 3 management: When a stem reports a surprising protective association (e.g., "vitamin E users had lower mortality"), reflexively consider healthy-user bias and confounding by indication before accepting causality. The board answer is usually "residual confounding" rather than a true biological effect — mirroring the real-world failure of observational findings to replicate in RCTs (HRT, β-carotene, vitamin E).

— Drivers: effect size, baseline incidence, sample size, α level, and follow-up duration.
— In cohorts, rare outcomes demand huge samples or long follow-up — a structural weakness.
— Systematic review/meta-analysis of RCTs > single RCT > prospective cohort > retrospective cohort > case-control > cross-sectional > case series > expert opinion.
— Well-conducted cohort > poorly conducted RCT in practice, but board answers follow the canonical hierarchy.
Board pearl: If a cohort finds no association but enrolled only 200 patients with a 1% outcome rate, the answer is inadequate power (type II error), not "no true effect." Always check sample size against outcome frequency before accepting a null result.
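To make the power check concrete, the sketch below uses the usual normal-approximation formula for comparing two proportions (two-sided z-test). It assumes a hypothetical true RR of 0.5, 100 patients per group (the 200-patient cohort in the pearl), and no loss to follow-up. The function names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal


def n_per_group(p_unexposed, rr, alpha=0.05, power=0.80):
    """Subjects per group to detect a given RR (two-sided two-proportion z-test)."""
    p0, p1 = p_unexposed, p_unexposed * rr
    p_bar = (p0 + p1) / 2
    z_a, z_b = Z.inv_cdf(1 - alpha / 2), Z.inv_cdf(power)
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return math.ceil(numerator / (p0 - p1) ** 2)


def power_for_n(p_unexposed, rr, n, alpha=0.05):
    """Approximate power with n subjects per group (same test as above)."""
    p0, p1 = p_unexposed, p_unexposed * rr
    p_bar = (p0 + p1) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n)
    se_alt = math.sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / n)
    z_a = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf((abs(p1 - p0) - z_a * se_null) / se_alt)


# Hypothetical: 1% baseline risk, true RR 0.5, 100 patients per group.
print(round(power_for_n(0.01, 0.5, 100), 2))  # about 0.06
print(n_per_group(0.01, 0.5))  # roughly 4,700 per group for 80% power
```

With about 6% power, a null result from such a study says almost nothing about whether a true effect exists.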

— Choose cohort when exposure is rare and outcome is common (e.g., specific occupational exposure → cardiovascular disease).
— Choose case-control when outcome is rare and exposure is common (e.g., pancreatic cancer → multiple lifestyle exposures).
— Cohort yields RR and incidence; case-control yields OR only (OR approximates RR when outcome <10%).
— Cross-sectional snapshots cannot establish temporality or incidence — only prevalence and association. Useful for hypothesis generation and disease burden estimates.
— RCTs eliminate confounding by indication via randomization and are required for definitive efficacy claims (FDA approval).
— Cohorts are essential when randomization is unethical (e.g., assigning smoking, asbestos exposure) or infeasible (rare exposures, long latency, real-world effectiveness).
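The rule that the OR approximates the RR only when the outcome is uncommon can be seen by computing both measures on the same table. The sketch below uses hypothetical counts chosen so that the true RR is 2.0 in both scenarios.

```python
def rr_and_or(a: int, b: int, c: int, d: int):
    """Risk ratio and odds ratio from the same cohort 2x2 table."""
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return rr, odds_ratio


# Hypothetical rare outcome (2% vs 1%): OR is close to RR.
print([round(x, 2) for x in rr_and_or(20, 980, 10, 990)])    # [2.0, 2.02]
# Hypothetical common outcome (40% vs 20%): OR overstates RR.
print([round(x, 2) for x in rr_and_or(400, 600, 200, 800)])  # [2.0, 2.67]
```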
Key distinction: When a Step 3 question presents a drug safety signal post-marketing, the appropriate design is usually a retrospective cohort using claims/EHR data — because the outcome (adverse event) is rare in trials but exposure (prescription) is well-documented. This mirrors how FDA's Sentinel system actually functions.

— Death from any cause may preempt the outcome of interest (e.g., dementia incidence in 85-year-olds is censored by mortality).
— Use the cumulative incidence function or Fine–Gray subdistribution hazards instead of 1 − Kaplan–Meier. The latter treats competing deaths as censoring and overestimates event probability.
— Many landmark cohorts (Framingham, ARIC) under-enrolled adults >80 — extrapolation requires caution.
— Guideline thresholds (e.g., LDL targets, BP targets) derived from middle-aged cohorts may not apply to nursing-home residents.
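The overestimation by 1 − KM under competing risks can be shown on a toy dataset. The sketch below computes the nonparametric (Aalen–Johansen) cumulative incidence for the event of interest. It then compares that with a naive KM that treats competing deaths as censoring. The data are hypothetical, and Fine–Gray regression is not shown.

```python
def cif_vs_naive_km(times, causes, cause_of_interest=1):
    """Cumulative incidence of one cause: Aalen-Johansen CIF vs naive 1 - KM.

    causes: 0 = censored, 1 = event of interest (e.g., dementia),
            2 = competing event (e.g., death without dementia).
    Returns (cif, naive) at the end of follow-up.
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    event_free = 1.0   # P(no event of any kind) just before t
    naive_surv = 1.0   # KM that treats competing deaths as censoring
    cif = 0.0
    for t in event_times:
        n = sum(1 for u in times if u >= t)  # risk set at t
        d_k = sum(1 for u, c in zip(times, causes) if u == t and c == cause_of_interest)
        d_all = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        cif += event_free * d_k / n  # only people still event-free can develop cause k
        event_free *= 1 - d_all / n
        naive_surv *= 1 - d_k / n
    return cif, 1 - naive_surv


# Hypothetical elderly cohort: three deaths compete with two dementia diagnoses.
cif, naive = cif_vs_naive_km([1, 2, 3, 4, 5, 6], [2, 1, 2, 1, 2, 0])
print(round(cif, 3), round(naive, 3))  # 0.333 0.467: naive estimate is too high
```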
Step 3 management: When counseling an 82-year-old about a preventive therapy whose evidence comes from a cohort of 50–70-year-olds, communicate uncertainty of benefit and weigh competing mortality risk. Cite life-expectancy considerations and shared decision-making — this is a high-yield Step 3 patient-communication theme tied directly to cohort generalizability.

— Examples: Medicaid Analytic eXtract pregnancy cohorts. The Slone Birth Defects Study, by contrast, is a case-control design, not a cohort.
— Key bias: confounding by indication — pregnant patients prescribed antidepressants differ systematically from unexposed pregnancies; use sibling-comparison designs to control for shared family/genetic factors.
— Live-birth bias: pregnancies ending in miscarriage or termination, where malformations may be concentrated, are excluded from live-birth-only cohorts. Such cohorts can therefore underestimate teratogenicity.
— Birth cohorts (e.g., ALSPAC, Generation R) link prenatal exposures to childhood outcomes — ideal for developmental epidemiology.
— Issues: long follow-up, loss to follow-up as children age, parental reporting bias.
— Pediatric assent + parental permission required; reconsent at age of majority for biobanked data.
— Pregnant patients require additional IRB protections (45 CFR 46 Subpart B).
Board pearl: When a Step 3 question asks about the best design to assess teratogenic risk of a newly marketed medication, the answer is a prospective pregnancy exposure cohort or pregnancy registry — RCTs are unethical, and case-control of birth defects is hampered by recall bias. Pregnancy registries (e.g., for antiepileptics, biologics) are explicit FDA post-marketing requirements.

Key distinction: A statistically significant finding (p < 0.05, CI excluding the null) is not the same as a clinically significant finding. A cohort of 200,000 patients that accrues enough events can detect an HR of 1.02 with a tight CI, yet that effect is trivial for an individual patient. Step 3 expects you to translate statistical significance into clinical significance during counseling.
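The number of events, not enrollment alone, determines whether an HR this close to 1 reaches significance. The sketch below assumes a common rule of thumb, SE(ln HR) ≈ 2/√(total events), which applies to 1:1 allocation with events split roughly evenly. It then translates the HR into an absolute excess, treating HR ≈ RR for a 5% baseline risk. The event counts and baseline risk are hypothetical.

```python
import math


def hr_ci_from_events(hr: float, total_events: int, z: float = 1.96):
    """Approximate 95% CI for an HR from the total number of events.

    Rule-of-thumb assumption: SE(ln HR) ~= 2 / sqrt(events), which holds for
    1:1 allocation with events split roughly evenly between groups.
    """
    se = 2 / math.sqrt(total_events)
    return math.exp(math.log(hr) - z * se), math.exp(math.log(hr) + z * se)


print(hr_ci_from_events(1.02, 60_000))  # lower bound just above 1.0
print(hr_ci_from_events(1.02, 2_000))   # CI spans 1.0

# Clinical translation, treating HR ~ RR for a 5% baseline risk (approximation).
baseline_risk = 0.05
excess_risk = baseline_risk * 1.02 - baseline_risk  # about 1 extra event per 1,000
print(round(1 / excess_risk))  # number needed to harm, about 1,000
```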

— Cohort/observational signal → mechanistic plausibility check → RCT for confirmation → meta-analysis → guideline change.
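— Example: SGLT2 inhibitors, where the sequence partly ran in reverse. Cardiovascular benefit first emerged in a mandated cardiovascular-safety RCT (EMPA-REG OUTCOME), was confirmed in DAPA-HF and CREDENCE, and was later echoed in real-world cohorts. These drugs are now first-line in HFrEF, CKD, and T2DM with CVD.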
— RCT is unethical or infeasible (smoking → cancer, parachutes).
— Effect size is enormous (RR > 5) with consistent replication, dose-response, biological plausibility.
— Multiple high-quality cohorts converge (Bradford Hill consistency).
— Modest effect size (RR 1.1–1.5), inconsistent across studies, biologically ambiguous, plausible confounding by indication.
— FDA Sentinel and post-marketing surveillance cohorts can trigger label changes and boxed warnings. For example, the fluoroquinolone–aortic dissection signal from retrospective cohorts prompted an FDA label warning.
— REMS programs may follow.
Step 3 management: A new retrospective cohort suggests a 30% increase in tendon rupture with a commonly prescribed drug. Before changing your prescribing pattern, look for: (1) replication in independent cohorts, (2) RCT data or meta-analysis, (3) FDA communication. Acting on a single retrospective signal risks more harm than benefit (e.g., reflex deprescribing of beneficial medication).

— Starts with outcome (cases with disease, controls without).
— Looks backward at exposure.
— Yields odds ratio only; cannot calculate incidence or RR directly.
— Best for rare outcomes (cancer, rare birth defects) and multiple exposures.
— Vulnerable to recall bias and selection of controls issues.
— Measures exposure and outcome at the same point in time.
— Yields prevalence and prevalence ratio/odds ratio.
— Cannot establish temporality → cannot determine causation.
— Best for disease burden, screening test characteristics, surveys (NHANES).
— Unit of analysis is the group/population, not the individual.
— Susceptible to ecologic fallacy.
— Useful for hypothesis generation (e.g., international cancer rate comparisons).
— Descriptive; no comparison group; lowest evidence rung.
— Hypothesis-generating only.
Board pearl: When the stem says "investigators identified 200 patients with pancreatic cancer and 400 matched controls without cancer, then asked about prior coffee consumption," this is case-control, not cohort. The give-away is starting with cases. The appropriate statistic is the odds ratio, and the dominant bias to suspect is recall bias.

— Investigator assigns exposure via randomization → eliminates confounding (measured and unmeasured) in expectation.
— Gold standard for efficacy claims.
— CONSORT reporting guideline.
— Subtypes: parallel-group, crossover, cluster, factorial, adaptive, pragmatic.
— Quantitative synthesis of multiple studies; can pool cohort studies, but heterogeneity is often high.
— PRISMA reporting guideline.
— Forest plot: visual display of effect estimates and CIs; diamond at bottom = pooled estimate.
— I² statistic: heterogeneity (>50% = substantial).
— Funnel plot: asymmetry suggests publication bias.
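The pooled diamond on a forest plot and the I² statistic can be reproduced with inverse-variance weighting. The sketch below is a fixed-effect pooling of log RRs with Cochran's Q and I² = max(0, (Q − df)/Q). The three studies are hypothetical. When I² is high, as here, a random-effects model (for example, DerSimonian–Laird, not shown) is usually preferred.

```python
import math


def fixed_effect_meta(log_rrs, ses, z=1.96):
    """Inverse-variance fixed-effect pooling of log RRs, with Cochran's Q and I^2."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    df = len(log_rrs) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # % variation beyond chance
    ci = (math.exp(pooled - z * pooled_se), math.exp(pooled + z * pooled_se))
    return math.exp(pooled), ci, q, i2


# Three hypothetical cohorts: RR 0.8, 0.9, 1.3, each with SE(ln RR) = 0.10.
pooled_rr, pooled_ci, q, i2 = fixed_effect_meta(
    [math.log(x) for x in (0.8, 0.9, 1.3)], [0.10] * 3)
print(round(pooled_rr, 2), [round(x, 2) for x in pooled_ci], round(q, 1), round(i2))
# I^2 well above 50%: substantial heterogeneity
```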
Key distinction: A prospective cohort and an RCT look superficially similar (both follow groups forward in time), but the assignment mechanism distinguishes them. Cohort = investigator observes; RCT = investigator assigns. This single feature underlies the entire evidence hierarchy and is the most common Step 3 trap when a stem buries the design in dense prose.

— Always convert RR/HR into absolute risk and NNT/NNH for patient counseling.
— Use natural frequencies ("3 out of 100 over 10 years") rather than percentages or relative terms. AHRQ health-literacy guidance favors this framing because it improves patient comprehension.
— Decision aids (e.g., Mayo Clinic statin choice tool) operationalize cohort-derived risk equations (Pooled Cohort Equations, FRAX, Gail).
— ASCVD risk (Pooled Cohort Equations) drives statin and antihypertensive initiation thresholds (ACC/AHA).
— FRAX drives osteoporosis pharmacotherapy decisions.
— Gail/Tyrer-Cuzick models guide tamoxifen chemoprevention.
— Most derive from large prospective cohorts; the Gail model is an exception, built largely from case-control data within a screening program. Recognize the design behind the calculator.
— Follow-up intervals (annual lipid panels, every-3-year diabetes screening) reflect incidence rates from cohort studies.
— Surveillance recommendations after cancer treatment (e.g., colonoscopy intervals) come from outcome cohorts.
— Risk-stratified care management (high-risk diabetic clinic) targets resources to those with highest predicted incidence.
— Value-based contracting uses cohort-derived risk-adjustment to compare provider performance fairly.
Step 3 management: When initiating a statin based on a 10-year ASCVD risk of 12%, explicitly translate to the patient: "Out of 100 people like you, about 12 will have a heart attack or stroke in 10 years without treatment; statin therapy reduces that to roughly 9 — a benefit for about 3 in 100." This patient-centered framing using absolute risk is the recurring expectation on Step 3 communication items.
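The counseling script above can be generated from a baseline risk and an RR. The sketch below assumes an RR of 0.75, which is the value implied by the 12 → 9 example. It rounds the NNT up, as is conventional. The function name and wording template are illustrative only.

```python
import math


def counsel(baseline_risk: float, rr: float, per: int = 100, years: int = 10,
            event: str = "a heart attack or stroke") -> str:
    """Turn a baseline risk and a treatment RR into a natural-frequency script."""
    treated_risk = baseline_risk * rr
    arr = baseline_risk - treated_risk        # absolute risk reduction
    nnt = math.ceil(1 / arr)                  # NNT is conventionally rounded up
    untreated_n = round(baseline_risk * per)  # events per `per` people, untreated
    treated_n = round(treated_risk * per)     # events per `per` people, treated
    return (f"Out of {per} people like you, about {untreated_n} will have {event} "
            f"in {years} years without treatment and about {treated_n} with it; "
            f"about {untreated_n - treated_n} in {per} benefit (NNT {nnt}).")


script = counsel(0.12, 0.75)
print(script)
```

The same function shows why RR alone can mislead. With a 2% baseline risk and the same RR, the absolute benefit drops to about 1 in 200.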

— Pooled Cohort Equations over-predict in some contemporary populations (lower event rates than original cohorts); periodic recalibration is required.
— Race-specific coefficients in older risk tools have been deprecated in favor of social determinants approaches (e.g., PREVENT equations from AHA, 2023).
— ASCVD risk: reassess every 4–6 years in adults 40–75.
— Diabetes screening: every 3 years if normal, annually if prediabetes (cohort-derived progression rates).
— Lung cancer screening: annual low-dose CT in eligible adults aged 50–80. This recommendation rests on RCT evidence (NLST, NELSON) supplemented by modeling, not on cohort data.
— Smoking cessation: 5 A's at every visit; quit rates and post-quit risk decline curves come from cohorts.
— Post-MI: secondary prevention adherence checked at 1, 3, 6, 12 months; cohort data show maximal benefit with sustained adherence.
— Cardiac rehab participation post-MI reduces mortality by ~20% (cohort + RCT data) — a CMS quality measure.
— Pulmonary rehab post-COPD exacerbation reduces readmission within 30 days.
— Shared decision-making visit: required for lung cancer screening (CMS), often used for PSA screening, and increasingly used for low-yield preventive interventions.
— Documentation should reference the evidence base (cohort vs RCT) and the patient's values.
Board pearl: Cohort-derived risk calculators must be applied to populations similar to the derivation cohort. Applying Framingham, which was derived from a predominantly white, middle-aged cohort, to a 30-year-old South Asian patient underestimates risk. Step 3 may test recognition of risk-equation calibration failure in atypical populations and ask for adjunctive markers (CAC score, ApoB).

— Prospective cohorts require IRB-approved consent for enrollment, biospecimen banking, and re-contact.
— Retrospective EHR cohorts may proceed under an IRB-approved waiver of informed consent and of HIPAA authorization when the research is minimal risk and consent is impracticable to obtain. IRB review is still required.
— Broad consent, introduced in the revised Common Rule (the "2018 Requirements"), allows future unspecified research uses with appropriate disclosure.
— De-identification per HIPAA Safe Harbor (18 identifiers removed) or expert determination.
— Re-identification risk rises with rich longitudinal data and genomic linkage — a growing concern in biobank cohorts.
— Historical under-enrollment of women and minorities in landmark cohorts (Physicians' Health Study, MRFIT) limited generalizability. NIH Revitalization Act (1993) mandates inclusion.
— All of Us Research Program explicitly recruits under-represented groups.
— Industry-sponsored cohort studies must disclose funding; readers should weight accordingly.
— When a participant develops a clinically significant incidental finding (e.g., aortic aneurysm on a research CT), the cohort protocol must specify return-of-results pathway and clinician notification — a recurring Step 3 ethics scenario.
— If follow-up uncovers child abuse, intimate partner violence, certain communicable diseases, or impaired drivers, state mandatory reporting laws override research confidentiality.
Step 3 management: A research participant in a longitudinal cohort is found incidentally to have a 6 cm abdominal aortic aneurysm on study imaging. The correct action is immediate notification of the participant and their primary clinician with documentation in the medical record — research confidentiality does not justify withholding clinically actionable findings. This duty-to-warn analog is explicit in modern IRB-approved cohort protocols.

Board pearl: If a Step 3 question gives a 2×2 table built from cohort data and asks for "the measure of association," compute risk in exposed / risk in unexposed = RR. Do not compute (a×d)/(b×c) — that's an odds ratio and earns a wrong answer despite arithmetic effort.

— "Investigators enrolled 10,000 firefighters and 10,000 office workers and tracked them for 15 years for development of cardiovascular disease." → Prospective cohort. Statistic = RR.
— Stem provides 2×2 with exposure rows and outcome columns + follow-up time. Compute risk in each row, then RR. If person-time differs, compute incidence rate ratio.
— "Patients prescribed the drug were sicker at baseline; after follow-up, mortality was higher in drug users." → Confounding by indication.
— "Employed cohort had lower all-cause mortality than the general population." → Healthy-worker effect.
— "Only patients who survived to receive the transplant were counted in the treatment arm." → Immortal time bias.
— "HR 0.85 (95% CI 0.72–1.01)." → Not statistically significant (CI crosses 1).
— Rare outcome, common exposure → case-control.
— Common outcome, rare exposure → cohort.
— New drug safety post-marketing → retrospective cohort.
— Definitive efficacy of a new agent → RCT.
— Stem gives RR or HR and asks what to tell the patient → convert to absolute risk reduction and NNT.
— Incidental findings in research → notify participant and clinician.
— Retrospective EHR study → waiver of consent if minimal risk.
Key distinction: Step 3 biostats stems often bury the design clue in a long clinical vignette. Read the methods sentence first ("followed for X years," "identified cases with disease," "measured at a single visit") — that sentence alone usually tells you the design, statistic, and bias to expect.

Cohort studies follow exposed and unexposed groups forward in time to measure incidence and yield relative risk or hazard ratio — establishing temporality but never escaping the possibility of residual confounding, which is why RCTs remain the definitive arbiter of causation.
Board pearl: Whenever a Step 3 vignette mentions "incidence," "person-years," "Kaplan–Meier," "hazard ratio," or "followed prospectively," reflexively label the design as a cohort, reach for relative risk or hazard ratio as the effect measure, and screen the methods for healthy-user, immortal-time, or confounding-by-indication bias before accepting the conclusion as clinically actionable.

