Biostatistics & Population Health
Bias in research: selection, information, confounding
— Bias is systematic error, distinct from random error (precision/chance), which widens confidence intervals but does not skew the point estimate
— Bias shifts the point estimate; larger sample size does not fix bias
— Selection bias — who gets into the study or who stays
— Information (measurement) bias — how exposure or outcome is captured
— Confounding — a third variable distorts the exposure–outcome link
— Study results conflict with prior literature or biologic plausibility
— Surprisingly large effect size from a retrospective or convenience-sample design
— Heavy loss to follow-up (>20%) or low response rate (<60%)
— Self-reported exposures, especially sensitive ones (alcohol, sexual history, diet recall)
— Case-control study where controls are drawn from a single clinic or hospital
— Observational study comparing patients who "chose" treatment A vs B

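— "Cases were recruited from a tertiary referral center; controls were healthy hospital employees" → selection bias from mismatched source populations (referral filter); classic Berksonian (admission rate) bias arises when cases and controls are both drawn from hospitalized patients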
— "30% of participants were lost to follow-up, predominantly those with severe disease" → attrition bias / loss-to-follow-up bias
— "Survey mailed; 25% responded" → non-response bias
— "Workers in factory had lower mortality than general population" → healthy worker effect
— "Cancer screening detected more indolent tumors in the screened arm" → length-time bias; survival measured from an earlier date of diagnosis while the date of death is unchanged → lead-time bias
— "Mothers of children with birth defects recalled medication use more thoroughly than mothers of healthy children" → recall bias
— "Interviewer knew case status when asking about exposures" → interviewer/observer bias
— "Patients knowing they received the drug reported more improvement" → performance/response bias, mitigated by blinding
— "Pathologist aware of clinical history graded biopsies" → detection/ascertainment bias
— "Hawthorne effect" — subjects change behavior because they know they're being watched
— "Coffee drinkers had higher lung cancer rates" — smoking confounds
— Older age, socioeconomic status, comorbidity burden differ between groups at baseline
— Observational comparison of two treatments where sicker patients got the newer one → confounding by indication

— Systematic review/meta-analysis of RCTs
— Single well-designed RCT
— Prospective cohort
— Retrospective cohort
— Case-control
— Cross-sectional
— Case series / case report
— Source population defined? Inclusion/exclusion explicit?
— Sampling method: random, consecutive, convenience, volunteer?
— Response/participation rate documented?
— Were assessors blinded?
— Standardized measurement tools (validated instruments vs self-report)?
— Same intensity of follow-up in both arms (surveillance bias if not)?
— Table 1 baseline characteristics — large differences signal selection bias or, in RCTs, randomization failure
— Were known confounders measured? Adjusted for? Stratified?
— Loss-to-follow-up rate per arm; differential loss = attrition bias
— Intention-to-treat vs per-protocol analysis declared

— Berkson bias (admission rate bias): Hospital-based case-control study — both the disease and the exposure independently raise hospitalization probability, creating spurious association
— Healthy worker effect: Employed populations are systematically healthier than the general population; comparing workers to general population underestimates occupational harm
— Non-response bias: Responders differ from non-responders on key variables (e.g., health-conscious individuals respond more to nutrition surveys)
— Loss-to-follow-up (attrition) bias: Differential dropout between arms; classically those who do poorly drop out, inflating apparent benefit
— Self-selection / volunteer bias: Volunteers for screening trials are healthier and more adherent than the general population — overestimates real-world effectiveness
— Prevalence-incidence (Neyman) bias: Cross-sectional or case-control studies miss patients who died quickly or recovered, capturing only survivors with chronic disease
— Referral filter bias: Tertiary-center cohorts have higher disease severity than community samples
— Mismatched recruitment sources for cases vs controls
— High dropout (>20%) especially if asymmetric between arms
— Volunteer-recruited screening or wellness study
— Population-based sampling, random-digit dialing, or registry-based enrollment
— High follow-up rates; sensitivity analyses imputing missing outcomes
— Intention-to-treat analysis preserves the original randomized groups and minimizes attrition bias in RCTs
— Choose controls from the same source population that generated the cases
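The Berkson mechanism described above can be checked with simple arithmetic. The sketch below is illustrative only: the prevalences and admission probabilities are made-up assumptions, and exposure and disease are constructed to be independent in the source population. Restricting the 2x2 table to hospitalized people still produces an odds ratio far from 1.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a=exposed+diseased, b=exposed only, c=diseased only, d=neither."""
    return (a * d) / (b * c)

# Hypothetical source population: exposure and disease are independent.
P_EXPOSED, P_DISEASED = 0.20, 0.10
# Hypothetical, independent routes into hospital.
P_ADMIT_OTHER = 0.02     # admission for unrelated reasons
P_ADMIT_EXPOSURE = 0.10  # admission driven by the exposure
P_ADMIT_DISEASE = 0.30   # admission driven by the disease

def p_admitted(exposed, diseased):
    """Probability of hospitalization through at least one route."""
    p_stay_out = 1 - P_ADMIT_OTHER
    if exposed:
        p_stay_out *= 1 - P_ADMIT_EXPOSURE
    if diseased:
        p_stay_out *= 1 - P_ADMIT_DISEASE
    return 1 - p_stay_out

def cell(exposed, diseased, hospital_only):
    """Population fraction in one 2x2 cell, optionally counting only the hospitalized."""
    p = (P_EXPOSED if exposed else 1 - P_EXPOSED) * (P_DISEASED if diseased else 1 - P_DISEASED)
    return p * p_admitted(exposed, diseased) if hospital_only else p

def table(hospital_only):
    return (cell(True, True, hospital_only), cell(True, False, hospital_only),
            cell(False, True, hospital_only), cell(False, False, hospital_only))

population_or = odds_ratio(*table(hospital_only=False))
hospital_or = odds_ratio(*table(hospital_only=True))
print(f"Population OR:    {population_or:.2f}")
print(f"Hospital-only OR: {hospital_or:.2f}")
```

With these assumed numbers the population odds ratio is 1, while the hospital-only odds ratio falls to roughly 0.2. The association is manufactured entirely by who gets admitted.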

— Non-differential misclassification — errors occur equally in exposed/unexposed (or case/control); biases estimate toward the null (dilution of true effect)
— Differential misclassification — errors differ by group; biases estimate in either direction, often away from the null
— Recall bias (classic case-control flaw): cases search memory harder than controls — e.g., mothers of malformed infants recall first-trimester drugs better
— Interviewer/observer bias: unblinded interviewer probes cases more thoroughly for exposures
— Detection (ascertainment) bias: outcome is sought more aggressively in one group — e.g., women on OCPs get more pelvic exams, so cervical pathology detected more
— Surveillance bias: intensively monitored patients accumulate more diagnoses regardless of true incidence
— Hawthorne effect: participants change behavior because they know they are observed
— Reporting / social desirability bias: under-reporting of sensitive exposures (alcohol, sexual partners, illicit drug use)
— Will Rogers phenomenon: stage migration with better imaging improves outcomes in each stage without true biologic change
— Misclassification by faulty instrument: poor sensitivity/specificity of an exposure or outcome measure
— Blinding of subjects, providers, outcome assessors, and data analysts
— Use objective, validated measures (lab values, registries, adjudicated endpoints)
— Collect exposure data before outcome is known (prospective design)
— Standardized, structured questionnaires; multiple data sources
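The direction rules for misclassification can be illustrated with expected counts. This is a minimal sketch on hypothetical case-control data; the sensitivity and specificity values are assumptions chosen only to show the pattern. Equal error rates in both groups pull the odds ratio toward 1, while better exposure recall among cases pushes it away from 1.

```python
def observed_counts(exposed, unexposed, sensitivity, specificity):
    """Expected counts classified as exposed/unexposed by an imperfect measure."""
    seen_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    return seen_exposed, (exposed + unexposed) - seen_exposed

def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)

# Hypothetical true data: (truly exposed, truly unexposed).
CASES = (300, 700)
CONTROLS = (150, 850)

true_or = odds_ratio(*CASES, *CONTROLS)

# Non-differential: identical error rates in cases and controls.
nondiff_or = odds_ratio(*observed_counts(*CASES, 0.80, 0.90),
                        *observed_counts(*CONTROLS, 0.80, 0.90))

# Differential (recall-bias pattern): cases report true exposure more completely.
diff_or = odds_ratio(*observed_counts(*CASES, 0.95, 1.00),
                     *observed_counts(*CONTROLS, 0.70, 1.00))

print(f"True OR:             {true_or:.2f}")
print(f"Non-differential OR: {nondiff_or:.2f}")
print(f"Differential OR:     {diff_or:.2f}")
```

In this example the true odds ratio is about 2.4, the non-differential estimate is about 1.7, and the differential estimate is about 3.4. Differential error can also bias toward the null if the error pattern is reversed.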

— A confounder is associated with the exposure (in the source population)
— It is an independent risk factor for the outcome (not only through the exposure)
— It is not on the causal pathway between exposure and outcome (otherwise it is a mediator, not a confounder)
— Smoking confounds coffee–lung cancer association
— Age confounds nearly every chronic disease comparison
— Socioeconomic status confounds diet, exercise, and health outcomes
— Severity of illness confounds observational drug comparisons (confounding by indication — sicker patients get the newer/aggressive drug, making it look worse)
— Crude (unadjusted) estimate differs meaningfully from adjusted estimate (>10% change) → confounding present
— Imbalanced Table 1 in observational studies
— Surprising direction reversal after adjustment = Simpson's paradox (extreme confounding)
— Confounder: distorts overall estimate; stratum-specific estimates are similar to each other but differ from the crude estimate → report adjusted estimate
— Effect modifier: stratum-specific estimates truly differ from each other → report stratified estimates separately; do not pool
— Randomization (gold standard, balances measured and unmeasured confounders)
— Restriction (enroll only one stratum, e.g., non-smokers)
— Matching (case-control studies match on age, sex)
— Stratification (Mantel-Haenszel)
— Multivariable regression (logistic, Cox, linear)
— Propensity score matching for observational comparative effectiveness
— Instrumental variable analysis when unmeasured confounding suspected
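The crude-versus-adjusted comparison can be worked through on the coffee, smoking, and lung cancer example. The counts below are hypothetical and built so that coffee has no effect within either smoking stratum. The sketch computes the crude odds ratio and the Mantel-Haenszel pooled odds ratio.

```python
def odds_ratio(a, b, c, d):
    """a=exposed cases, b=exposed non-cases, c=unexposed cases, d=unexposed non-cases."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR across a list of 2x2 tables (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts: (coffee cases, coffee non-cases, no-coffee cases, no-coffee non-cases).
STRATA = {
    "smokers": (90, 810, 10, 90),
    "non-smokers": (1, 99, 9, 891),
}

# Collapsing over smoking status gives the crude table.
crude_table = tuple(sum(column) for column in zip(*STRATA.values()))

crude_or = odds_ratio(*crude_table)
stratum_ors = {name: odds_ratio(*t) for name, t in STRATA.items()}
adjusted_or = mantel_haenszel_or(list(STRATA.values()))

print(f"Crude OR: {crude_or:.2f}")
for name, value in stratum_ors.items():
    print(f"OR among {name}: {value:.2f}")
print(f"Mantel-Haenszel OR: {adjusted_or:.2f}")
```

The stratum-specific odds ratios agree with each other (both 1.0) but differ from the crude estimate (about 5.2), which is the confounding pattern. If the stratum estimates differed from each other, pooling would hide effect modification.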

— Randomization balances baseline characteristics (measured + unmeasured) across arms on average → minimizes confounding; chance imbalance can persist in small trials
— Use allocation concealment (sealed envelopes, central randomization) to prevent enrollment manipulation
— Block randomization ensures balanced arm sizes; stratified randomization balances key prognostic variables across arms
— Single-blind: subject unaware; double-blind: subject + investigator; triple-blind: add outcome adjudicator/analyst
— Blinding prevents performance bias, ascertainment bias, and reporting bias
— Use placebo or sham procedure controls
— Restrict to homogeneous population (e.g., non-smokers, single sex, narrow age band) — controls confounders but reduces external validity
— Match cases and controls on confounders; cannot then analyze the matched variable as exposure
— Validated instruments, calibrated equipment, structured interviews
— Pre-specified outcome definitions, blinded adjudication committee
— Objective endpoints (mortality, lab values) > subjective endpoints (pain scores)
— Multivariable regression for measured confounders
— Propensity scores to balance observational cohorts
— Sensitivity analyses for unmeasured confounding (E-value)
— Intention-to-treat (ITT) analysis as the primary analysis to preserve randomization and limit attrition bias; per-protocol as secondary
— Power calculations address random error, not bias
— A huge biased study is more dangerous than a small unbiased one — precision around the wrong answer
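The E-value mentioned above has a closed form for a risk ratio. The sketch below uses the commonly cited formula, E = RR + sqrt(RR × (RR − 1)), taking the reciprocal first when RR is below 1. The function name is illustrative, and applying it to odds or hazard ratios needs additional assumptions not shown here.

```python
import math

def e_value(risk_ratio):
    """E-value for an observed risk ratio (point estimate only)."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective associations are handled by inverting the ratio first.
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))

for rr in (1.0, 1.5, 2.0, 0.5):
    print(f"RR = {rr:.1f} -> E-value = {e_value(rr):.2f}")
```

For an observed RR of 2.0 the E-value is about 3.41. An unmeasured confounder would need risk-ratio associations of at least that size with both exposure and outcome to fully explain the finding.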

— Propensity score methods: model probability of receiving exposure given covariates; match, stratify, or weight on the score → balances measured confounders, mimics RCT structure
— Inverse probability of treatment weighting (IPTW): creates pseudo-population where exposure is independent of measured confounders
— Instrumental variable analysis: uses a variable associated with exposure but not directly with outcome (e.g., distance to specialty hospital) to estimate causal effect under unmeasured confounding
— Regression discontinuity and difference-in-differences designs leverage natural experiments
— Mendelian randomization: genetic variants as instruments for lifelong exposure (e.g., LDL-lowering alleles for statin-mimicking effects)
— Lead-time bias correction: adjust screening trial outcomes by computing mortality from a fixed reference point (e.g., date of randomization), not date of diagnosis
— Length-time bias: mitigated by reporting disease-specific mortality rather than survival from diagnosis
— Immortal time bias: arises in cohort studies where the exposure definition requires survival to a certain point; use time-varying exposure modeling
— Competing risks: use Fine-Gray subdistribution hazards rather than Kaplan-Meier when non-outcome death is common
— E-value: minimum strength an unmeasured confounder would need to nullify the observed association — higher E-value = more robust finding
— Negative control outcomes and exposures probe for hidden bias
— CONSORT for RCTs, STROBE for observational, PRISMA for meta-analyses, STARD for diagnostic accuracy
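Inverse probability of treatment weighting can be shown on a small hypothetical cohort with a single binary confounder (disease severity). As a simplifying assumption, the propensity score is taken as the observed proportion treated in each severity stratum. Real analyses usually fit a regression model over many covariates. The drug is constructed to have no true effect.

```python
# Hypothetical cohort: (severity, treated) -> (patients, events).
# Event risk is 30% in severe and 10% in mild disease regardless of treatment.
COHORT = {
    ("severe", True): (800, 240),
    ("severe", False): (200, 60),
    ("mild", True): (200, 20),
    ("mild", False): (800, 80),
}

def propensity(severity):
    """P(treated | severity), estimated from the stratum counts."""
    n_treated = COHORT[(severity, True)][0]
    n_untreated = COHORT[(severity, False)][0]
    return n_treated / (n_treated + n_untreated)

def risk(treated, weighted):
    """Event risk in one arm, crude or in the IPTW pseudo-population."""
    total_n = total_events = 0.0
    for (severity, arm), (n, events) in COHORT.items():
        if arm != treated:
            continue
        if weighted:
            p = propensity(severity)
            weight = 1 / p if treated else 1 / (1 - p)
        else:
            weight = 1.0
        total_n += weight * n
        total_events += weight * events
    return total_events / total_n

crude_rd = risk(True, weighted=False) - risk(False, weighted=False)
iptw_rd = risk(True, weighted=True) - risk(False, weighted=True)

print(f"Crude risk difference: {crude_rd:+.3f}")
print(f"IPTW risk difference:  {iptw_rd:+.3f}")
```

The crude comparison makes the drug look harmful (risk difference about +0.12) because severe patients were channeled to it. Weighting removes the imbalance in the measured confounder and the difference goes to zero. Nothing here addresses unmeasured confounders.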

— Exclusion criteria (age >75, eGFR <30, Child-Pugh B/C, polypharmacy) eliminate the patients clinicians actually treat
— Result: external validity gap — efficacy estimates may not apply; harms (bleeding, falls, drug accumulation) underestimated
— Step 3 stems may show a meta-analysis dominated by middle-aged patients applied to an 82-year-old — ask about generalizability, not internal validity
— In observational studies, frail elders are less likely to receive aggressive therapy → "untreated" group looks worse not because therapy works but because treated patients were healthier (healthy adherer effect / healthy user bias)
— Adherent patients have better outcomes regardless of what they take — placebo adherers in landmark trials outperform non-adherers
— Cognitive impairment → recall bias in self-reported exposures and medication histories
— Polypharmacy obscures drug–outcome attribution
— Higher surveillance (more clinic visits) → surveillance bias inflating incidence of diagnosed conditions
— Pharmacokinetic studies often exclude older and organ-impaired patients; dose adjustments rely on small PK substudies
— Competing risk of death from non-target disease distorts long-term outcome studies (use competing-risk analyses)

— Pregnancy registries (e.g., antiepileptic, antiretroviral) — prone to non-response and reporting bias
— Case-control teratogen studies — heavy recall bias (mothers of affected infants over-report exposures)
— Administrative database studies — prone to confounding by indication (drug given for an illness that itself affects pregnancy outcome)
— Pediatric studies: small sample sizes → wide CIs, often underpowered
— Off-label prescribing based on extrapolation from adult trials introduces transportability bias
— Outcome measures (developmental milestones) are observer-rated → ascertainment bias if unblinded
— Pivotal trials historically enroll predominantly white, higher-SES populations
— Genetic and pharmacogenomic differences (e.g., warfarin, clopidogrel, carbamazepine with HLA-B*15:02 in Asian populations) may not be captured
— Social determinants of health act as unmeasured confounders in observational disparities research
— Healthy volunteer bias in screening trials (Pap, mammography uptake) overestimates real-world adherence
— Cardiovascular trials historically underenrolled women → women's atypical presentations under-recognized
— Pharmacokinetic differences (zolpidem dose halving in women) emerged post-marketing

— Hormone replacement therapy (HRT) for cardiovascular prevention — observational studies showed benefit (healthy user bias); WHI RCT later showed harm → discontinuation, but only after years of preventable events
— Antiarrhythmic suppression (CAST trial) — surrogate-endpoint reasoning suggested PVC suppression would save lives post-MI; flecainide/encainide actually increased mortality
— Bone marrow transplant for breast cancer in the 1990s — uncontrolled case series suggested benefit; RCTs showed none
— Vertebroplasty appeared effective in uncontrolled series; sham-controlled trials showed equivalence
— Inflated effect sizes from biased trials drive drug approvals and guideline recommendations that later require reversal
— Publication bias toward positive results compounds the problem — meta-analyses overestimate treatment effects
— Spin in abstracts misrepresents non-significant findings as beneficial
— Lead-time/length-time bias → unjustified expansion of screening programs (e.g., neuroblastoma screening in Japan, prostate-specific antigen overuse)
— Confounding by indication → false safety signals or false efficacy in observational comparative effectiveness research
— Attrition bias → overstated tolerability of chronic therapies
— Differential recall → false teratogen alerts (Bendectin saga)
— p-hacking, HARKing (hypothesizing after results are known), and selective outcome reporting → addressed by pre-registration on ClinicalTrials.gov; data fabrication is misconduct that pre-registration alone does not prevent

— Loss to follow-up >20% without sensitivity analysis, or loss that is differential between arms
— No blinding in a trial with subjective primary outcome (pain, quality of life, global impression)
— Surrogate endpoint without validated linkage to patient-important outcomes
— Per-protocol-only analysis in an RCT with significant crossover or non-adherence
— Composite endpoint driven by the softest component (e.g., hospitalization, revascularization) rather than mortality
— Subgroup analysis without pre-specification and adjustment for multiple comparisons (false discovery)
— Single hospital-based case-control with unmatched, non-population-based controls
— Observational comparative effectiveness without propensity adjustment or instrumental variables
— Step 1: Is this an RCT? If yes, check randomization quality, blinding, ITT, follow-up
— Step 2: If observational, check confounder measurement and adjustment
— Step 3: Apply patient context — generalizability, competing risks, baseline risk
— Step 4: Consult systematic reviews / clinical practice guidelines when single studies conflict
— "Statistically significant in subgroup analysis" → often post hoc; multiplicity inflates false positives
— "Trend toward benefit (p = 0.08)" → non-significant
— "Open-label extension showed sustained improvement" → unblinded and usually uncontrolled
— "Industry-sponsored, with sponsor-employed authors" → sponsorship bias risk
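The subgroup warning above comes down to simple arithmetic. The sketch below assumes independent tests with no true effect anywhere. It computes the chance of at least one false-positive subgroup and the Bonferroni-corrected per-test threshold. The function names are illustrative.

```python
def family_wise_error(n_tests, alpha=0.05):
    """P(at least one false positive) across independent null tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff that keeps the family-wise error near alpha."""
    return alpha / n_tests

for k in (1, 5, 10, 20):
    print(f"{k:>2} subgroups: P(any false positive) = {family_wise_error(k):.2f}, "
          f"Bonferroni cutoff = {bonferroni_threshold(k):.4f}")
```

With 10 unplanned subgroups, the chance of at least one spurious "significant" result is about 40%, which is why pre-specification and correction matter.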

— Berkson bias vs referral filter bias: Berkson is specifically the spurious association created when both case-defining illness and exposure increase hospitalization probability; referral filter is the broader severity gradient in tertiary centers
— Non-response bias vs volunteer bias: non-response = those approached don't respond; volunteer = self-selecting enrollees (e.g., screening sign-ups) who differ from target population
— Attrition bias vs immortal time bias: attrition = patients leave the study; immortal time = exposure definition requires survival, so person-time pre-exposure is misclassified
— Lead-time bias vs length-time bias: lead-time = earlier detection without changing date of death gives illusion of longer survival; length-time = screening preferentially catches slow-growing indolent disease
— Recall bias vs reporting bias: recall = memory accuracy differs by group; reporting = willingness to disclose differs (often by social desirability)
— Detection/ascertainment bias vs surveillance bias: detection = outcome sought more aggressively in one group at a single point; surveillance = ongoing monitoring intensity differs
— Observer bias vs interviewer bias: observer = outcome assessor influenced by knowledge of exposure; interviewer = exposure assessor influenced by knowledge of outcome
— Hawthorne effect vs placebo effect: Hawthorne = behavior change from being watched; placebo = symptom improvement from belief in treatment
— Confounding by indication vs healthy user bias: indication = sicker patients channeled to specific therapy; healthy user = adherent/preventive patients have better outcomes regardless
— Confounding vs mediation: confounder is upstream of exposure; mediator is downstream (on causal pathway) — adjusting for a mediator inappropriately attenuates true effect
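Lead-time bias is easiest to see with one hypothetical patient whose date of death does not change. In the sketch below the ages are made up. Screening moves diagnosis four years earlier, so survival from diagnosis and 5-year survival both improve while the age at death stays the same.

```python
from dataclasses import dataclass

@dataclass
class Course:
    age_at_diagnosis: float
    age_at_death: float

    @property
    def survival_from_diagnosis(self):
        return self.age_at_death - self.age_at_diagnosis

    def alive_at(self, years_after_diagnosis):
        """True if the patient survives at least this long after diagnosis."""
        return self.survival_from_diagnosis >= years_after_diagnosis

# Same hypothetical patient, same death at age 70; only the diagnosis date differs.
symptomatic = Course(age_at_diagnosis=67, age_at_death=70)
screened = Course(age_at_diagnosis=63, age_at_death=70)

lead_time = symptomatic.age_at_diagnosis - screened.age_at_diagnosis

print(f"Lead time: {lead_time} years")
print(f"Survival from diagnosis: {symptomatic.survival_from_diagnosis} vs "
      f"{screened.survival_from_diagnosis} years")
print(f"Counts as a 5-year survivor: {symptomatic.alive_at(5)} vs {screened.alive_at(5)}")
print(f"Age at death: {symptomatic.age_at_death} vs {screened.age_at_death}")
```

Measuring mortality from a fixed reference point such as randomization, rather than survival from diagnosis, removes this artifact.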

— Random error: wide confidence intervals, p-values near threshold, small samples
— Reduced by larger sample size; bias is not
— A statistically significant result in a small study may be a chance finding (type I error), not bias
— Effect modification (interaction): a true biologic phenomenon in which the exposure effect differs across strata (e.g., aspirin benefit differs by sex in primary prevention)
— Report stratum-specific estimates; do not "adjust away"
— Distinguished from confounding by formal interaction testing (likelihood ratio test, interaction term in regression)
— Reverse causation: outcome causes exposure rather than vice versa (e.g., low cholesterol in cancer patients because cancer lowers cholesterol)
— Especially common in cross-sectional studies; mitigated by prospective design with clear temporal ordering
— Ecological fallacy: inferring individual-level associations from group-level data (countries with higher fat intake have higher heart disease ≠ individuals with higher fat intake have higher heart disease)
— Affects ecological/correlational studies
— Regression to the mean: extreme baseline values naturally move toward the mean on repeat measurement, independent of intervention
— Mimics treatment effect in uncontrolled before-after studies
— Controlled by including a comparison group
— Simpson's paradox: direction of association reverses when data are aggregated vs stratified; a form of extreme confounding
— Publication bias: positive studies are more likely to be published, which distorts meta-analyses; assess with funnel plot asymmetry, Egger's test, trim-and-fill
— Sponsorship bias: industry funding correlates with positive results even after controlling for methodologic quality
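Regression to the mean can be reproduced with a small simulation. The sketch below assumes a hypothetical blood pressure measurement: each person has a stable true value plus independent measurement noise at baseline and follow-up, and no one is treated. People selected for a high baseline reading still show a lower average at follow-up.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the illustration is reproducible

people = []
for _ in range(10_000):
    true_sbp = random.gauss(140, 10)            # stable underlying value
    baseline = true_sbp + random.gauss(0, 10)   # noisy first reading
    follow_up = true_sbp + random.gauss(0, 10)  # noisy second reading, no treatment
    people.append((baseline, follow_up))

# Enroll only those with an extreme baseline reading (hypothetical cutoff).
selected = [(b, f) for b, f in people if b > 160]

mean_baseline = mean(b for b, _ in selected)
mean_follow_up = mean(f for _, f in selected)

print(f"Selected: {len(selected)} of {len(people)}")
print(f"Mean baseline reading:  {mean_baseline:.1f}")
print(f"Mean follow-up reading: {mean_follow_up:.1f}")
```

An uncontrolled before-after study would credit this drop to whatever intervention happened in between. A comparison group selected the same way shows the same drop and cancels it out.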

— ClinicalTrials.gov registration required before enrollment; locks primary outcome, sample size, analysis plan
— Prevents outcome switching and HARKing
— Required by ICMJE journals for publication
— CONSORT (RCTs), STROBE (observational), PRISMA (systematic reviews), STARD (diagnostic), TRIPOD (prediction models), SPIRIT (protocols)
— Open data, open code, individual patient data meta-analyses
— Independent replication remains the strongest defense against bias
— Risk-of-bias tools: Cochrane RoB 2 (RCTs), ROBINS-I (non-randomized), QUADAS-2 (diagnostic), GRADE for evidence quality across outcomes
— Teach trainees to read methods first, then results
— Use systematic reviews and clinical practice guidelines rather than single studies when possible
— Anchor recommendations on patient-important outcomes and absolute risk reduction / NNT, not relative risk alone
— Apply shared decision-making when evidence is uncertain, with explicit discussion of evidence quality
— Treat industry-funded headlines, conference abstracts, and press releases as hypothesis-generating only
— Wait for peer-reviewed, replicated, hard-outcome data before changing chronic therapies

— Distinguish "no evidence of benefit" from "evidence of no benefit" — common patient misunderstanding
— Explain relative vs absolute risk; the same RRR feels very different at high vs low baseline risk
— Use decision aids (Mayo Clinic statin choice, etc.) that incorporate effect sizes and uncertainty
— Subscribe to systematic-review services (Cochrane, ACP Journal Club, NEJM Journal Watch) rather than tracking individual studies
— Re-evaluate chronic medications when new RCT data emerge (e.g., post-WHI HRT reassessment, post-SPRINT BP targets, post-DAPA-HF SGLT2 expansion)
— Document shared decisions about contested therapies in the chart
— Discuss lead-time and length-time bias when patients overestimate screening benefit
— Frame screening with NNS (number needed to screen) and overdiagnosis rates, not survival statistics
— Example: prostate cancer screening — explain that improved 5-year survival in screened populations reflects in part lead-time bias, not necessarily mortality reduction
— Adherent patients have better outcomes regardless of therapy (healthy adherer effect)
— Don't over-interpret a single patient's improvement on a new drug — n=1 cannot separate true effect from regression to the mean, placebo, or natural history
— Use plain language: "About 1 in 100 people will avoid a heart attack over 5 years if they take this medicine"
— Avoid framing effects (mortality vs survival framing changes patient choices)
— Acknowledge uncertainty without undermining trust
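The relative-versus-absolute point can be made concrete with a short calculation. The sketch assumes the same 25% relative risk reduction applied to two hypothetical baseline risks. The function names are illustrative.

```python
def absolute_risk_reduction(baseline_risk, relative_risk_reduction):
    """ARR = baseline risk x RRR."""
    return baseline_risk * relative_risk_reduction

def number_needed_to_treat(baseline_risk, relative_risk_reduction):
    """NNT = 1 / ARR."""
    return 1 / absolute_risk_reduction(baseline_risk, relative_risk_reduction)

RRR = 0.25  # same relative effect in both patients
for label, baseline in (("high-risk patient", 0.20), ("low-risk patient", 0.02)):
    arr = absolute_risk_reduction(baseline, RRR)
    nnt = number_needed_to_treat(baseline, RRR)
    print(f"{label}: baseline {baseline:.0%}, ARR {arr:.1%}, NNT about {nnt:.0f}")
```

The same relative effect means treating about 20 high-risk patients to prevent one event, versus about 200 low-risk patients. This is the framing to use in plain-language counseling.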

— Subjects must be told study purpose, risks, benefits, alternatives, and right to withdraw without penalty
— Disclosure of sponsorship and conflicts of interest to subjects and readers is ethically required
— Vulnerable populations (prisoners, children, cognitively impaired) require additional protections; IRB scrutiny guards against enrolling them merely for convenience
— RCTs require clinical equipoise — genuine uncertainty about which arm is better
— If equipoise is lost during a trial (interim analysis shows clear benefit or harm), the Data and Safety Monitoring Board (DSMB) may recommend stopping the trial early
— Stopping early for benefit systematically overestimates effect size — a recognized bias; Step 3 may test this
— Selective reporting, ghostwriting, and suppression of negative trials are research misconduct
— Mandatory results posting on ClinicalTrials.gov within 12 months of the primary completion date (FDAAA 801)
— Authors must declare contributions and conflicts (ICMJE criteria)
— Biased trials → mistaken guidelines → systematic harm at population scale
— Quality improvement projects should include controls or pre/post with comparison groups to avoid attributing change to intervention when secular trends or regression to the mean are at play
— Medication reconciliation errors at discharge are a documented hazard — pharmacist-led reconciliation reduces adverse drug events
— Studies of reconciliation interventions are themselves prone to Hawthorne effect and secular trend confounding — interpret QI literature with bias lens
— Research-related adverse events must be reported to IRB and FDA per protocol
— Suspected research misconduct → institutional research integrity office, Office of Research Integrity (ORI) for federally funded work
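The claim that stopping early for benefit overestimates effect size can be illustrated by simulation. The sketch below is deliberately simplified and all numbers are assumptions: a modest true benefit, a single interim look at 100 patients per arm, and a naive z-score cutoff standing in for a prespecified stopping boundary. Trials that cross the cutoff report a much larger benefit than the truth.

```python
import math
import random

random.seed(1)  # fixed seed so the illustration is reproducible

P_CONTROL, P_TREATED = 0.30, 0.25   # hypothetical true event risks
TRUE_RD = P_CONTROL - P_TREATED     # true absolute risk reduction (0.05)
INTERIM_N = 100                     # patients per arm at the interim look
Z_STOP = 2.5                        # hypothetical stopping cutoff

def event_rate(n, p):
    """Simulated proportion of n patients who have the event."""
    return sum(random.random() < p for _ in range(n)) / n

def interim_result():
    """Observed risk difference and z-score at the interim analysis."""
    control = event_rate(INTERIM_N, P_CONTROL)
    treated = event_rate(INTERIM_N, P_TREATED)
    pooled = (control + treated) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / INTERIM_N)
    rd = control - treated
    return rd, (rd / se if se > 0 else 0.0)

results = [interim_result() for _ in range(2000)]
stopped_early = [rd for rd, z in results if z >= Z_STOP]

mean_all = sum(rd for rd, _ in results) / len(results)
mean_stopped = sum(stopped_early) / len(stopped_early)

print(f"True risk reduction:            {TRUE_RD:.3f}")
print(f"Mean estimate, all trials:      {mean_all:.3f}")
print(f"Trials stopped early:           {len(stopped_early)} of {len(results)}")
print(f"Mean estimate, stopped trials:  {mean_stopped:.3f}")
```

Averaged over all simulated trials the estimate is close to the truth. The inflation appears only in the subset selected for crossing the boundary, which is the subset that gets published as "stopped early for benefit".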


— Vignette describes a study design flaw; asks which bias is most likely
— Strategy: identify whether the issue is at enrollment/retention (selection), measurement (information), or comparison group balance (confounding)
— Asks whether the true effect is larger, smaller, or unchanged compared to reported
— Non-differential misclassification → true effect larger than reported
— Healthy worker effect → true harm larger than reported
— Lead-time bias → true mortality benefit smaller (often zero) than survival appears
— Map: randomization for confounding, blinding for information bias, population-based sampling for selection bias
— If stratum-specific estimates are similar to each other but differ from crude → confounder → adjust
— If stratum-specific estimates differ from each other → effect modifier → report separately
— Survival from diagnosis improved but mortality unchanged → lead-time bias
— More indolent tumors detected → length-time bias
— Healthier volunteers enrolled in screening arm → self-selection
— Observational shows benefit, RCT shows none/harm → suspect confounding by indication or healthy user bias (HRT, vitamin E, beta-carotene)
— Drug improves surrogate (LDL, A1c, blood pressure) but RCT shows no mortality benefit → don't recommend based on surrogate alone
— Industry-funded, investigators employed by the sponsor, ghostwritten → consider sponsorship bias
— Trial stopped early for benefit → overestimates true effect; await confirmatory trials before changing practice
— Patient cites observational study or media report; correct answer is acknowledge, explain limitations, continue evidence-based care

Bias is systematic error that shifts study results away from the truth. It comes in three main types: selection (who got in), information (how data were measured), and confounding (a third variable distorting the link). Each has distinct design-phase fixes (population-based sampling and intention-to-treat; blinding and objective measurement; randomization and multivariable adjustment). Step 3 expects you to recognize and name the bias, predict its direction, and counteract it before applying any study to a patient.

