Biostatistics & Population Health
Risk of bias assessment tools (Cochrane RoB, ROBINS-I)
— Distinct from imprecision (random error, addressed by confidence intervals) and applicability/indirectness (generalizability to your patient)
— Bias is a systematic error; sample size cannot fix it
— Reading a single trial to decide if it should change your management
— Performing or interpreting a systematic review/meta-analysis — GRADE downgrades certainty when included studies have high RoB
— Evaluating guideline recommendations — strength of recommendation tracks with body-of-evidence RoB
— Journal club, IRB review, quality improvement projects, and value-based care committees
— Cochrane RoB 2 (RoB 2.0, 2019) — for randomized controlled trials only
— ROBINS-I (2016) — "Risk Of Bias In Non-randomized Studies of Interventions" — for cohort, case-control, before-after, and quasi-experimental designs evaluating an intervention
— ROBINS-E exists for exposures (etiology questions), not interventions
— Open-label trial with subjective outcome (pain, quality of life)
— Industry-sponsored trial with selective outcome reporting
— Observational data claiming causation without addressing confounding
— Large loss to follow-up (>20%) or differential attrition between arms
— Post-hoc subgroup analyses driving the headline conclusion

— D1: Randomization process — adequate sequence generation, allocation concealment, baseline imbalance
— D2: Deviations from intended interventions — was blinding maintained, were per-protocol vs ITT analyses appropriate
— D3: Missing outcome data — completeness of follow-up, reasons for missingness
— D4: Measurement of the outcome — blinded outcome assessors, validated instruments
— D5: Selection of the reported result — pre-specified analysis plan vs cherry-picked outcomes
— Pre: confounding, selection of participants
— At: classification of interventions
— Post: deviations from intended interventions, missing data, measurement of outcomes, selection of reported result
— "Patients were assigned based on physician preference" → selection bias / confounding by indication
— "Outcome adjudicators were aware of treatment assignment" → detection bias (D4)
— "Three patients were excluded from analysis after randomization" → attrition + analysis bias (D2, D3)
— "Primary outcome changed from mortality to composite endpoint" (check ClinicalTrials.gov) → reporting bias (D5)
— Lack of trial registration before enrollment is a major red flag

— A series of signaling questions answered Yes / Probably Yes / Probably No / No / No Information
— An algorithm maps signaling answers to a domain judgment: Low risk, Some concerns, or High risk
— An overall judgment is then derived:
— Low risk overall = low risk in all domains
— Some concerns = some concerns in ≥1 domain, no high risk
— High risk = high risk in ≥1 domain OR multiple domains with some concerns that substantially lower confidence
— Signaling questions, then a domain judgment of Low, Moderate, Serious, Critical, or No Information
— Overall judgment equals the worst domain rating (weakest link principle)
— Critical means the study is too problematic to provide useful evidence — typically excluded from meta-analysis or downgraded heavily in GRADE
— RoB 2: 3 levels (Low / Some concerns / High)
— ROBINS-I: 4 risk levels (Low / Moderate / Serious / Critical), plus "No Information" when a judgment cannot be made
— ROBINS-I "Low" is benchmarked to a well-conducted RCT — a non-randomized study can theoretically achieve "Low" but rarely does in practice
— Newcastle-Ottawa Scale (NOS) — older, star-based, for observational studies; Cochrane prefers ROBINS-I now
— Jadad scale — older 5-point scale for RCTs; superseded by RoB 2
— QUADAS-2 — for diagnostic accuracy studies (not interventions)
— AMSTAR-2 — assesses the systematic review itself, not the included studies
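
The overall-judgment rules above (RoB 2: any High domain gives High overall; ROBINS-I: overall equals the worst domain) can be sketched in a few lines. This is an illustrative sketch, not the official Cochrane algorithm: the names `RoB2`, `RobinsI`, `rob2_overall`, and `robins_i_overall` are hypothetical, "No Information" is not modeled, and the RoB 2 escalation from several "Some concerns" domains to High is left as a reviewer-supplied flag because it is a judgment call rather than a count.

```python
from enum import IntEnum
from typing import Mapping


class RoB2(IntEnum):
    LOW = 0
    SOME_CONCERNS = 1
    HIGH = 2


class RobinsI(IntEnum):
    LOW = 0
    MODERATE = 1
    SERIOUS = 2
    CRITICAL = 3


def rob2_overall(domains: Mapping[str, RoB2],
                 concerns_lower_confidence: bool = False) -> RoB2:
    """Derive an overall RoB 2 judgment from per-domain judgments.

    concerns_lower_confidence is the reviewer's own judgment (an assumption
    of this sketch) that several 'Some concerns' domains together
    substantially lower confidence in the result.
    """
    ratings = list(domains.values())
    if not ratings:
        raise ValueError("at least one domain judgment is required")
    if RoB2.HIGH in ratings:
        return RoB2.HIGH
    n_concerns = ratings.count(RoB2.SOME_CONCERNS)
    if n_concerns == 0:
        return RoB2.LOW
    if n_concerns > 1 and concerns_lower_confidence:
        return RoB2.HIGH
    return RoB2.SOME_CONCERNS


def robins_i_overall(domains: Mapping[str, RobinsI]) -> RobinsI:
    """Weakest-link rule: the overall rating is the worst domain rating."""
    if not domains:
        raise ValueError("at least one domain judgment is required")
    return max(domains.values())


trial = {"D1": RoB2.LOW, "D2": RoB2.SOME_CONCERNS, "D3": RoB2.LOW,
         "D4": RoB2.HIGH, "D5": RoB2.LOW}
cohort = {"confounding": RobinsI.SERIOUS, "selection": RobinsI.MODERATE,
          "classification": RobinsI.LOW}
print(rob2_overall(trial).name)       # HIGH
print(robins_i_overall(cohort).name)  # SERIOUS
```

Using ordered enums makes the weakest-link rule a plain `max`, which mirrors how the ROBINS-I overall rating is described above.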

— Signaling questions probe: Was allocation sequence random? Was it concealed until assignment? Were baseline characteristics balanced?
— Red flags: alternating assignment, assignment by birth date or medical record number, visible allocation envelopes, large baseline imbalance in key prognostic variables
— Adequate methods: computer-generated sequence, concealed by central allocation or by sequentially numbered, opaque, sealed envelopes
— Two effects of interest: assignment effect (ITT-like) vs adherence effect (per-protocol)
— Were participants and personnel blinded? Were deviations balanced between arms?
— Intention-to-treat analysis preserves randomization and is preferred for effectiveness questions
— Per-protocol analysis is acceptable for efficacy/explanatory questions but introduces selection bias
— Was outcome data available for "all or nearly all" randomized participants? Threshold often ≥95% for low risk
— Was missingness likely to depend on the true outcome? (e.g., sicker patients drop out)
— Differential dropout between arms is more concerning than symmetric dropout
— Was the outcome assessor blinded? Was the measurement method appropriate and identical between groups?
— Objective outcomes (mortality, lab values) are more bias-resistant than subjective ones (pain, function)
— Was the analysis pre-specified in a registered protocol or statistical analysis plan?
— Check ClinicalTrials.gov for outcome switching
— Multiple outcome measurement instruments or time points → potential for cherry-picking
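
The missing-data checks above (completeness of follow-up and differential dropout between arms) reduce to simple per-arm arithmetic. The sketch below assumes a hypothetical helper `attrition_summary`; the 95% completeness figure comes from the notes, while the 5-percentage-point cutoff for "differential" dropout is an illustrative assumption, not a RoB 2 rule.

```python
from typing import Dict


def attrition_summary(randomized: Dict[str, int],
                      analyzed: Dict[str, int],
                      completeness_threshold: float = 0.95,
                      differential_threshold: float = 0.05) -> dict:
    """Summarize follow-up completeness per arm and flag D3 concerns.

    differential_threshold is a hypothetical cutoff chosen for illustration.
    """
    completeness = {}
    dropout = {}
    for arm, n_rand in randomized.items():
        n_analyzed = analyzed[arm]
        if n_rand <= 0 or not 0 <= n_analyzed <= n_rand:
            raise ValueError(f"implausible counts for arm {arm!r}")
        completeness[arm] = n_analyzed / n_rand
        dropout[arm] = (n_rand - n_analyzed) / n_rand

    # Gap between the arm with the most and the arm with the least dropout
    dropout_gap = max(dropout.values()) - min(dropout.values())
    return {
        "completeness": completeness,
        "dropout": dropout,
        "arms_below_threshold": [a for a, c in completeness.items()
                                 if c < completeness_threshold],
        "dropout_gap": dropout_gap,
        "differential_attrition": dropout_gap > differential_threshold,
    }


summary = attrition_summary(
    randomized={"treatment": 200, "control": 200},
    analyzed={"treatment": 192, "control": 170},
)
print(summary["arms_below_threshold"])    # ['control']
print(summary["differential_attrition"])  # True
```

A flag from this helper is only a prompt to answer the signaling questions, in particular whether missingness plausibly depends on the true outcome.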

— List all important prognostic confounders a priori
— Did the study measure and adjust for them (multivariable regression, propensity score matching/weighting, instrumental variables)?
— Time-varying confounding (treatment changes over time) requires marginal structural models — rarely done well
— Critical rating if a major confounder was unmeasured and the likely direction of the resulting bias is unknown
— Immortal time bias — follow-up time during which the outcome could not have occurred is misclassified
— Prevalent user bias — including patients already on a drug excludes those who had early adverse events
— Solution: new-user/active-comparator designs
— Was exposure status determined before the outcome, using data not influenced by outcome knowledge?
— Misclassification (e.g., relying on a single pharmacy claim) can be differential or non-differential
— Did participants switch treatments? Was there co-intervention imbalance?
— Pattern and handling (complete-case vs multiple imputation)
— Were outcome ascertainment methods the same across exposure groups?
— Detection bias is common when exposed patients are monitored more closely
— Multiple analytic choices (subgroups, models) raise concern

— RCTs start at High certainty
— Observational studies start at Low certainty
— Risk of bias (from RoB 2 or ROBINS-I across included studies)
— Inconsistency (unexplained heterogeneity, I² high)
— Indirectness (PICO mismatch)
— Imprecision (wide CI crossing decision threshold, small N)
— Publication bias (asymmetric funnel plot, small-study effects)
— Large effect (RR >2 or <0.5)
— Dose-response gradient
— Plausible confounding would bias toward the null (and effect was still seen)
— If ≥1 included study has High RoB / Serious ROBINS-I and contributes substantially to pooled estimate → downgrade certainty 1 level
— If most or all studies are High RoB / Critical → downgrade 2 levels
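
The GRADE bookkeeping above (starting level by design, downgrades, upgrades) can be sketched as simple arithmetic on a four-level scale. GRADE ratings are structured judgments rather than a formula, so treat this as a teaching aid under that assumption; the function name `grade_certainty` and the domain keys are hypothetical.

```python
from typing import Dict, Optional

CERTAINTY_LEVELS = ("Very low", "Low", "Moderate", "High")
DOWNGRADE_DOMAINS = {"risk_of_bias", "inconsistency", "indirectness",
                     "imprecision", "publication_bias"}
UPGRADE_DOMAINS = {"large_effect", "dose_response", "opposing_confounding"}


def grade_certainty(design: str,
                    downgrades: Optional[Dict[str, int]] = None,
                    upgrades: Optional[Dict[str, int]] = None) -> str:
    """Return a GRADE certainty label from a starting design and level changes."""
    if design not in ("rct", "observational"):
        raise ValueError("design must be 'rct' or 'observational'")
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    for name, levels in downgrades.items():
        if name not in DOWNGRADE_DOMAINS or levels not in (0, 1, 2):
            raise ValueError(f"invalid downgrade: {name}={levels}")
    for name, levels in upgrades.items():
        if name not in UPGRADE_DOMAINS or levels not in (0, 1, 2):
            raise ValueError(f"invalid upgrade: {name}={levels}")

    start = 3 if design == "rct" else 1  # RCTs start High, observational Low
    score = start - sum(downgrades.values()) + sum(upgrades.values())
    score = max(0, min(3, score))        # clamp to the four-level scale
    return CERTAINTY_LEVELS[score]


# RCT evidence with serious risk of bias (one level down)
print(grade_certainty("rct", {"risk_of_bias": 1}))                 # Moderate
# Observational evidence upgraded one level for a large effect
print(grade_certainty("observational", upgrades={"large_effect": 1}))  # Moderate
```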

— Computer-generated allocation sequence — eliminates selection bias at entry
— Central randomization (web/phone) — best allocation concealment
— Stratified or block randomization — prevents baseline imbalance in key prognostic factors
— Cluster randomization — when individual randomization is infeasible; requires ICC adjustment
— Double-blind (participant + provider) — prevents performance bias
— Triple-blind adds outcome assessor — prevents detection bias
— Placebo-controlled — essential when outcomes are subjective
— PROBE design (Prospective Randomized Open Blinded Endpoint) — open treatment but blinded adjudication; acceptable when full blinding impossible (e.g., surgery)
— Intention-to-treat — primary analysis for superiority trials
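— Per-protocol: secondary for superiority trials; in non-inferiority trials report it alongside ITT, because ITT can bias toward a finding of non-inferiority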
— Multiple imputation for missing data under MAR assumption
— Pre-registered statistical analysis plan — prevents D5 bias
— Propensity score matching/weighting — balances measured confounders
— Instrumental variable analysis — addresses unmeasured confounding under strong assumptions
— Negative control outcomes — detect residual confounding
— Active-comparator new-user design — minimizes confounding by indication
— Sensitivity analyses (E-value): quantify robustness to unmeasured confounding
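
For the E-value mentioned above, the standard formula for a risk ratio is E = RR + sqrt(RR × (RR − 1)), applied to 1/RR when RR < 1. It gives the minimum strength of association (on the risk-ratio scale) that an unmeasured confounder would need with both exposure and outcome to fully explain away the observed estimate. A minimal sketch follows; the function names are hypothetical, and it assumes the estimate is already a risk ratio (odds or hazard ratios need an approximation that is not shown here).

```python
import math


def e_value(rr: float) -> float:
    """E-value for a point estimate expressed as a risk ratio."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective effects: work with the inverse
    return rr + math.sqrt(rr * (rr - 1))


def e_value_ci(lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null (RR = 1)."""
    if lower <= 0 or upper < lower:
        raise ValueError("expected 0 < lower <= upper")
    if lower <= 1 <= upper:
        return 1.0  # interval already includes the null
    return e_value(lower if lower > 1 else upper)


print(round(e_value(2.0), 2))          # 3.41
print(round(e_value_ci(1.2, 3.3), 2))  # 1.69
```

An E-value close to 1 means that even weak unmeasured confounding could explain the association, which is worth stating explicitly when rating the confounding domain.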

— Assignment effect (ITT) vs adherence effect (per-protocol) — choose based on the clinical question
— Outpatient effectiveness questions usually want assignment effect; mechanistic questions want adherence
— Search ClinicalTrials.gov / WHO ICTRP using NCT number
— Compare registered primary outcome to published primary outcome
— Look for outcome switching, sample size changes, analysis changes
— Quantify screened → enrolled → randomized → analyzed → followed-up
— Calculate dropout per arm; flag differential attrition
— Sensitivity analysis excluding high-RoB studies
— Stratified meta-analysis by RoB level
— GRADE assessment downgrades certainty accordingly
— Single assessor (should be two independent raters with adjudication for disagreements; report kappa)
— Skipping signaling questions and jumping to overall judgment
— Using RoB 2 for non-randomized studies
— Confusing study reporting quality (CONSORT) with study bias (RoB 2) — a well-reported high-bias study is still high-bias
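
Where the notes above call for two independent raters and a reported kappa, Cohen's kappa compares observed agreement with the agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e). The sketch below computes the unweighted statistic for paired categorical judgments; `cohens_kappa` is a hypothetical helper, and a weighted kappa may be preferable for ordered ratings.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters judging the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must judge the same non-empty set of items")
    n = len(rater_a)

    # Observed proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal category frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Domain-level judgments from two reviewers on six trials
reviewer_1 = ["Low", "Low", "High", "Some concerns", "High", "Low"]
reviewer_2 = ["Low", "Some concerns", "High", "Some concerns", "High", "Low"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.75
```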

— Open-label, broad eligibility, usual-care comparator, routine data collection
— D2 (deviations) and D4 (outcome measurement) are particularly vulnerable because blinding is limited
— Use objective outcomes (mortality, hospitalization) wherever possible to mitigate D4
— Use blinded outcome adjudication committees even when treatment is open-label
— Recruitment bias — participants enrolled after cluster randomization may be selected differentially based on assignment knowledge
— RoB 2 has a dedicated cluster extension (RoB 2 CRT) that adds signaling questions about identification/recruitment of participants
— Analysis must account for intracluster correlation (ICC); failure inflates type I error
— RoB 2 crossover extension addresses carryover effects, period effects, and unpaired analyses
— Pre-specified adaptation rules must be documented; unplanned adaptations introduce D5 bias
— Multiplicity adjustments required for interim analyses
— Cannot be assessed with RoB 2; require ROBINS-I or design-specific tools
— Common in rare diseases and pediatrics where parallel RCTs are infeasible
— Registry studies in patients with renal or hepatic impairment: selection bias if registries enroll preferentially at academic centers
— Elderly cohort studies — competing risks (death from other causes) can bias toward apparent treatment benefit if not modeled
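
The point above about intracluster correlation in cluster-randomized trials can be made concrete with the design effect, DE = 1 + (m − 1) × ICC, where m is the average cluster size. Dividing the total sample by DE gives the effective sample size, which shows why an analysis that ignores clustering overstates precision. A minimal sketch, assuming roughly equal cluster sizes; the function names are hypothetical.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    if mean_cluster_size < 1:
        raise ValueError("mean cluster size must be at least 1")
    if not 0 <= icc <= 1:
        raise ValueError("ICC is expected to lie between 0 and 1 here")
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n_total: int, mean_cluster_size: float,
                          icc: float) -> float:
    """Number of independent participants the clustered sample is worth."""
    return n_total / design_effect(mean_cluster_size, icc)


# 20 clinics of 50 patients each, ICC = 0.05
de = design_effect(50, 0.05)
ess = effective_sample_size(1000, 50, 0.05)
print(round(de, 2))  # 3.45
print(round(ess))    # 290
```

Even a small ICC matters when clusters are large: here 1,000 enrolled patients carry roughly the information of 290 independently randomized ones.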

— Most evidence is observational (cohort, case-control) due to ethical exclusion from RCTs
— ROBINS-I is the appropriate tool
— Confounding by indication dominates: pregnant patients receiving a drug differ systematically (e.g., depression severity in SSRI studies)
— Recall bias is prominent in case-control studies of teratogens — mothers of affected infants recall exposures more thoroughly
— Mitigation: prospective pregnancy registries, sibling-comparison designs, negative control exposures
— Small sample sizes → imprecision often dominates over bias
— Off-label prescribing means much pediatric evidence is extrapolated; indirectness rather than bias
— Adolescent self-report outcomes vulnerable to D4 detection bias
— Long follow-up needed for developmental outcomes → attrition (D3) risk
— Bias against underrepresented populations is itself a form of selection bias affecting external validity
— Cochrane Equity Methods Group recommends explicit reporting of demographic representation
— Consider whether trial sample reflects the population to whom guideline applies
— Pre-specified subgroup analyses by sex: acceptable; post-hoc: D5 concern
— Many cardiovascular trials historically under-enrolled women → applicability/indirectness, not RoB per se
— Selection bias if biobank populations are predominantly European-ancestry

— Inadequate allocation concealment inflates effect estimates by ~30–40% on average (Schulz/Chalmers meta-epidemiologic work)
— Lack of blinding inflates subjective-outcome effects similarly
— Real-world example: many surgical innovations adopted on uncontrolled case series later show no benefit in RCTs
— Short follow-up, selective adverse event reporting (D5), missing data (D3) all suppress harm signals
— Pharmaceutical post-marketing experience repeatedly shows harms emerge years after approval (rofecoxib, rosiglitazone)
— Confounding by indication in observational drug studies generates false harm signals
— Hormone replacement therapy: observational data suggested cardioprotection; WHI RCT showed harm — a paradigmatic ROBINS-I failure even in retrospect
— Guidelines built on observational data, even studies rated low RoB, may be reversed when RCTs emerge
— Causes whiplash, erodes patient trust, increases medico-legal exposure
— High-RoB studies often cannot be incorporated into meta-analyses
— Lancet REWARD series estimates 85% of biomedical research is "avoidable waste" — much from poor methodology
— Acting on biased evidence may delay effective therapy or expose to ineffective/harmful intervention
— Especially consequential for screening decisions (lead-time, length-time bias misrepresent screening benefit)
— Biased samples produce evidence that fails minoritized populations, perpetuating disparities

— Study should be excluded from meta-analysis or analyzed only in sensitivity analysis
— Equivalent to "do not use this evidence to guide practice"
— Common triggers: unmeasured major confounder, severe selection bias, outcome ascertainment differential between groups
— Downgrade GRADE certainty by 2 levels
— Consider not citing in clinical guideline body of evidence
— Mandate sensitivity analysis excluding such studies in any meta-analysis
— Major D5 concern; warrants editorial correction, possibly retraction
— COMPare project systematically catalogs this in high-impact journals
— Statistical anomalies (impossible variance, baseline distribution implausibility — Carlisle method)
— Image duplication, plagiarism
— Escalate to journal editor, institutional research integrity officer, ORI (Office of Research Integrity) if US federally funded
— Industry sponsor controls data analysis or publication decision
— Authors with undisclosed financial ties
— ICMJE disclosure standards apply
— Any meta-analysis, network meta-analysis, propensity score analysis
— Complex missing data patterns
— Surrogate outcomes requiring trial-level validation

— RoB 2 (current Cochrane standard, 2019)
— Older: Cochrane RoB 1.0 (still encountered), Jadad scale (obsolete), PEDro (physiotherapy)
— ROBINS-I (current Cochrane standard)
— Older: Newcastle-Ottawa Scale (cohort/case-control), Downs and Black checklist
— ROBINS-I is preferred over Newcastle-Ottawa by Cochrane
— ROBINS-E (2022) — for environmental, occupational, lifestyle exposure → outcome questions
— Distinct from ROBINS-I because exposures are typically not assignable interventions
— QUADAS-2 — Patient selection, Index test, Reference standard, Flow and timing
— QUADAS-C for comparative accuracy
— QUIPS — quality in prognosis studies (factor → outcome)
— PROBAST — prediction model studies (development/validation of risk scores)
— AMSTAR-2 — assesses the conduct of the SR
— ROBIS — Risk Of Bias In Systematic reviews
— CASP qualitative checklist, GRADE-CERQual
— CHEERS reporting, Drummond checklist
— SYRCLE's RoB tool

— Defined: systematic deviation of estimate from truth
— Addressed by: RoB 2, ROBINS-I, ROBINS-E
— Examples: selection, performance, detection, attrition, reporting bias
— Not fixed by larger sample size
— Defined: variability around the true estimate due to chance
— Addressed by: confidence intervals, power calculations
— GRADE downgrades for imprecision when CI crosses decision threshold
— Fixed by larger sample size
— A subset of bias (D1 in ROBINS-I) but conceptually distinct
— Third variable associated with both exposure and outcome, not on causal pathway
— Addressed by: randomization (ideal), adjustment, matching, stratification, propensity scores, instrumental variables
— Eliminated in expectation by randomization for measured AND unmeasured confounders — the foundational advantage of RCTs
— PICO mismatch between study and clinical question
— Not bias per se; the study may be internally valid but inapplicable
— Addressed by GRADE indirectness domain
— Variation in effect estimates across studies in a meta-analysis
— Measured by I², τ², Cochran's Q
— May reflect true effect modification or unrecognized bias
— Asymmetric availability of evidence (positive studies more likely published)
— Detected by funnel plot, Egger's test, trim-and-fill
— Distinct from individual-study RoB but contributes to body-of-evidence bias
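
The heterogeneity statistics named above are linked by simple formulas: Cochran's Q is the weighted sum of squared deviations from the fixed-effect pooled estimate, I² = max(0, (Q − df) / Q) × 100, and the DerSimonian-Laird estimate of τ² is max(0, (Q − df) / C) with C = Σw − Σw² / Σw. The sketch below assumes inverse-variance weights and effects on an additive scale such as log risk ratios; `heterogeneity` is a hypothetical helper, and other τ² estimators exist.

```python
from typing import Dict, Sequence


def heterogeneity(effects: Sequence[float],
                  variances: Sequence[float]) -> Dict[str, float]:
    """Cochran's Q, I-squared (%), and DerSimonian-Laird tau-squared."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    weights = [1 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w

    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return {"pooled_fixed": pooled, "Q": q, "df": df, "I2": i2, "tau2": tau2}


# Two hypothetical studies: log risk ratios 0.0 and 1.0, variance 0.25 each
result = heterogeneity([0.0, 1.0], [0.25, 0.25])
print(result["Q"], result["I2"], result["tau2"])  # 2.0 50.0 0.25
```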

— Before adopting any new drug, device, or screening test, identify the pivotal trial(s) and assess RoB
— Subscribe to evidence summaries that include RoB judgments (Cochrane Library, BMJ EBM, ACP Journal Club, NEJM Journal Watch)
— Maintain a personal repository of high-quality systematic reviews for common conditions
— Hospital P&T committees should require RoB assessment for formulary additions
— Quality and safety committees should apply RoB to evidence underlying clinical pathways
— Residency programs increasingly include structured journal clubs using RoB 2/ROBINS-I worksheets
— Complete the Cochrane Interactive Learning modules on RoB 2 and ROBINS-I
— Participate in systematic review as author or peer reviewer
— Use Robvis (R package or web app) to generate traffic-light plots for journal club presentations
— RoB 2 tool: riskofbias.info
— ROBINS-I tool: methods.cochrane.org
— GRADE handbook: gdt.gradepro.org
— PRISMA 2020 checklist for systematic review reporting
— EQUATOR Network for reporting guidelines (CONSORT, STROBE, etc.)
— Resist citing single studies without methodologic appraisal
— Recognize the authority gradient — being a senior physician does not exempt one from RoB assessment of one's preferred evidence
— Acknowledge uncertainty to patients when evidence is biased or limited

— Cochrane and other organizations now produce living systematic reviews updated continuously (e.g., COVID-19 therapeutics)
— RoB assessments are re-run as new studies are added
— Effect estimates and certainty ratings change accordingly
— New trials registered on ClinicalTrials.gov addressing the same question
— Updated systematic reviews (search annually for high-stakes clinical questions)
— Guideline updates (USPSTF and AHA/ACC typically refresh topics on multi-year cycles; the ADA Standards of Care are updated annually)
— Post-marketing pharmacovigilance signals (FDA MedWatch, FAERS)
— Major new RCT contradicting prior estimate
— Methodologic concerns raised post-publication (data integrity questions, retractions)
— Tool updates — RoB 2 superseded RoB 1 in 2019; ROBINS-I may receive revisions
— De-implementation of low-value practices is harder than implementation
— Choosing Wisely lists identify practices supported by weak/biased evidence
— Inertia, financial incentives, and patient expectations resist de-adoption
— Frame recommendations with appropriate uncertainty ("based on current evidence, which may change…")
— Discuss screening recommendation changes (PSA, mammography age thresholds) honestly
— Document shared decision-making
— Adherence to high-certainty guideline recommendations
— Avoidance of low-value care
— Documentation of shared decision-making for conditional recommendations

— Investigators have an ethical obligation to minimize bias — biased research wastes participants' altruistic contributions and may harm future patients
— Pre-registration of protocols is increasingly seen as an ethical requirement, not just methodologic best practice
— Selective outcome reporting (D5) can constitute research misconduct when intentional
— When prescribing based on low-certainty evidence, disclosure of uncertainty is part of valid informed consent
— Example: prescribing an off-label medication supported only by case series — patient must understand the evidence limitations
— Failure to disclose evidence quality may expose physician to medical malpractice liability if outcome is poor
— ICMJE, ACGME, and most institutions require COI disclosure
— Industry-funded trials with high RoB on D5 (selective reporting) raise the highest concern
— Sunshine Act (Open Payments) makes physician industry payments publicly searchable
— Hospital-initiated medications based on biased inpatient evidence may be inappropriately continued by outpatient providers
— Discharge summaries should note when treatments are evidence-uncertain so primary care can re-evaluate
— Example: a PPI started for stress-ulcer prophylaxis in the ICU and then continued indefinitely after discharge, a documented patient safety problem rooted in low-quality observational evidence
— Panel composition rules: minimize conflicted members in voting on recommendations
— Public comment periods enhance transparency
— NIH, many journals now require data sharing — enables independent re-analysis to detect bias
— Refusal to share data is itself a credibility signal
— Adverse events from biased evidence-based practice should be reported through institutional patient safety systems
— Pattern recognition across institutions (e.g., through PSO networks) can trigger evidence re-appraisal


— Stem: "A meta-analysis pools 12 cohort studies evaluating metformin's effect on cancer incidence. Which tool best assesses risk of bias of the included studies?"
— Answer: ROBINS-I (non-randomized intervention studies)
— Distractors: RoB 2 (wrong — observational), Newcastle-Ottawa (outdated, not Cochrane-preferred), QUADAS-2 (diagnostic, wrong)
— Stem: An open-label trial of physical therapy for chronic low back pain reports significant pain improvement (patient-reported VAS). Which RoB 2 domain is at highest risk?
— Answer: D4 — Measurement of the outcome (subjective outcome, unblinded participants)
— Stem: Pooled estimate from 3 RCTs shows benefit, but all 3 had high attrition (35%) and no ITT analysis. Starting GRADE certainty and final rating?
— Answer: Start High; downgrade for risk of bias at least 1 level → Moderate (possibly Low if attrition is severe)
— Stem: Observational study shows patients on antipsychotics have higher mortality than untreated patients with schizophrenia. Most likely source of bias?
— Answer: Confounding by indication — sicker patients receive drug; ROBINS-I D1 Serious/Critical
— Stem: Multiple cohort studies suggested vitamin E reduced cardiovascular events; subsequent large RCT (HOPE) showed no benefit. Best explanation?
— Answer: Healthy user bias / residual confounding in observational data
— Stem: Published primary outcome differs from ClinicalTrials.gov-registered primary outcome without explanation.
— Answer: High risk on D5 — Selection of reported result
— Stem: Which RoB 2 domains cover allocation concealment (before assignment) versus blinding (after assignment)?
— Answer: Concealment → D1; blinding → D2/D4
— Stem: Industry-sponsored trial with selective reporting; how should physician counsel patient?
— Answer: Disclose evidence limitations and conflicts; engage shared decision-making

Risk of bias assessment uses Cochrane RoB 2 for randomized trials and ROBINS-I for non-randomized intervention studies to systematically judge whether a study's design, conduct, and analysis produce a trustworthy effect estimate — feeding directly into GRADE certainty and the strength of clinical recommendations you act on at the bedside.
— RoB 2 → RCTs (5 domains, 3-level rating)
— ROBINS-I → non-randomized intervention studies (7 domains; Low / Moderate / Serious / Critical, plus No Information; confounding-anchored)
— ROBINS-E → exposure/etiology studies
— QUADAS-2 → diagnostic accuracy; PROBAST → prediction models; AMSTAR-2 → the systematic review itself
— Bias is systematic and not fixed by larger N
— Imprecision is random and shrinks with sample size
— Indirectness is PICO mismatch (applicability), not internal validity
— RCTs start High, observational start Low
— Downgrade for RoB, inconsistency, indirectness, imprecision, publication bias
— Strong recommendation usually requires Moderate–High certainty; conditional recommendations demand shared decision-making

