Biostatistics & Population Health
Systematic review and meta-analysis interpretation
— A systematic review (SR) is a structured synthesis of all available evidence answering a focused clinical question using prespecified, reproducible methods
— A meta-analysis (MA) is the quantitative pooling of effect estimates from those studies into a single summary statistic with a confidence interval
— Not every SR contains an MA; pooling is only appropriate when studies are clinically and methodologically similar enough
— Practicing physicians use SR/MA to update guidelines, justify formulary decisions, and counsel patients on relative vs absolute risk
— Exam stems frequently show a forest plot, funnel plot, or PRISMA flow diagram and ask you to interpret pooled effect, heterogeneity, or publication bias
— Translating pooled relative risk into NNT/NNH for a specific patient is a recurring task
— SR/MA of RCTs sits at the top of the evidence hierarchy only when the underlying RCTs are at low risk of bias
— A high-quality single RCT can outrank a poorly-conducted MA of small biased trials
— "Garbage in, garbage out" — pooling biased studies produces a precise but wrong answer
— Single-author, no protocol registration (PROSPERO), no PRISMA checklist
— Search limited to one database (e.g., PubMed only) or one language
— No risk-of-bias assessment (Cochrane RoB 2 for RCTs, ROBINS-I for observational)
— Funded by manufacturer with author conflicts and only positive trials included
Board pearl: When a Step 3 stem says "a meta-analysis of 12 RCTs showed RR 0.80 (95% CI 0.65–0.98)," your first three reflexes should be — (1) Is the CI crossing 1? (2) What is the I² for heterogeneity? (3) Were the trials similar enough that pooling makes clinical sense? Statistical significance without clinical and methodologic coherence is a trap, not an answer.

— Population: who was studied (age, comorbidity, severity)
— Intervention: drug, dose, duration, comparator-relevant details
— Comparator: placebo, active control, usual care — matters enormously
— Outcome: prespecified primary vs secondary; patient-important vs surrogate
— Records identified → duplicates removed → titles/abstracts screened → full text assessed → studies included
— Each step lists exclusions with reasons
— A stem may show "5,432 records identified, 12 included" — ask why so many were dropped
— PROSPERO registration before data extraction prevents outcome switching and selective reporting
— An unregistered review with a primary outcome that conveniently became significant is a red flag
— Multiple databases (MEDLINE, Embase, Cochrane CENTRAL, ClinicalTrials.gov)
— Gray literature, conference abstracts, trial registries to combat publication bias
— Hand-searching references and contacting authors for unpublished data
— No language restriction (English-only introduces bias)
— Two independent reviewers screening and extracting, with conflict resolution
— Kappa statistic for inter-rater agreement (κ >0.6 acceptable)
— Single-reviewer extraction is a methodologic weakness
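The kappa statistic mentioned above corrects raw agreement between two reviewers for the agreement expected by chance. A minimal sketch follows, assuming a simple include/exclude screening decision; the function name and the counts are hypothetical and not taken from any real review.

```python
def cohen_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two reviewers making include/exclude decisions."""
    n = both_include + only_a + only_b + both_exclude
    observed = (both_include + both_exclude) / n
    # Chance agreement comes from each reviewer's marginal inclusion rate.
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    return (observed - expected) / (1 - expected)


# Hypothetical screening of 100 abstracts by reviewers A and B.
kappa = cohen_kappa(both_include=20, only_a=5, only_b=5, both_exclude=70)
print(f"kappa = {kappa:.2f}")
```

Raw agreement here is 90%, yet kappa is lower because most abstracts are excluded by both reviewers and much of that agreement would occur by chance.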
Key distinction: A narrative review is an author's curated opinion piece — no systematic search, no risk-of-bias assessment, no PRISMA. A systematic review is reproducible: another team following the same protocol should reach the same included-study list. On the exam, if the stem describes "an expert summarized the literature on…," that is not a systematic review and should not be treated as top-tier evidence regardless of journal prestige.

— Vertical line of no effect at RR/OR = 1.0 (or risk difference = 0)
— Each horizontal line = one study's point estimate and 95% CI
— Box size proportional to study weight (inverse variance — larger/more precise studies = bigger box)
— Diamond at the bottom = pooled effect; its width = pooled 95% CI
— CI crossing the line of no effect → that study alone is not statistically significant
— Narrow CI → precise estimate (usually larger n or more events)
— Wide CI → imprecise, often small or low-event study
— Diamond entirely to the left of 1.0 for "bad outcome" → intervention reduces risk
— Diamond crossing 1.0 → pooled effect not statistically significant
— Always check the direction label ("favors treatment ← → favors control")
— Studies pointing in the same direction with overlapping CIs → low heterogeneity, pooling reasonable
— Studies on opposite sides of 1.0 with non-overlapping CIs → high heterogeneity, pooled estimate suspect
— Look for outlier studies driving the result
— Separate diamonds for subgroups (e.g., age <65 vs ≥65)
— Test for subgroup interaction (p for interaction) determines if effect truly differs
— Subgroup p-values alone are misleading — chasing significant subgroups is data dredging
Board pearl: When you see a forest plot, scan in this order — (1) Direction of effect, (2) Does the pooled diamond cross 1.0?, (3) Are individual studies consistent?, (4) Is one mega-trial dominating the weight? A single large industry-sponsored RCT representing 70% of the pooled weight effectively makes the MA a referendum on that one trial — the "meta" is window dressing.
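To make the "one mega-trial dominating the weight" check concrete, the sketch below pools risk ratios with inverse-variance (fixed-effect) weights and reports each trial's share of the total weight. It assumes each trial is reported as an RR with a 95% CI, so the standard error of log(RR) can be recovered from the CI width. The function name and the three trials are hypothetical.

```python
import math


def pool_fixed_effect(trials):
    """Inverse-variance fixed-effect pooling of risk ratios.

    trials: list of (rr, ci_low, ci_high) tuples with 95% CIs.
    Returns (pooled_rr, pooled_low, pooled_high, weight_shares).
    """
    z = 1.96
    log_rrs, weights = [], []
    for rr, low, high in trials:
        # Recover the standard error of log(RR) from the CI width.
        se = (math.log(high) - math.log(low)) / (2 * z)
        log_rrs.append(math.log(rr))
        weights.append(1 / se ** 2)
    total = sum(weights)
    pooled_log = sum(w * y for w, y in zip(weights, log_rrs)) / total
    pooled_se = math.sqrt(1 / total)
    shares = [w / total for w in weights]
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se),
            shares)


# Hypothetical data: one large, precise trial and two small ones.
trials = [(0.80, 0.70, 0.91), (0.70, 0.40, 1.23), (0.95, 0.60, 1.50)]
pooled_rr, low, high, shares = pool_fixed_effect(trials)
print(f"Pooled RR {pooled_rr:.2f} (95% CI {low:.2f} to {high:.2f})")
print("Weight shares:", [f"{s:.0%}" for s in shares])
```

With these made-up numbers the first trial carries well over 80% of the weight, so the diamond sits almost exactly where that single trial does.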

— Risk ratio (RR) = risk in treated / risk in control; intuitive, used in RCTs and cohorts
— Odds ratio (OR) = odds in treated / odds in control; used in case-control and logistic regression; approximates RR only when outcome is rare (<10%)
— Risk difference (RD) = absolute risk reduction; directly yields NNT = 1/RD
— Hazard ratio (HR) = ratio of instantaneous event rates over time (Cox model, time-to-event)
— Mean difference (MD) when all studies use the same scale (e.g., mmHg, kg)
— Standardized mean difference (SMD) when scales differ (e.g., different depression instruments); reported in standard deviation units
— SMD interpretation (Cohen): 0.2 small, 0.5 medium, 0.8 large
— 95% CI excluding 1.0 (for ratios) or 0 (for differences) → statistically significant at α=0.05
— Width reflects precision, not effect size
— Narrow CI around a clinically trivial effect ≠ clinically meaningful finding
— RR 0.75 sounds impressive but if baseline risk is 4% → ARR 1%, NNT = 100
— Same RR 0.75 with baseline 40% → ARR 10%, NNT = 10
— Always anchor relative effects to absolute baseline risk before counseling
— HbA1c reduction is a surrogate; cardiovascular events and mortality are patient-important
— MA of surrogate endpoints can mislead (CAST trial legacy)
Step 3 management: When a stem gives you RR 0.70 (95% CI 0.55–0.89) for stroke with a new anticoagulant and a baseline 5-year stroke risk of 6%, compute — ARR = 0.06 × 0.30 = 1.8%, NNT ≈ 56. Then compare against the NNH for major bleeding before recommending. Step 3 rewards this absolute-risk translation more than memorizing the RR itself.
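The relative-to-absolute translation above is simple arithmetic, sketched here under the assumption that the pooled RR applies unchanged to the patient's baseline risk. The function name is hypothetical; the inputs are the figures already used in this section.

```python
def absolute_effect(baseline_risk, rr):
    """Return (ARR, NNT) for a given baseline risk and risk ratio below 1."""
    arr = baseline_risk * (1 - rr)  # absolute risk reduction
    nnt = 1 / arr                   # number needed to treat (unrounded)
    return arr, nnt


for baseline, rr in [(0.06, 0.70), (0.04, 0.75), (0.40, 0.75)]:
    arr, nnt = absolute_effect(baseline, rr)
    print(f"baseline {baseline:.0%}, RR {rr}: ARR {arr:.1%}, NNT about {round(nnt)}")
```

The same function applied to a harm (an RR above 1 for bleeding, for example) would return a negative ARR; taking its absolute value gives the ARI, and its reciprocal gives the NNH.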

— Cochran's Q (chi-square test): p<0.10 suggests heterogeneity; underpowered with few studies
— I²: proportion of variability due to between-study heterogeneity rather than chance
— — I² 0–40%: might not be important
— — I² 30–60%: moderate
— — I² 50–90%: substantial
— — I² 75–100%: considerable — pooling may be inappropriate
— Tau² (τ²): estimate of between-study variance in random-effects model
— Fixed-effect: assumes one true underlying effect; appropriate when studies are very similar
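— Random-effects: assumes a distribution of true effects across populations; wider CI, more conservative; preferred when clinical or methodologic diversity is expected, rather than chosen from the I² value alone
— Random-effects gives relatively more weight to small studies than fixed-effect does, which can amplify small-study bias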
— Funnel plot: scatter of study effect (x-axis) vs precision/SE (y-axis); should look like a symmetric inverted funnel
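— Asymmetry (a gap in one lower corner of the plot, where small studies with null or unfavorable results should sit) suggests publication bias; which corner depends on how the effect axis is oriented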
— Egger's test, Begg's test: statistical tests for funnel asymmetry; need ≥10 studies
— Trim-and-fill estimates effect size after imputing "missing" studies
— Cochrane RoB 2 for RCTs: randomization, deviations from intervention, missing outcome data, outcome measurement, selective reporting
— ROBINS-I for non-randomized studies
— QUADAS-2 for diagnostic accuracy studies
— Downgrade for: risk of bias, inconsistency, indirectness, imprecision, publication bias
— Upgrade for: large effect, dose-response, plausible confounders biasing against effect
— Final rating: high, moderate, low, very low
Board pearl: High I² does not automatically invalidate an MA — it tells you to explore why through subgroup analysis, meta-regression, or sensitivity analysis. The wrong answer is to ignore it; the right answer is to investigate it.
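The heterogeneity statistics in this section can be computed from study-level effects and standard errors. The sketch below assumes effects are on the log scale (for example, log RR) and uses the DerSimonian-Laird estimator for τ², which is one common choice among several. The function name and input numbers are hypothetical.

```python
import math


def heterogeneity(effects, ses):
    """Cochran's Q, I² (%), DerSimonian-Laird tau², and both pooled estimates."""
    w = [1 / se ** 2 for se in ses]
    total_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / total_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance, truncated at zero.
    c = total_w - sum(wi ** 2 for wi in w) / total_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau² to every study's variance.
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    total_re = sum(w_re)
    random_ = sum(wi * yi for wi, yi in zip(w_re, effects)) / total_re
    return {
        "q": q, "df": df, "i2": i2, "tau2": tau2,
        "fixed": fixed, "se_fixed": math.sqrt(1 / total_w),
        "random": random_, "se_random": math.sqrt(1 / total_re),
        "share_fixed": [wi / total_w for wi in w],
        "share_random": [wi / total_re for wi in w_re],
    }


# Hypothetical log risk ratios and standard errors for four trials.
result = heterogeneity([-0.5, -0.1, 0.2, -0.4], [0.10, 0.10, 0.15, 0.20])
print(f"Q = {result['q']:.1f} on {result['df']} df, I² = {result['i2']:.0f}%")
print(f"tau² = {result['tau2']:.3f}")
print(f"SE fixed = {result['se_fixed']:.3f}, SE random = {result['se_random']:.3f}")
```

Two points from the list above show up directly in the output: the random-effects standard error is larger (wider CI), and the least precise trial's weight share rises under random effects.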

— Protocol registered before review
— Adequate literature search
— Justification for excluded studies
— Risk-of-bias assessment in included studies
— Appropriate meta-analytic methods
— Consideration of RoB when interpreting results
— Assessment of publication bias
— Quality of evidence (high → very low) combined with balance of benefits/harms, values, and resource use
— Yields strong ("we recommend") or conditional/weak ("we suggest") recommendations
— A strong recommendation can rest on moderate-quality evidence if benefits clearly outweigh harms
— Were trial populations like your patient? (age, comorbidity, race/ethnicity, baseline severity)
— Was the comparator relevant to current practice? (placebo vs current standard)
— Was follow-up long enough to capture meaningful outcomes?
— Efficacy (ideal conditions, RCT) vs effectiveness (real-world)
— A pooled MD of 1.2 mmHg systolic BP can be p<0.001 across 50,000 patients yet clinically trivial
— Minimal clinically important difference (MCID) must be considered
— Industry funding correlates with more favorable conclusions (sponsorship bias)
— Author conflicts disclosed per ICMJE; reviews by independent groups (Cochrane) generally more credible
Step 3 management: When a guideline cites an MA to support a new therapy, ask — (1) Is the population like my patient? (2) Is the outcome patient-important? (3) Is the absolute benefit worth the harm and cost? A "strong recommendation, high-quality evidence" for a 65-year-old with no comorbidities may be a "conditional recommendation" for your 88-year-old nursing-home patient with dementia. Guidelines summarize populations; you treat individuals.

— Identify the pooled effect and its 95% CI
— Anchor to your patient's baseline risk (risk calculator, registry data)
— Compute ARR and NNT for benefits; ARI and NNH for harms
— Weigh against cost, adherence burden, patient preference
— CTT collaboration MA: RR ~0.78 per 1 mmol/L (~39 mg/dL) LDL reduction for major vascular events
— In a low-risk patient (10-year ASCVD risk 5%), ARR ≈ 1.1%, NNT ≈ 91 over 10 years
— In high-risk patient (20% risk), ARR ≈ 4.4%, NNT ≈ 23
— Same RR, very different clinical decision
— Compares multiple interventions simultaneously using direct and indirect evidence
— Produces rankograms and SUCRA values to rank treatments
— Requires transitivity assumption (trials comparable across comparisons)
— Useful when head-to-head trials are scarce (e.g., comparing 6 biologics for RA)
— Pools raw patient-level data rather than aggregate study results
— Gold standard — allows true subgroup analyses, time-to-event reanalysis, and consistent adjustment
— Resource-intensive; uncommon
— Sequentially adds trials in chronological order
— Reveals when evidence first reached statistical significance — sometimes years before guidelines changed (e.g., streptokinase in MI)
Board pearl: When the stem describes "a network meta-analysis showed drug X had the highest SUCRA for response," do not blindly pick drug X. Check whether direct head-to-head trials exist, whether the transitivity assumption is plausible, and whether safety SUCRA is also favorable. NMAs rank efficacy, but the best-ranked drug may also rank worst for serious adverse events.

— Pools sensitivity, specificity, LR+, LR− across studies
— Uses bivariate or HSROC models to account for sens/spec correlation
— SROC curve plots sensitivity vs 1-specificity across studies
— QUADAS-2 assesses risk of bias (patient selection, index test, reference standard, flow/timing)
— Threshold variability across studies is the main heterogeneity source
— CHARMS checklist for data extraction
— PROBAST for risk of bias
— Beware of optimism bias in model performance without external validation
— Pools prevalence or incidence across studies (e.g., prevalence of post-COVID fatigue)
— Uses Freeman-Tukey or logit transformation to stabilize variance
— Highly susceptible to between-study heterogeneity in case definitions
— Continuously updated as new evidence emerges
— Used during COVID-19 for therapeutic guidance (WHO living guidelines)
— Solves the lag between evidence generation and guideline updates
— Reviews of systematic reviews on the same topic
— Useful when multiple MAs exist with conflicting conclusions
— Highlight methodologic differences explaining discrepancies
— Re-run the MA excluding high-RoB studies, industry-funded trials, or outliers
— Robust results survive sensitivity analyses; fragile results change direction or significance
Step 3 management: When ordering a diagnostic test based on a DTA meta-analysis, anchor to your patient's pretest probability and apply the pooled LR+ and LR− via Fagan nomogram or Bayesian update. A positive result on a test with pooled sensitivity 95% and specificity 90% (LR+ 9.5) still leaves post-test probability below 10% when pretest probability is 1%, but pushes it above 90% when pretest probability is already 60%. Pretest probability is the lever; the test merely turns it.
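The Bayesian update described above converts pretest probability to odds, multiplies by the likelihood ratio, and converts back. A minimal sketch, assuming the pooled sensitivity and specificity apply to the patient in front of you; the function name is hypothetical.

```python
def post_test_probability(pretest, sensitivity, specificity, positive=True):
    """Update a pretest probability using LR+ (positive result) or LR- (negative)."""
    if positive:
        lr = sensitivity / (1 - specificity)
    else:
        lr = (1 - sensitivity) / specificity
    pre_odds = pretest / (1 - pretest)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Same test (sensitivity 95%, specificity 90%) at two pretest probabilities.
for pretest in (0.01, 0.60):
    post = post_test_probability(pretest, 0.95, 0.90)
    print(f"pretest {pretest:.0%} -> post-test after a positive result {post:.1%}")
```

Passing `positive=False` gives the post-test probability after a negative result, which is the relevant number when the test is being used to rule a diagnosis out.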

— Prespecified subgroups in the protocol are credible; post hoc subgroups are hypothesis-generating only
— Look for the p-value for interaction, not just subgroup-specific p-values
— Multiple testing across many subgroups inflates false-positive risk; apply a Bonferroni or similar correction
— RCTs often exclude patients >75 or those with multimorbidity
— MAs inherit this exclusion; pooled effects may not apply to geriatric patients
— Look for sensitivity analyses or dedicated geriatric MAs
— Most pivotal trials exclude eGFR <30 or Child-Pugh B/C
— Pooled efficacy/safety data are sparse for these populations
— Use pharmacokinetic studies and observational registries to supplement
— MAs typically address single-disease outcomes; trade-offs across competing risks are underexplored
— Apply time-to-benefit considerations — does the patient have life expectancy long enough to realize the pooled benefit?
— Example: statin NNT of 50 over 5 years is irrelevant in a patient with 2-year life expectancy
— Average pooled effect may mask substantial individual variation
— Risk-stratified subgroup analyses (e.g., by baseline ASCVD risk) more clinically useful than overall pooled RR
— Emerging tools: predictive HTE models, machine learning on IPD-MA
Key distinction: A subgroup difference supported by a statistically significant interaction test (p_interaction < 0.05) is credible, especially when the subgroup was prespecified; a claim that rests on the effect being significant in one stratum (p < 0.05) but not the other (p = 0.08), with no interaction test, is almost always spurious. Step 3 will test this: do not conclude that "drug X works in men but not women" from differing subgroup-specific p-values alone.
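One simple way to compute a p-value for interaction between two subgroups is a z-test on the difference of their log risk ratios. The sketch below assumes the two subgroups contain different patients (so their estimates are independent) and that each is reported as an RR with a 95% CI. The function name and subgroup figures are hypothetical.

```python
import math


def interaction_test(rr_a, ci_a, rr_b, ci_b):
    """Two-sided z-test comparing two independent subgroup risk ratios."""
    def se_from_ci(ci):
        # Standard error of log(RR) recovered from a 95% CI (low, high).
        return (math.log(ci[1]) - math.log(ci[0])) / (2 * 1.96)

    diff = math.log(rr_a) - math.log(rr_b)
    se_diff = math.sqrt(se_from_ci(ci_a) ** 2 + se_from_ci(ci_b) ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


# Hypothetical subgroups: "significant" in women, "not significant" in men.
z, p = interaction_test(0.75, (0.57, 0.99), 0.85, (0.66, 1.10))
print(f"z = {z:.2f}, p for interaction = {p:.2f}")
```

In this made-up example the women's CI excludes 1.0 and the men's does not, yet the interaction p-value is nowhere near 0.05, so the data give no good reason to think the effect differs by sex.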

— Systematically excluded from most RCTs (historical and regulatory reasons)
— MAs of pregnancy-specific interventions (aspirin for preeclampsia, magnesium for neuroprotection) rely on dedicated obstetric trials
— Observational MAs with confounding adjustment often the best available evidence
— Cochrane Pregnancy and Childbirth Group is a high-quality source
— Age-stratified subgroups essential (neonates, infants, children, adolescents have distinct physiology)
— Extrapolation from adult MAs is hazardous — dose-response and adverse event profiles differ
— PRISMA-Children extension addresses pediatric-specific reporting
— Historically underreported; PRISMA 2020 emphasizes demographic transparency
— Pooled effects may not generalize if trial populations are demographically narrow
— Health equity considered explicitly in PRISMA-Equity extension
— Few small trials → MA may be the only quantitative synthesis available
— Bayesian methods with informative priors useful when frequentist pooling is unstable
— Single-arm trial pooling with external controls is increasingly common but bias-prone
— Most pivotal trials conducted in high-income countries
— Baseline risk, comorbidity burden, and access differ — affecting absolute benefit calculations
Board pearl: When counseling a pregnant patient based on an MA, ask three questions — (1) Were pregnant patients included in any included trials? (2) Is the outcome fetal, maternal, or both? (3) Are the alternative options to the intervention also evidence-based or merely traditional? "No evidence of harm" in a pregnancy MA often reflects no evidence at all, not evidence of safety — a critical distinction for informed consent.

— Pooling biased studies amplifies bias with false precision
— Cochrane reviews often conclude "low-quality evidence" precisely because included trials are flawed
— Pooled direction of effect can reverse subgroup-level findings if confounders differ across studies
— More common in observational-study MAs
— Study-level associations (e.g., mean BMI of cohort vs outcome rate) do not imply individual-level relationships
— Smaller trials often show larger effects (publication bias, methodologic differences)
— Funnel plot asymmetry and Egger's test detect this
— Trim-and-fill imputes "missing" studies but is itself imperfect
— Trials may report only favorable outcomes; the MA inherits this selection
— Trial registry comparison (ClinicalTrials.gov vs published paper) detects discrepancies
— Positive trials published faster than negative ones — early MAs over-estimate effects
— Positive English-language trials cited and translated more, inflating pooled estimates if non-English literature ignored
— Several MAs of the same therapy can reach conflicting conclusions — umbrella reviews clarify
— Authors emphasize favorable findings; reading only the abstract misleads
— Always check the forest plot, I², and risk-of-bias summary
— Tight pooled CI feels authoritative; doesn't fix underlying bias
Step 3 management: If a guideline panel issues a strong recommendation based on a single MA, check three things before adopting — (1) Cochrane or independent replication? (2) Risk-of-bias summary of included trials? (3) Industry funding of the MA itself? Strong recommendations built on a single industry-funded MA with high-RoB trials are common sources of guideline reversals (HRT, perioperative beta-blockade, intensive glycemic control).

— High I² (>75%) without credible explanation
— Pooled CI just barely excluding 1.0 driven by one dominant trial
— Funnel plot asymmetry suggesting publication bias
— Most included trials rated high or unclear RoB
— GRADE rating of low or very low certainty
— Surrogate primary outcome without patient-important confirmation
— Mega-trial with low RoB, broad population, and patient-important outcomes
— Example: ISIS-2 (aspirin in MI), RALES (spironolactone in HFrEF), SPRINT (intensive BP) — each redirected practice despite prior MAs
— A well-conducted mega-trial provides direct evidence; MA of small trials provides indirect synthesis
— New trials substantially increase the pooled sample size or event count
— New trials in previously underrepresented populations (women, elderly, non-white)
— Methodologic advances (NMA, IPD-MA) become feasible
— Practice has shifted such that the comparator is obsolete
— Continuously updated; ideal for fast-moving fields (oncology, infectious disease)
— WHO COVID-19 therapeutics guideline is the model
— Sometimes new evidence shifts a strong recommendation to "do not do" (perioperative beta-blockade in low-risk surgery, routine PSA screening)
— Be prepared to un-prescribe, not just prescribe
CCS pearl: On a Step 3 CCS case, if a stem references a guideline you don't recall, the safest move is the conservative, patient-centered choice — confirm diagnosis, address modifiable risk factors, shared decision-making, and follow-up. The exam rarely rewards aggressive treatment based on weak evidence; it rewards thoughtful application of high-certainty recommendations and acknowledgment of uncertainty where it exists.

— Author-curated summary, no systematic search
— Useful for pathophysiology, history, expert framing
— Not appropriate for therapeutic recommendations
— Maps the breadth of literature on a topic without quality appraisal or pooling
— Answers "what evidence exists?" rather than "what does it show?"
— PRISMA-ScR extension governs reporting
— Streamlined SR with methodologic shortcuts (single reviewer, limited databases)
— Used for urgent policy decisions
— Trades rigor for timeliness; explicit about limitations
— Review of systematic reviews on overlapping questions
— Reconciles conflicting MAs
— Multiple-treatment comparisons via direct + indirect evidence
— Patient-level pooling — gold standard for subgroup and time-to-event analyses
— Asks "what works, for whom, under what circumstances?" — common in implementation science
— Integrates quantitative and qualitative evidence
— Combines raw data from a small number of studies, often by the original investigators
— Sometimes conflated with IPD-MA but typically less systematic
— Synthesize MAs plus expert judgment, values, resources
— GRADE is the dominant framework
— AGREE-II assesses guideline quality
Key distinction: A systematic review answers "what does the evidence say?" with reproducible methods. A clinical practice guideline answers "what should we do?" using systematic reviews plus value judgments about benefits, harms, costs, and patient preferences. On the exam, guidelines may diverge across societies (USPSTF vs ACS for cancer screening) — the divergence reflects different value weightings, not different underlying evidence.

— SR/MA of RCTs
— Individual RCT
— Cohort study
— Case-control study
— Cross-sectional / case series
— Expert opinion
— A biased MA of small trials ranks lower than a well-conducted single mega-trial
— A registry-based cohort of 200,000 patients with rigorous adjustment may inform practice better than a tiny underpowered RCT
— GRADE replaces the rigid pyramid with a flexible framework — start with RCTs as high, observational as low, then up- or down-grade
— Pathophysiologic reasoning ("ACEi should help HFrEF because…") generated hypotheses; RCTs/MAs confirmed
— Mechanism alone has misled (CAST: arrhythmia suppression killed patients; HRT: prevented bone loss, increased CV events)
— Single-patient randomized crossover; useful for chronic stable conditions
— Bottom of population-level pyramid but top of personalized evidence for that individual
— EHR-based, claims-based, registry-based studies
— Increasingly accepted by FDA for label expansions
— Complements but does not replace RCTs/MAs
— Animal models, mechanistic studies
— Necessary for hypothesis generation, insufficient for clinical recommendations
Board pearl: When the exam offers "expert consensus statement," "case series of 12 patients," "registry analysis of 50,000 patients," and "Cochrane meta-analysis of 15 RCTs" as options for "best evidence to guide management," the Cochrane MA almost always wins unless the stem explicitly mentions high I², industry funding, or high risk of bias — in which case a single high-quality mega-trial may be the right answer. Read the stem's qualifiers carefully.

— Identify therapies with high-certainty evidence and meaningful absolute benefit for the patient's risk profile
— Combine with guideline-directed targets (LDL, BP, HbA1c, etc.)
— Document shared decision-making, especially when CI is wide or benefit modest
— Post-MI: aspirin + P2Y12 inhibitor, high-intensity statin, beta-blocker, ACEi/ARB, MRA if EF ≤40% (each backed by MAs showing mortality or MACE reduction)
— HFrEF: ARNI > ACEi (PARADIGM-HF + MAs), beta-blocker, MRA, SGLT2 inhibitor (DAPA-HF, EMPEROR-Reduced + MA)
— Stroke secondary prevention: antiplatelet, statin, BP control (MAs of PROGRESS, SPARCL, etc.)
— Each MA-supported drug adds benefit but also adherence burden, cost, interaction risk
— Periodic medication review (Beers, STOPP/START in elderly)
— De-prescribing when evidence does not apply (limited life expectancy, conflicting goals of care)
— Pooled estimates of recurrence/progression inform follow-up cadence
— Example: pooled HCC recurrence rates after curative resection inform imaging interval
— Influenza vaccine post-MI: MA shows reduced cardiac events
— Cardiac rehab: MA-confirmed mortality reduction post-MI/post-CABG
Step 3 management: Build the post-discharge plan as a stack of MA-supported interventions ranked by absolute benefit for this patient, with explicit follow-up to reinforce adherence — a 2-week post-discharge visit (medication reconciliation, side effect assessment), 3-month labs (lipids, renal function, K+ if on MRA/ACEi), and 6–12 month risk reassessment. Step 3 rewards specifying both what and when, anchored to evidence.

— Read the abstract for the question and headline result
— Jump to the forest plot and I²
— Check the risk-of-bias summary (often a "traffic light" figure)
— Read funnel plot or Egger's test for publication bias
— Note GRADE rating and conflicts of interest
— Only then read discussion — to see if authors' framing matches the data
— Did the review address a focused, clinically sensible question?
— Was the search comprehensive and unbiased?
— Were the included studies of adequate quality?
— Were the results consistent across studies?
— How precise is the pooled estimate?
— Can I apply the results to my patient?
— Were all patient-important outcomes considered?
— Do the benefits outweigh harms and costs?
— Cochrane Library: gold-standard SRs
— PROSPERO: protocol registry
— Epistemonikos, TRIP database: pre-appraised evidence
— GRADEpro, MAGICapp: guideline production tools
— DynaMed, UpToDate: synthesized point-of-care evidence
— Use absolute numbers, pictographs, and decision aids
— Avoid relative risk in isolation
— Acknowledge uncertainty ("the best evidence suggests, but we are not certain")
— Reassess at follow-up as new evidence emerges
— Subscribe to evidence digests (NEJM Journal Watch, BMJ Evidence-Based Medicine)
— Attend journal clubs; teach trainees to appraise
Board pearl: Critical appraisal is not a one-time skill; it is a continuous habit. The most clinically dangerous physician is the one who stopped reading primary literature in residency and now practices from memory of MAs that have since been overturned.

— Misrepresenting MA results (relative risk without absolute risk, surrogate outcomes as if patient-important) undermines informed consent
— Patients have a right to uncertainty disclosure — "the evidence is moderate quality, we estimate a 1.8% absolute benefit, harms include…"
— Decision aids based on synthesized evidence improve shared decision-making
— ICMJE disclosure required for authors of MAs
— Industry-funded MAs more likely to favor sponsor's product (sponsorship bias)
— Disclose your own COIs to patients when recommending therapies
— Retracted trials sometimes remain in MAs; check Retraction Watch
— Fraudulent trials (e.g., several anesthesia and probiotic scandals) have distorted MAs; recalculation after exclusion sometimes reverses conclusions
— Authors have an ethical duty to update reviews when retractions occur
— Excluding non-English studies, LMIC populations, women, elderly, racial minorities perpetuates evidence inequity
— PRISMA-Equity extension prompts authors to consider distributional effects
— Should include methodologists, clinicians, and patient representatives
— Industry-conflicted panel members should be recused from voting on related recommendations
— When a patient is discharged on a regimen supported by an MA, the discharge summary must clearly communicate (1) the evidence-based indication, (2) monitoring parameters, (3) the responsible follow-up clinician, and (4) explicit medication reconciliation
— Failure to communicate why a new MA-supported drug was started is a leading cause of post-discharge medication errors and unnecessary discontinuation by outpatient providers
— Document shared decision-making when starting a therapy with marginal absolute benefit (NNT >50) — this is both ethically required and medicolegally protective
Key distinction: Evidence-based medicine integrates best research evidence (MAs), clinical expertise, and patient values — not "the MA says so, therefore do it." Ignoring patient values is a form of paternalism; ignoring evidence is a form of negligence. Both are ethical failures.

— I² quick cutoffs: <25% low, 25–50% moderate, 50–75% substantial, >75% considerable (a simplification of the overlapping Cochrane ranges)
Board pearl: Memorize the four reporting/appraisal acronyms — PRISMA (reporting SRs), CONSORT (reporting RCTs), STROBE (reporting observational studies), GRADE (rating evidence certainty). Exam stems often hide the answer in which acronym applies.

— Stem shows a forest plot of 8 trials evaluating drug X vs placebo for outcome Y, pooled RR 0.82 (95% CI 0.71–0.95), I² = 35%
— Question: "What is the most appropriate interpretation?"
— Answer: statistically significant reduction with low-moderate heterogeneity; consider applicability and absolute benefit before adopting
— Stem shows asymmetric funnel plot with missing small negative studies
— Question: "What does this most likely indicate?"
— Answer: publication bias; pooled estimate likely overestimates effect
— I² = 82%, p for Q < 0.001
— Question: "What is the next best step?"
— Answer: explore sources via subgroup analysis or meta-regression; reconsider whether pooling is appropriate; do not simply use random-effects and call it done
— Pooled RR 0.65 for stroke, baseline 5-year risk 8%
— Compute ARR = 0.08 × 0.35 = 2.8%, NNT ≈ 36 (1/0.028 = 35.7, rounded up)
— Outcome occurs in 40% of one group, stem reports OR 2.0 as if RR
— Question tests recognition that OR overestimates RR for common outcomes
— MA shows overall benefit p=0.02; subgroup of women p=0.04, men p=0.21
— Question: "Should you only treat women?" Answer: no — check p for interaction; subgroup difference likely spurious
— MA shows drug reduces HbA1c but no MACE reduction
— Question tests recognition that surrogate benefit ≠ patient-important benefit
— Two MAs reach opposite conclusions on the same question
— Answer: assess methodologic quality (AMSTAR-2), search comprehensiveness, RoB of included studies; favor Cochrane or independent over industry-funded
— New mega-RCT contradicts older MA of small trials
— Answer: usually the larger, less biased trial wins
Step 3 management: On every appraisal question, your sequence is — direction → significance → heterogeneity → bias → applicability → absolute benefit. If you internalize this six-step scan, you will answer most MA stems correctly without memorizing trial names.
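For the OR-versus-RR item above, the size of the overestimate can be worked out directly. The sketch below uses the algebraic identity RR = OR / (1 − p0 + p0 × OR), where p0 is the outcome risk in the comparison group. It assumes the 40% figure in the stem refers to the control group, which the stem does not actually specify; the function name is hypothetical.

```python
def or_to_rr(odds_ratio, control_risk):
    """Convert an unadjusted odds ratio to a risk ratio given the control-group risk."""
    return odds_ratio / (1 - control_risk + control_risk * odds_ratio)


# Common outcome (assumed 40% in the control group) versus a rare outcome (2%).
for p0 in (0.40, 0.02):
    print(f"control risk {p0:.0%}: OR 2.0 corresponds to RR {or_to_rr(2.0, p0):.2f}")
```

With a 40% control risk, an OR of 2.0 corresponds to an RR of roughly 1.4, whereas at 2% the two measures are nearly identical, which is why the rare-outcome approximation holds only at low baseline risk.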

A systematic review and meta-analysis is only as trustworthy as its weakest component — the question, the search, the included studies, and the synthesis methods — and your job as a clinician is to translate its pooled relative effects into absolute benefits and harms for the patient sitting in front of you.

