
Eduovisual

Biostatistics & Population Health

Network meta-analysis and indirect comparisons

Clinical Overview and When to Suspect Need for Network Meta-Analysis

Definition: Network meta-analysis (NMA), also called multiple treatments meta-analysis (MTM) or mixed treatment comparison (MTC), synthesizes evidence from a network of randomized trials to compare ≥3 interventions simultaneously, even when some have never been directly compared head-to-head.
Why it exists: Traditional pairwise meta-analysis can only compare two treatments at a time. In modern therapeutics (e.g., 8 biologics for rheumatoid arthritis, 4 DOACs vs warfarin for atrial fibrillation, multiple SGLT2/GLP-1 agents), head-to-head RCTs rarely cover every pair.
Core mechanic:

Direct evidence: Trials directly comparing A vs B

Indirect evidence: Inferring A vs C using A vs B and B vs C trials, anchored through the common comparator B

Mixed evidence: Pooled direct + indirect estimate (the defining feature of NMA)

When to suspect a question is testing NMA concepts on Step 3:

— Stem mentions "indirect comparison," "common comparator," "ranking probabilities," "SUCRA," or "league table"

— A new drug was approved without head-to-head trials vs the current standard

— A guideline committee must rank multiple therapies for a value-based formulary decision

— Forest plot shows >2 treatments compared with placebo, with derived A vs B estimates

Step 3 management: When asked to interpret an NMA for a clinical or formulary decision, first verify (1) the network is connected (every treatment links to every other via some path), (2) transitivity holds (trials are clinically similar in patients, dose, outcome), and (3) consistency between direct and indirect estimates has been statistically tested.
Board pearl: NMA does NOT generate new randomized data — it borrows strength across the randomized network. Treatments compared only indirectly retain the protection of randomization within each trial but not between trials, so confounding by trial-level effect modifiers is the Achilles' heel.
NMAs frequently appear in USPSTF updates, ACC/AHA drug-class recommendations, and Cochrane reviews; recognize the methodology so you can grade its certainty.
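A minimal Python sketch of the "mixed evidence" idea follows. It assumes one direct and one indirect log-odds-ratio estimate for the same comparison, and assumes the two are independent and consistent. All numbers are hypothetical. Real NMA software estimates every comparison jointly, but the inverse-variance logic shows why a mixed estimate is more precise than either source alone.

```python
import math


def inverse_variance_pool(est_direct, se_direct, est_indirect, se_indirect):
    """Combine a direct and an indirect log-OR into a mixed estimate.

    Fixed-effect inverse-variance weighting; assumes the two sources are
    independent and consistent (no direct/indirect disagreement).
    """
    w_direct = 1.0 / se_direct ** 2
    w_indirect = 1.0 / se_indirect ** 2
    mixed = (w_direct * est_direct + w_indirect * est_indirect) / (w_direct + w_indirect)
    mixed_se = math.sqrt(1.0 / (w_direct + w_indirect))
    return mixed, mixed_se


# Hypothetical A vs C evidence on the log-OR scale
direct = (math.log(0.80), 0.15)    # from A vs C trials
indirect = (math.log(0.70), 0.25)  # e.g., via common comparator B
mixed, mixed_se = inverse_variance_pool(*direct, *indirect)
low, high = mixed - 1.96 * mixed_se, mixed + 1.96 * mixed_se
print(f"Mixed OR {math.exp(mixed):.2f} (95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
print(f"SE: direct {direct[1]:.3f}, indirect {indirect[1]:.3f}, mixed {mixed_se:.3f}")
```

The mixed estimate always lies between its two inputs, and its SE is smaller than either input SE. That gain holds only when the consistency assumption holds.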
Presentation Patterns and Key History of an NMA

Typical "presentation" in a journal or board stem:

— A figure shows a network diagram: nodes = treatments, edges = head-to-head trials, edge thickness = number of trials or patients

— A league table displays all pairwise comparisons (often odds ratios or hazard ratios with 95% credible intervals) in a triangular matrix

— SUCRA values or rankograms order treatments from best to worst on each outcome

Key "history" elements to extract from the methods section:

— Were trials selected with a prespecified PICO and registered protocol (PROSPERO)?

— Is the comparator network closed (loops present, allowing consistency checks) or star-shaped (all trials vs a single comparator like placebo, so no loops and no consistency testing possible)?

— Frequentist vs Bayesian framework: Bayesian NMAs report credible intervals and posterior probabilities of being best; frequentist NMAs report confidence intervals

— Was heterogeneity (τ²) reported across the network, and was inconsistency formally tested (node-splitting, design-by-treatment interaction)?

Red flags in the "history":

— Only one trial supports a key edge → fragile indirect estimate

— Older trials (1990s) mixed with modern trials → transitivity violation from changes in background therapy

— Different outcome definitions across trials (e.g., ACR20 vs ACR50)

— Industry-sponsored trials concentrated on one node
Key distinction: A pairwise meta-analysis pools only direct evidence; an NMA additionally synthesizes indirect and mixed evidence. If you see only an A vs B forest plot, it is not an NMA, however many trials are pooled.
Step 3 management: Before applying NMA conclusions to your patient, ask whether your patient resembles the average trial population in the network. If your patient has CKD stage 4 and the network excluded eGFR <30, the SUCRA ranking does not translate.
Recognize that "ranking" is probabilistic, not deterministic — a drug ranked #1 with 35% probability is barely ahead of #2 at 30%.
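The sketch below shows why rankings are probabilistic. It counts how often each treatment lands in each rank across simulated draws. The effect sizes are hypothetical. Independent normal draws stand in for real MCMC output, which would be correlated across treatments.

```python
import random


def rank_probabilities(draws, lower_is_better=True):
    """Estimate P(rank = k) for each treatment from matched posterior draws.

    draws: dict mapping treatment -> list of sampled effects, where index s
    refers to the same posterior draw for every treatment.
    Returns dict mapping treatment -> [P(rank 1), ..., P(rank n)].
    """
    names = list(draws)
    n_draws = len(draws[names[0]])
    counts = {name: [0] * len(names) for name in names}
    for s in range(n_draws):
        # Order treatments by their sampled effect in this draw (best first)
        ordered = sorted(names, key=lambda t: draws[t][s], reverse=not lower_is_better)
        for rank, name in enumerate(ordered):
            counts[name][rank] += 1
    return {name: [c / n_draws for c in counts[name]] for name in names}


random.seed(1)
# Hypothetical log-OR vs placebo (mean, SD); lower = better, A and B nearly tied
spec = {"A": (-0.30, 0.15), "B": (-0.27, 0.15), "C": (-0.10, 0.15)}
draws = {t: [random.gauss(m, sd) for _ in range(20000)] for t, (m, sd) in spec.items()}
probs = rank_probabilities(draws)
for t, p in probs.items():
    print(t, "P(rank 1..3) =", [round(x, 2) for x in p])
```

A and B split most of the first-place probability between them. "Ranked #1" therefore says little on its own.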
Methodological Exam Findings — Assessing Network Geometry and Assumptions

Network geometry inspection (the "physical exam" of NMA):

Connectedness: Every treatment must connect through some chain of comparisons; isolated nodes cannot be compared

Density: Ratio of observed edges to possible edges; sparse networks yield wide credible intervals

Loops/closed triangles: Required to test consistency; star networks (all vs placebo) cannot test consistency at all

Co-occurrence patterns: Are certain treatments only compared with industry-favored comparators? (publication/comparator bias)

The three foundational assumptions, all of which must be assessed:

Homogeneity (within pairwise comparisons): trials comparing A vs B should be similar enough to pool. Assessed by I² and τ².

Transitivity (across comparisons): trials comparing A vs B must be clinically similar to trials comparing B vs C with respect to effect modifiers (age, severity, dose, follow-up, background therapy). This is a clinical judgment, not a statistical test.

Consistency (statistical manifestation of transitivity): direct and indirect estimates of the same comparison should agree. Tested by node-splitting, loop-specific inconsistency, or design-by-treatment interaction model.

"Hemodynamic" stability of the network:

— A network is "unstable" if a single trial dominates a key edge, if loops show inconsistency (p<0.05), or if τ² is large across the network

— Sensitivity analyses removing outlier trials should not change rankings substantially
Board pearl: Transitivity is violated most often by background-therapy drift over decades. A 1995 trial of drug A vs placebo and a 2020 trial of drug B vs placebo may not be transitively combinable because standard of care changed — even though both used "placebo."
Key distinction: Heterogeneity is variability within a pairwise comparison; inconsistency is disagreement between direct and indirect estimates of the same pairwise comparison. Both must be checked.
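Heterogeneity within a single pairwise comparison can be quantified with the DerSimonian-Laird moment estimator, sketched below on hypothetical trial data. DL is only one choice and is an assumption here. NMA software commonly uses REML or Bayesian priors for τ². This sketch does not assess inconsistency, which needs the direct-vs-indirect comparison described above.

```python
import math


def dersimonian_laird(effects, ses):
    """Heterogeneity for one pairwise comparison (e.g., all A vs B trials).

    Returns Cochran's Q, the DerSimonian-Laird tau^2, I^2 (as a proportion),
    and the random-effects pooled estimate with its SE.
    """
    w = [1 / se ** 2 for se in ses]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Random-effects weights add tau^2 to each trial's within-study variance
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return {"Q": q, "tau2": tau2, "I2": i2, "pooled": pooled, "se": math.sqrt(1 / sum(w_re))}


# Hypothetical log-ORs from four A vs B trials
result = dersimonian_laird([-0.5, -0.1, -0.6, 0.1], [0.20, 0.15, 0.25, 0.20])
print({k: round(v, 3) for k, v in result.items()})
```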
Step 3 management: When a guideline cites an NMA, look for a statement that transitivity was assessed and inconsistency tested. Absence of either is a methodologic weakness that should lower your confidence (downgrade in GRADE).
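The checks listed under network geometry inspection can be sketched as a small graph routine: connectedness, density, closed loops vs star shape, and single-trial edges. The input format is a hypothetical convenience (a dict mapping treatment pairs to trial counts), not any package's API.

```python
from collections import defaultdict, deque


def network_geometry(treatments, comparisons):
    """Summarize the geometry of an NMA network.

    treatments: list of treatment names (nodes).
    comparisons: dict mapping (treatment1, treatment2) -> number of trials (edges).
    """
    nodes = list(treatments)
    adjacency = defaultdict(set)
    for a, b in comparisons:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Count connected components with breadth-first search
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            for neighbor in adjacency[queue.popleft()]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)

    n_edges = len({frozenset(pair) for pair in comparisons})
    possible_edges = len(nodes) * (len(nodes) - 1) // 2
    return {
        "connected": components == 1,
        "density": n_edges / possible_edges,
        # A graph contains a closed loop when it has more edges than a spanning forest
        "has_closed_loop": n_edges > len(nodes) - components,
        "single_trial_edges": [pair for pair, k in comparisons.items() if k == 1],
    }


# Hypothetical star network: every drug compared only with placebo
star = network_geometry(["placebo", "A", "B", "C"],
                        {("placebo", "A"): 3, ("placebo", "B"): 2, ("placebo", "C"): 4})
# Adding one A vs B trial closes the placebo-A-B loop (but that edge rests on one trial)
closed = network_geometry(["placebo", "A", "B", "C"],
                          {("placebo", "A"): 3, ("placebo", "B"): 2,
                           ("placebo", "C"): 4, ("A", "B"): 1})
print(star)
print(closed)
```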
Diagnostic Workup — Initial Statistical Tools and Outputs to Interpret

The core outputs you must be able to read on Step 3:

Network plot: Visualizes node sizes (sample size per treatment) and edge weights (number of trials per comparison)

Forest plot of all comparisons vs reference: Each treatment plotted against a common reference (often placebo), with pooled effect estimates from the NMA model

League table: Triangular matrix showing every pairwise comparison. The diagonal contains treatment names; off-diagonal cells contain OR/RR/HR with 95% CI or CrI. Whether a cell reads row vs column or column vs row differs between publications, so check the legend before deciding which treatment is favored.

Rankogram: Probability of each rank (1st, 2nd, 3rd…) for each treatment across MCMC samples

SUCRA (Surface Under the Cumulative Ranking curve): Single number from 0 to 1 (or 0–100%) summarizing rank performance; 1 = always best, 0 = always worst

Effect measure conventions:

— Binary outcomes: odds ratio or risk ratio

— Continuous outcomes: mean difference or standardized mean difference

— Time-to-event: hazard ratio (requires log-HR and SE from each trial)

Statistical frameworks:

Frequentist NMA (e.g., `netmeta` in R): produces point estimates, 95% CIs, p-values for inconsistency

Bayesian NMA (e.g., WinBUGS, `gemtc`): produces posterior distributions, 95% credible intervals, direct probability statements ("probability drug A is best = 0.72")
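The sketch below shows how trial-level inputs for these effect measures are typically derived. It assumes complete 2×2 counts for a binary outcome. For time-to-event outcomes, it assumes a published hazard ratio with a 95% CI that is symmetric on the log scale. The trial numbers are hypothetical.

```python
import math


def log_or_from_counts(events_t, n_t, events_c, n_c):
    """Log odds ratio and its SE (Woolf method) from a 2x2 table.

    No continuity correction: all four cells must be non-zero.
    """
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def log_hr_from_ci(hr, lower, upper, z=1.96):
    """Recover log-HR and SE from a published HR and its 95% CI (symmetric on the log scale)."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z)


# Hypothetical trial summaries
lor, lor_se = log_or_from_counts(30, 200, 45, 200)
lhr, lhr_se = log_hr_from_ci(0.80, 0.68, 0.94)
print(f"log-OR {lor:.3f} (SE {lor_se:.3f}); log-HR {lhr:.3f} (SE {lhr_se:.3f})")
```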
Initial certainty assessment: GRADE has been extended to NMA via CINeMA (Confidence in Network Meta-Analysis), which evaluates within-study bias, reporting bias, indirectness, imprecision, heterogeneity, and incoherence.
Board pearl: A SUCRA of 85% does not mean 85% effective. It means that, on average across the posterior distribution for that outcome, the drug outranks about 85% of the competing treatments. Always pair SUCRA with the absolute effect size and credible interval.
Step 3 management: When choosing among ranked therapies, weight SUCRA against clinically meaningful effect size, safety SUCRA on a separate outcome, cost, and patient-specific factors — never rank alone.
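SUCRA can be computed directly from a rankogram's rank probabilities. It is the average of the cumulative ranking probabilities over ranks 1 to a−1, which equals (a − mean rank)/(a − 1). The sketch below uses hypothetical probabilities.

```python
def sucra(rank_probs):
    """SUCRA for one treatment from [P(rank 1), P(rank 2), ..., P(rank a)].

    Averages the cumulative ranking probabilities over ranks 1..a-1,
    so 1 = always best and 0 = always worst.
    """
    a = len(rank_probs)
    cumulative, total = 0.0, 0.0
    for p in rank_probs[:-1]:
        cumulative += p
        total += cumulative
    return total / (a - 1)


def mean_rank(rank_probs):
    """Expected rank (1 = best) from the rank probabilities."""
    return sum((k + 1) * p for k, p in enumerate(rank_probs))


# Hypothetical rank probabilities for four treatments (each row and rank column sums to 1)
ranks = {
    "drug A": [0.35, 0.30, 0.20, 0.15],
    "drug B": [0.30, 0.35, 0.20, 0.15],
    "drug C": [0.25, 0.20, 0.30, 0.25],
    "drug D": [0.10, 0.15, 0.30, 0.45],
}
for name, probs in ranks.items():
    print(f"{name}: SUCRA {sucra(probs):.2f}, mean rank {mean_rank(probs):.2f}")
```

Drugs A and B end up with almost the same SUCRA, yet one of them would be reported as "#1". This is the situation the board pearl warns about.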
Diagnostic Workup — Advanced Analyses and Confirmatory Tests
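Node-splitting (Dias method): For each comparison informed by both direct and indirect evidence (i.e., part of a closed loop), the evidence is split into its direct and indirect components, which are estimated separately and compared. A statistically significant difference (p<0.05) flags local inconsistency.
Design-by-treatment interaction model (Higgins/White): Tests global inconsistency across the entire network simultaneously by allowing treatment effects to differ by design (the set of treatments compared in a trial). This captures both loop inconsistency and design inconsistency involving multi-arm trials.
Loop-specific inconsistency: Calculates an inconsistency factor (IF, the difference between direct and indirect estimates on the log scale) for each closed loop; an IF whose 95% CI excludes zero suggests inconsistency in that loop.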
Component NMA: Used when interventions are combinations (e.g., A+B, A+C, B+C); decomposes effects into additive components — relevant for behavioral, surgical, or multi-drug regimens.
Network meta-regression: Adjusts for trial-level covariates (mean age, baseline risk, dose) to address transitivity violations and explain heterogeneity. Caution: ecological fallacy if individual-level inference is drawn.
Individual patient data (IPD) NMA: Gold standard — uses raw patient data rather than trial-level summary statistics, enabling subgroup analysis and reducing ecological bias. Rarely feasible due to data-sharing limits.
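Comparison-adjusted funnel plot: Detects small-study effects/publication bias across a network. Each study is plotted by its deviation from the pooled estimate for its own comparison, with all comparisons oriented in a consistent direction (e.g., newer vs older treatment) so that asymmetry is interpretable.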
Threshold analysis: Quantifies how much the evidence base would need to change before rankings flip — a measure of robustness.
Key distinction: Node-splitting = local (one comparison at a time); design-by-treatment interaction = global (whole network). A consistent network can still have one inconsistent loop, so both should be reported.
Board pearl: The adjusted indirect comparison (Bucher method) is the simplest indirect comparison. With B as the common comparator, log(OR_AC) = log(OR_AB) − log(OR_CB), with variances summed. It is a special case of NMA limited to a single open path (two comparisons sharing one comparator) and assumes transitivity. Step 3 stems may show this calculation explicitly.
Step 3 management: When an NMA reports significant inconsistency, do not use the pooled estimate — revert to direct evidence only, or downgrade certainty substantially in GRADE/CINeMA.
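For a single comparison, the loop-specific and node-splitting ideas reduce to comparing a direct and an indirect estimate. The sketch below is minimal: it assumes the two estimates are independent and normally distributed on the log scale, and its numbers are hypothetical. Full node-splitting re-estimates the whole network rather than using two fixed inputs.

```python
import math


def inconsistency_factor(direct, se_direct, indirect, se_indirect):
    """Direct-minus-indirect difference for one comparison, on the log scale.

    Treats direct and indirect estimates as independent, as in a simple
    loop-specific check; full node-splitting re-fits the whole network model.
    """
    diff = direct - indirect
    se = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return {"IF": diff, "ci": (diff - 1.96 * se, diff + 1.96 * se), "z": z, "p": p_two_sided}


# Hypothetical A vs C comparison: direct OR 0.60, indirect OR 1.00
print(inconsistency_factor(math.log(0.60), 0.15, math.log(1.00), 0.20))
```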
Risk Stratification — Grading Certainty and First-Line Interpretation Logic

CINeMA framework (Confidence in Network Meta-Analysis), six domains:

Within-study bias: Cochrane RoB 2 applied to contributing trials, weighted by their contribution to each comparison

Reporting bias: Comparison-adjusted funnel plot, search comprehensiveness

Indirectness: Do the trial populations/interventions match the review question?

Imprecision: Credible/confidence interval width relative to a clinically important threshold

Heterogeneity: τ² compared to empirical predictive distributions

Incoherence: Statistical test of consistency (node-splitting or global)

Each domain rated: No concerns / Some concerns / Major concerns → overall certainty: High / Moderate / Low / Very Low

Interpreting rankings under uncertainty:

— High SUCRA + narrow CrI + high CINeMA confidence → actionable ranking

— High SUCRA + wide CrI → ranking is unstable; do not over-interpret

— Multiple treatments with overlapping CrIs (head-to-head CrIs in the league table include the null) → effectively tied; choose based on safety, cost, patient preference

First-line interpretation algorithm for a Step 3 stem:

— Step 1: Is the network connected and is the question's comparison in the network?

— Step 2: What is the point estimate and 95% CrI for the relevant comparison?

— Step 3: Is the effect clinically meaningful (exceeds MCID)?

— Step 4: What is the CINeMA confidence for that specific comparison?

— Step 5: Apply to the patient, considering effect modifiers
Step 3 management: Formulary committees use NMA rankings as one input alongside cost-effectiveness (ICER), safety, adherence, and equity. A drug ranked #1 by efficacy SUCRA but with a $200,000/QALY ICER may not be first-line.
Board pearl: Beware "ranking obsession." Two drugs with SUCRA 0.78 and 0.72 are usually statistically indistinguishable. The exam tests whether you recognize that overlapping credible intervals (in particular, a head-to-head CrI that includes the null) override numeric ranking.
Key distinction: GRADE for pairwise meta-analysis starts at "High" for RCTs; CINeMA does the same but applies separately to every pairwise comparison in the network — certainty varies across the matrix.
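The imprecision logic above can be sketched as a rough classifier for a ratio outcome where values below 1 favor treatment. The clinically important threshold (OR 0.80) and the category labels are hypothetical. This is a simplification of the imprecision judgment, not CINeMA's algorithm.

```python
def read_interval(lower, upper, important=0.80, null=1.0):
    """Crude reading of a 95% CI/CrI for a ratio outcome where values < 1 favor the treatment.

    `important` is a hypothetical clinically important threshold (e.g., OR 0.80).
    """
    if upper < important:
        return "clinically important benefit"
    if upper < null:
        return "benefit, but possibly below clinical importance"
    if lower > null:
        return "favors comparator"
    return "includes null: effectively tied"


# Hypothetical league-table entries for drug X vs drug Y
for lo, hi in [(0.60, 0.75), (0.70, 0.95), (0.85, 1.20), (1.05, 1.40)]:
    print(f"OR CrI {lo:.2f}-{hi:.2f}: {read_interval(lo, hi)}")
```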
Pharmacotherapy Application — Using NMA to Choose a Drug Class

Classic Step 3 scenarios where NMA drives drug choice:

DOACs for AF: Apixaban, rivaroxaban, dabigatran, edoxaban; only some head-to-head trials exist, so NMA informs comparative efficacy/bleeding

Biologics for RA/IBD/psoriasis: Multiple TNF inhibitors, IL-17, IL-23, JAK inhibitors; NMAs rank by ACR/PASI/clinical remission

SGLT2 vs GLP-1 for T2DM cardiorenal outcomes: Cross-class NMAs guide ADA/KDIGO recommendations

Antidepressants: The Cipriani 2018 Lancet NMA of 21 antidepressants is the canonical example

Antihypertensives: ALLHAT-era and post-hoc NMAs informing thiazide vs ACEi vs CCB first-line

How to translate an NMA into a prescription:

— Identify the efficacy outcome SUCRA AND the safety/tolerability SUCRA: the best-efficacy drug may have the worst discontinuation rate

— Apply patient-specific modifiers: renal function, drug interactions, cost/coverage, pregnancy plans

— Consider acceptability (often modeled as all-cause discontinuation in psychiatry NMAs)

Pitfall (surrogate vs hard outcomes):

— NMAs of HbA1c reduction may rank differently than NMAs of MACE or mortality

— Always check which outcome the ranking applies to

Pitfall (dose equivalence):

— NMAs often pool multiple doses per drug; if a single dose is non-standard, transitivity is shaky
Board pearl: The Cipriani antidepressant NMA found amitriptyline the most effective agent vs placebo but among those with the highest dropout rates (lower acceptability). Escitalopram, agomelatine, and vortioxetine combined relatively high efficacy with good tolerability. This efficacy/acceptability tradeoff helps explain why SSRIs remain first-line despite tricyclics' raw efficacy.
Step 3 management: For an outpatient drug choice, present the NMA-informed top 2–3 options to the patient with their efficacy/safety tradeoffs (shared decision-making); do not robotically pick SUCRA #1. Document the discussion and reasoning.
Key distinction: NMA ranks average patients in the network — your individual patient with CKD, polypharmacy, or pregnancy may flip the optimal choice entirely.
Expanded Methodology — Bucher Indirect Comparison and Multi-Arm Trial Handling

The Bucher adjusted indirect comparison (must know for Step 3):

— Setup: A vs B trials yield log(OR_AB) ± SE_AB; C vs B trials yield log(OR_CB) ± SE_CB (B is the common comparator)

— Indirect estimate: log(OR_AC) = log(OR_AB) − log(OR_CB)

— Variance: Var(log OR_AC) = Var(log OR_AB) + Var(log OR_CB) (variances add because the estimates come from independent trials)

— SE: square root of the summed variances

— 95% CI: log(OR_AC) ± 1.96 × SE

— Exponentiate to get OR_AC and its CI

Why this matters: The Bucher method preserves randomization within each trial but assumes the trials are exchangeable on effect modifiers (transitivity).
Worked logic: Suppose drug A reduces mortality vs placebo (OR 0.70) and drug B reduces mortality vs placebo (OR 0.85). The indirect A vs B estimate is OR 0.70/0.85 ≈ 0.82, so drug A is better than B by ~18% odds reduction. The CI, however, is wider than either direct estimate because the variances add.
Multi-arm trial handling:

— Trials with ≥3 arms (e.g., A vs B vs C) provide direct evidence on multiple comparisons, but the estimates are correlated because they share a control group

— NMA models must account for this correlation; failure to do so underestimates variance and produces falsely narrow CrIs

— Methods: multivariate normal likelihood (Bayesian) or augmented variance-covariance matrix (frequentist)

Anchored vs unanchored indirect comparisons:

Anchored: A common comparator exists (e.g., placebo); preserves randomization → preferred

Unanchored: No common comparator (single-arm trials); must adjust for all prognostic factors and effect modifiers; very high risk of bias (MAIC, STC methods)
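The correlation created by a shared control arm can be made concrete with arm-level log-odds, using a large-sample normal approximation and hypothetical event counts. The sketch builds the 2×2 variance-covariance matrix for the two contrasts of a 3-arm trial. It then shows that dropping the covariance understates uncertainty for a quantity that adds the contrasts.

```python
import math


def arm_log_odds(events, n):
    """Log odds of the event in one arm and its large-sample variance."""
    return math.log(events / (n - events)), 1 / events + 1 / (n - events)


def multi_arm_contrasts(control, arm1, arm2):
    """Two log-OR contrasts against a shared control arm, with their covariance matrix."""
    lo_c, v_c = arm_log_odds(*control)
    lo_1, v_1 = arm_log_odds(*arm1)
    lo_2, v_2 = arm_log_odds(*arm2)
    contrasts = (lo_1 - lo_c, lo_2 - lo_c)
    # The shared control arm contributes v_c to both contrasts, hence the covariance
    cov = [[v_1 + v_c, v_c],
           [v_c, v_2 + v_c]]
    return contrasts, cov


# Hypothetical 3-arm trial: placebo 40/200, drug A 28/200, drug B 32/200 (events/n)
(d_a, d_b), cov = multi_arm_contrasts((40, 200), (28, 200), (32, 200))
# Example quantity: the average of the two active-vs-placebo contrasts
var_correct = (cov[0][0] + cov[1][1] + 2 * cov[0][1]) / 4
var_naive = (cov[0][0] + cov[1][1]) / 4  # pretends the contrasts are independent
print(f"SE with covariance {math.sqrt(var_correct):.3f}; SE if ignored {math.sqrt(var_naive):.3f}")
```

For other quantities, such as the A vs B contrast within the same trial, ignoring the covariance errs in the opposite direction. Modeling the full variance-covariance matrix avoids both problems.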
Board pearl: Unanchored indirect comparisons are observational in nature, despite using RCT data, because they break the randomization protection. Regulators (NICE, FDA) accept them only when anchored comparison is impossible.
Step 3 management: When evaluating a manufacturer-submitted indirect comparison, immediately ask: (1) Is it anchored? (2) Are effect modifiers balanced? (3) Was transitivity justified clinically, not just statistically?
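The Bucher calculation and the worked logic in this section can be checked numerically. The text gives only point estimates, so the 95% CIs for each drug vs placebo below are hypothetical. With these inputs the indirect CI crosses 1, illustrating how the variances compound.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """SE of a log ratio recovered from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def bucher(or_a_vs_p, se_a, or_b_vs_p, se_b):
    """Anchored indirect comparison of A vs B through common comparator P (Bucher).

    log(OR_AB) = log(OR_AP) - log(OR_BP); the variances add because the two
    estimates come from independent sets of trials.
    """
    log_or = math.log(or_a_vs_p) - math.log(or_b_vs_p)
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return math.exp(log_or), ci, se


# Worked example from the text, with hypothetical 95% CIs for each direct estimate
se_a = se_from_ci(0.58, 0.85)  # drug A vs placebo, OR 0.70
se_b = se_from_ci(0.72, 1.00)  # drug B vs placebo, OR 0.85
or_ab, ci_ab, se_ab = bucher(0.70, se_a, 0.85, se_b)
print(f"Indirect A vs B: OR {or_ab:.2f} (95% CI {ci_ab[0]:.2f} to {ci_ab[1]:.2f})")
```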
Special Populations — Elderly, Renal/Hepatic Impairment, and Underrepresented Groups

Core problem: Most NMAs aggregate trial-level data from which older adults, patients with CKD stages 4–5 or hepatic impairment, and frail patients are systematically excluded. Rankings derived from a 55-year-old trial population may not apply to your 82-year-old patient with eGFR 28.
Effect modifier concerns in elderly:

— Pharmacokinetic differences (reduced clearance) shift efficacy/toxicity balance

— Competing risks of mortality blunt long-term outcome differences

— Polypharmacy interactions absent from trial populations

— NMA SUCRA rankings for bleeding (DOACs) often diverge in age >75; apixaban tends to retain favorability

Renal impairment:

— Most cardiovascular NMAs exclude eGFR <30; SGLT2/GLP-1 cardiorenal NMAs increasingly stratify by baseline eGFR

— Dabigatran: heavily renally cleared; bleeding risk rises disproportionately in CKD, so NMA rankings can invert in this subgroup

Hepatic impairment:

— Trials typically exclude Child-Pugh B/C; NMA conclusions about hepatically metabolized drugs (e.g., rivaroxaban, statins) cannot be extrapolated

Subgroup NMA / network meta-regression:

— Stratify the network by age, sex, baseline severity, or geography

— Requires sufficient trials within each stratum; often underpowered

— IPD NMA is the only rigorous solution but is rare
Representation bias amplifier: Black, Hispanic, and Indigenous patients, and women in cardiovascular trials, are underrepresented in the source RCTs. NMA rankings therefore carry these enrollment biases forward and can amplify them via indirect comparisons.
Board pearl: When a Step 3 stem features an 80-year-old with CKD and an NMA-based recommendation, the correct answer often emphasizes clinical judgment + caution about extrapolation rather than blindly applying the SUCRA #1 drug.
Step 3 management: Document in your note when an NMA-based guideline recommendation is being applied off the supporting evidence base (e.g., outside the age/renal range) and use shared decision-making with explicit uncertainty disclosure.
Key distinction: NMA improves precision for populations represented in trials — it does not create new evidence for excluded populations.
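Network meta-regression on a trial-level covariate such as mean age can be sketched as fixed-effect weighted least squares. The trials below are hypothetical. The fixed-effect model is a simplifying assumption, since real analyses usually add a random effect. The slope describes trial averages, so it cannot justify predictions for ages outside the observed range.

```python
def meta_regression(effects, ses, covariate):
    """Fixed-effect meta-regression of trial effects on one trial-level covariate.

    Weighted least squares with weights 1/se^2: effect = b0 + b1 * covariate.
    Describes trial averages only (ecological), not individual patients.
    """
    w = [1 / s ** 2 for s in ses]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, covariate)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, covariate))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope, (1 / sxx) ** 0.5  # SE of slope with known variances


# Hypothetical trials: log-OR vs placebo by mean participant age
ages = [55, 60, 64, 68, 72]
log_ors = [-0.45, -0.40, -0.30, -0.28, -0.15]
ses = [0.10, 0.12, 0.15, 0.12, 0.20]
b0, b1, se_b1 = meta_regression(log_ors, ses, ages)
print(f"Change in log-OR per year of mean age: {b1:.3f} (SE {se_b1:.3f})")
print(f"Observed age range: {min(ages)}-{max(ages)}; predictions outside it are extrapolation")
```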
Special Populations — Pregnancy, Pediatrics, and Rare Diseases

Pregnancy and NMA:

— Pregnant patients are systematically excluded from most therapeutic RCTs, so NMA networks for chronic conditions (RA, IBD, MS, depression) have minimal pregnancy data

— Recent pregnancy-specific NMAs exist for gestational diabetes treatment, postpartum hemorrhage prevention (oxytocin vs carbetocin vs misoprostol; the 2018 Cochrane uterotonic NMA informed WHO's 2018 recommendations), and preeclampsia prophylaxis

— Indirect comparisons in pregnancy carry higher transitivity risk because gestational age, parity, and baseline risk vary widely

Pediatric NMAs:

— Pediatric ADHD: methylphenidate vs amphetamines vs atomoxetine vs guanfacine; published NMAs guide first-line choice

— Pediatric asthma: ICS vs ICS+LABA vs LTRA NMAs inform stepwise GINA recommendations

— Pediatric trials are scarce; networks are sparse → wide CrIs and low CINeMA confidence

Rare disease NMAs:

— Small trials, single-arm studies, and historical controls dominate

— Often require unanchored indirect comparisons (MAIC: matching-adjusted indirect comparison; STC: simulated treatment comparison)

— Regulators accept these only with extensive sensitivity analyses

Population-adjusted indirect comparison (PAIC) methods:

MAIC: Reweights individual patient data from one trial to match aggregate characteristics of the comparator trial

STC: Fits a regression model in one trial and predicts outcomes in the comparator population

— Both reduce bias from unbalanced effect modifiers but require IPD from at least one trial
Board pearl: The 2018 Cochrane NMA of uterotonics informed WHO's 2018 recommendations on postpartum hemorrhage prevention. Together with the WHO CHAMPION trial, which found heat-stable carbetocin noninferior to oxytocin, this evidence supported carbetocin in low-resource settings where the cold chain is unreliable. It is an example of evidence synthesis driving global health policy.
Step 3 management: In pregnancy or pediatrics, when NMA evidence is weak (low/very-low CINeMA), default to mechanism + safety profile + registry data + specialist input rather than ranking algorithms.
Key distinction: Pediatric and pregnancy NMAs are often hypothesis-generating, not definitive — communicate this uncertainty.
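The sketch below illustrates MAIC weighting using the method-of-moments approach commonly described for MAIC. Weights of the form exp(x·α) are chosen so that the weighted IPD covariate means equal the comparator trial's reported means. The patient rows and target means are hypothetical. Only means are matched, and the subsequent outcome comparison is not shown. The effective sample size (ESS) shows how much information the reweighting costs.

```python
import math


def _solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def maic_weights(ipd, target_means, max_iter=100, tol=1e-10):
    """Method-of-moments MAIC weights w_i = exp(x_i . alpha).

    Minimizing sum(exp(x_c . alpha)) over covariates centered on target_means
    makes the weighted IPD means equal target_means (Newton's method).
    """
    xc = [[v - t for v, t in zip(row, target_means)] for row in ipd]
    k = len(target_means)

    def objective(alpha):
        return sum(math.exp(sum(a * x for a, x in zip(alpha, row))) for row in xc)

    alpha = [0.0] * k
    for _ in range(max_iter):
        e = [math.exp(sum(a * x for a, x in zip(alpha, row))) for row in xc]
        grad = [sum(ei * row[j] for ei, row in zip(e, xc)) for j in range(k)]
        if max(abs(g) for g in grad) < tol:
            break
        hess = [[sum(ei * row[i] * row[j] for ei, row in zip(e, xc)) for j in range(k)]
                for i in range(k)]
        step = _solve(hess, grad)
        t = 1.0
        # Backtracking line search keeps each Newton step downhill
        while objective([a - t * s for a, s in zip(alpha, step)]) > objective(alpha) and t > 1e-8:
            t /= 2
        alpha = [a - t * s for a, s in zip(alpha, step)]

    weights = [math.exp(sum(a * x for a, x in zip(alpha, row))) for row in xc]
    ess = sum(weights) ** 2 / sum(w ** 2 for w in weights)  # effective sample size
    return weights, ess


# Hypothetical IPD rows: (age, male indicator); comparator trial reports mean age 62, 40% male
ipd = [[50, 1], [55, 0], [60, 1], [65, 0], [70, 1], [45, 0], [58, 1], [62, 0]]
weights, ess = maic_weights(ipd, [62, 0.4])
weighted_means = [sum(w * row[j] for w, row in zip(weights, ipd)) / sum(weights) for j in range(2)]
print("Weighted means:", [round(m, 3) for m in weighted_means], "ESS:", round(ess, 1))
```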
Complications and Adverse Outcomes of NMA Misuse

Major methodologic failures that produce misleading conclusions:

Transitivity violation: Combining trials with different background therapies, eras, or populations through a common comparator that is no longer "common" in effect

Ignored inconsistency: Reporting pooled estimates when node-splitting reveals significant direct/indirect disagreement

Selective comparator choice: Industry-sponsored NMAs that exclude inconvenient comparators or trials

Multiple-arm correlation ignored: Treating arms of a 3-arm trial as independent, which yields falsely narrow CrIs

Overreliance on SUCRA: Reporting rankings without effect sizes or CrIs, creating false confidence

Clinical/policy consequences:

— Formulary decisions favoring SUCRA #1 drugs that are statistically indistinguishable from cheaper alternatives → wasted spend

— Guideline recommendations extrapolated to populations outside the network → harm in excluded subgroups

— Manufacturer marketing claims based on indirect comparisons inflating perceived superiority

Publication bias amplification:

— Publication bias in the contributing pairwise comparisons propagates through indirect estimates and can distort multiple rankings simultaneously

— Comparison-adjusted funnel plots detect but rarely correct this

Small-network instability:

— Removing a single trial can flip rankings; robustness must be tested via leave-one-out sensitivity analysis

Misinterpretation of credible intervals:

— Clinicians often read a 95% CrI as if it were a confidence interval, but a CrI is a posterior probability statement that depends on the prior; informative vs vague priors can shift conclusions
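Prior dependence can be illustrated with a conjugate normal model for a single log odds ratio. This is a deliberate simplification of a full Bayesian NMA, and the estimate and priors are hypothetical. Under a vague prior, the CrI for the same data excludes the null. Under a skeptical prior centered on no effect, it includes the null.

```python
import math


def normal_posterior(estimate, se, prior_mean, prior_sd):
    """Conjugate normal update for a log-OR; returns posterior mean, SD and 95% CrI."""
    precision = 1 / se ** 2 + 1 / prior_sd ** 2
    mean = (estimate / se ** 2 + prior_mean / prior_sd ** 2) / precision
    sd = math.sqrt(1 / precision)
    return mean, sd, (mean - 1.96 * sd, mean + 1.96 * sd)


# Hypothetical data: log-OR from a sparse network, OR 0.75 with SE 0.12
estimate, se = math.log(0.75), 0.12
vague = normal_posterior(estimate, se, prior_mean=0.0, prior_sd=10.0)
skeptical = normal_posterior(estimate, se, prior_mean=0.0, prior_sd=0.10)
for label, (mean, sd, (low, high)) in [("vague", vague), ("skeptical", skeptical)]:
    print(f"{label}: OR {math.exp(mean):.2f} (95% CrI {math.exp(low):.2f} to {math.exp(high):.2f})")
```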
Board pearl: A high SUCRA (e.g., 0.90) does not mean the drug is significantly better than all others. If the head-to-head CrIs in the league table include the null, the ranking is fragile. Always check the league table CrIs.
Step 3 management: When a colleague cites an NMA to justify a switch, ask three questions: (1) Direct or indirect? (2) Is the CrI clinically meaningful? (3) Was inconsistency tested? If any answer is "no" or "unknown," delay the change.
Key distinction: Statistical significance in NMA ≠ clinical significance; effect size + MCID + safety profile must all align.
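The leave-one-out check listed under small-network instability can be sketched for a single edge with a fixed-effect inverse-variance pool. The fixed-effect model and the trial data are hypothetical assumptions. Here one large trial drives the pooled result, and removing it moves the interval across the null.

```python
import math


def pooled_fixed(effects, ses):
    """Fixed-effect inverse-variance pooled estimate and SE."""
    w = [1 / s ** 2 for s in ses]
    return sum(wi * y for wi, y in zip(w, effects)) / sum(w), math.sqrt(1 / sum(w))


def leave_one_out(effects, ses):
    """Re-pool after dropping each trial in turn: list of (dropped index, estimate, SE)."""
    results = []
    for i in range(len(effects)):
        est, se = pooled_fixed(effects[:i] + effects[i + 1:], ses[:i] + ses[i + 1:])
        results.append((i, est, se))
    return results


# Hypothetical A vs B edge dominated by one large trial (index 0), log-OR scale
effects = [-0.40, 0.05, 0.10, -0.05]
ses = [0.08, 0.25, 0.30, 0.28]
full, full_se = pooled_fixed(effects, ses)
print(f"All trials: log-OR {full:.2f} (SE {full_se:.2f})")
for i, est, se in leave_one_out(effects, ses):
    crosses_null = est - 1.96 * se < 0 < est + 1.96 * se
    print(f"Without trial {i}: log-OR {est:.2f} (SE {se:.2f}), CI includes null: {crosses_null}")
```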
When to Escalate — Calling a Methodologist, Statistician, or Evidence Synthesis Expert

Triggers to escalate beyond the published NMA:

— You are writing a guideline or formulary policy and the NMA's CINeMA confidence is low/very-low for the relevant comparison

— Your patient population (pregnancy, dialysis, pediatric, geriatric with multimorbidity) is unrepresented in the network

— The NMA shows significant inconsistency but reports pooled estimates anyway

— Conflicting NMAs on the same question reach different rankings (common in psoriasis biologics and antidepressants)

— A manufacturer submits an unanchored indirect comparison for regulatory or coverage purposes

Who to call:

Clinical epidemiologist or biostatistician with NMA experience

HTA agency methodologists (NICE, ICER) for coverage decisions

Cochrane editorial team for systematic review questions

GRADE/CINeMA working group resources for certainty grading

What to ask them:

— Re-run sensitivity analyses with your population in mind

— Conduct meta-regression on key effect modifiers

— Update the network with newer trials

— Adjudicate conflicting NMAs by comparing inclusion criteria, network geometry, and statistical models

Inpatient/CCS analogy (escalation as a process):

— Just as you escalate a deteriorating patient to ICU, escalate methodologic uncertainty to expert consultation rather than guessing

— Document the consultation and its impact on your decision

Health system context:

— Large health systems' P&T committees should have biostatistical input for NMA-based decisions

— Value-based contracts increasingly rely on NMA-derived comparative effectiveness; getting it wrong has financial and clinical consequences
Board pearl: When two well-conducted NMAs on the same topic disagree, the difference is usually driven by trial inclusion criteria (search dates, language, population) or comparator choice, not statistical methods. Compare PRISMA flow diagrams side-by-side.
CCS pearl: In a CCS-style population health case, "consult biostatistics" or "request a CINeMA appraisal" is the analog of "consult cardiology" — appropriate escalation, not weakness.
Step 3 management: Never let an NMA's complexity intimidate you into accepting it uncritically; ask for help when stakes are high.
Key Differentials — Other Evidence Synthesis Methods (Same Category)

Pairwise (traditional) meta-analysis:

— Pools only direct evidence on a single comparison (A vs B)

— Simpler, less assumption-heavy, but cannot rank multiple treatments

— Use when you have a focused two-arm question with sufficient direct trials

Adjusted indirect comparison (Bucher):

— Simplest case of indirect comparison: a single open path through a common comparator (A vs B, B vs C → A vs C)

— Preserves randomization within trials; assumes transitivity

— Use when only two pairwise meta-analyses exist and you need to compare A vs C

Mixed treatment comparison (MTC):

— Older term for NMA, particularly in Bayesian framework

— Functionally equivalent to NMA

Component network meta-analysis:

— For interventions that share components (combination therapies, complex behavioral interventions)

— Decomposes effects into additive component contributions

— Use in surgical bundles, multi-drug regimens, psychotherapy components

Individual patient data (IPD) NMA:

— Uses raw patient data from each trial

— Enables subgroup analysis, addresses ecological bias, handles missing data better

— Use when raw data are accessible (often via consortia)

Living network meta-analysis:

— Continuously updated as new trials are published

— Used in fast-moving fields (COVID-19 therapeutics, e.g., the living NMA behind WHO's COVID-19 guidance)

— Use when evidence base is rapidly evolving

Population-adjusted indirect comparison (MAIC, STC):

— Adjusts for cross-trial differences in effect modifiers using IPD from one trial

— Use when anchored NMA is impossible or transitivity is violated
Key distinction: Pairwise meta-analysis answers "Is A better than B?"; NMA answers "Among A, B, C, D, E, which is best, and how do they all compare?"
Board pearl: Cochrane Reviews increasingly use NMA when ≥3 interventions are clinically relevant; recognize Cochrane NMAs when they appear in literature searches.
Step 3 management: Choose the synthesis method that matches the clinical question's complexity — do not use NMA when a focused pairwise meta-analysis suffices, and do not retreat to pairwise when multiple treatments must be ranked.
Key Differentials — Non-Synthesis Evidence (Other Category)

Single RCT (head-to-head):

— Highest internal validity for the specific comparison

— Limited to two (or few) arms; cannot generalize to untested treatments

— Use when a definitive A vs B trial exists and matches your patient

Pragmatic/platform trials (e.g., RECOVERY, REMAP-CAP):

— Test multiple treatments concurrently within a single trial infrastructure

— Produce concurrently randomized comparisons (often against a shared control arm) that strengthen NMA networks

— Often the gold standard for resolving NMA inconsistencies

Real-world evidence (RWE) / observational comparative effectiveness:

— Registry, claims, EHR data with target trial emulation, propensity scores

— Captures pregnancy, geriatric, multimorbid patients excluded from RCTs

— Subject to confounding by indication despite statistical adjustment

Cost-effectiveness analysis (CEA) and HTA:

— Combines NMA efficacy/safety outputs with cost and utility data

— Produces ICERs (incremental cost-effectiveness ratios) for formulary/coverage decisions

— NMAs feed into the efficacy parameters of CEA models

Umbrella reviews / overviews of systematic reviews:

— Synthesize multiple systematic reviews/meta-analyses on a topic

— Different from NMA: they do not pool indirect evidence statistically

Expert consensus / Delphi panels:

— Used when evidence is too sparse or conflicting for quantitative synthesis

— Lower in evidence hierarchy but important when NMAs cannot be performed
Key distinction: NMA is quantitative synthesis of randomized evidence; RWE is observational evidence. They answer overlapping but distinct questions — NMA tells you efficacy among trial-eligible patients; RWE tells you effectiveness in real practice.
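Board pearl: Platform trials like RECOVERY in COVID-19 evaluated multiple therapies (e.g., dexamethasone, tocilizumab, baricitinib) within one protocol, each against concurrently randomized usual care. They answered questions about benefit with shared-control randomized evidence rather than cross-trial indirect comparison, a methodologic upgrade over what NMA alone could provide.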
Step 3 management: For policy or coverage decisions, integrate NMA (efficacy ranking) + RWE (real-world safety/adherence) + CEA (value) + patient preference — no single method suffices alone.
Key distinction: GRADE certainty from an NMA can be higher than from RWE on the same question if RCTs are well-conducted, but lower when transitivity or consistency fails.
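The CEA step that consumes NMA outputs ends in an ICER, sketched below with hypothetical costs, QALYs, and willingness-to-pay threshold. A real model would derive the QALY inputs from NMA efficacy and safety estimates. The numbers reproduce the $200,000/QALY formulary scenario from the risk stratification section.

```python
def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Incremental cost-effectiveness ratio (cost per QALY gained) of new vs old.

    Returns None when the new option gains no QALYs; dominance and
    cheaper-but-worse cases need separate handling.
    """
    d_cost = cost_new - cost_old
    d_qaly = qaly_new - qaly_old
    if d_qaly <= 0:
        return None
    return d_cost / d_qaly


# Hypothetical lifetime model outputs; efficacy inputs would come from the NMA
ratio = icer(cost_new=60_000, qaly_new=8.2, cost_old=20_000, qaly_old=8.0)
threshold = 100_000  # hypothetical willingness-to-pay per QALY
print(f"ICER ${ratio:,.0f}/QALY; below threshold: {ratio <= threshold}")
```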
Secondary Prevention — Applying NMA to Long-Term and Discharge Decisions

Translating NMA outputs into longitudinal care plans:

— For chronic disease (HF, DM, RA), NMA-informed first-line choice is just the start; durability, adherence, and long-term safety drive sustained benefit

— Switch decisions (e.g., second-line biologic after TNF failure) often rely on post-failure NMAs, which are sparser and less certain

Discharge-medication examples informed by NMA:

— Post-MI: aspirin + P2Y12 inhibitor; NMA of ticagrelor vs prasugrel vs clopidogrel informs DAPT choice (ISAR-REACT 5 added direct evidence)

— Post-stroke: DOAC choice for AF-related cardioembolic stroke; NMAs guide apixaban preference in high-bleeding-risk patients

— Post-DVT/PE: anticoagulant choice; NMA supports DOACs over warfarin for most patients

Long-term considerations not captured by short-term efficacy NMAs:

— Cumulative toxicity (e.g., long-term steroid exposure, JAK inhibitor cardiovascular signal)

— Tachyphylaxis and durability (biologic immunogenicity)

— Adherence patterns (once-daily vs multiple daily dosing)

— Drug interactions accumulating over time

Living NMAs in chronic care:

— Increasingly used for rapidly evolving fields (oncology, biologics, antivirals)

— Allow guideline updates without waiting for definitive head-to-head trials

Counseling at discharge:

— Communicate that the drug choice is based on average evidence, not individual prediction

— Frame in shared decision-making: efficacy, safety, cost, lifestyle fit
Board pearl: In chronic conditions with multiple effective therapies, second- and third-line NMA evidence is typically weaker than first-line because post-failure populations are heterogeneous, sequencing varies, and trials are smaller.
Step 3 management: At discharge, document the NMA-informed rationale, the alternatives discussed, and the planned follow-up to assess response — this supports both clinical care and medico-legal protection.
Key distinction: NMA optimizes the decision at one time point; longitudinal care requires reassessment as new evidence (and new trials) update the network.
Follow-Up, Monitoring, and Patient Communication About NMA-Based Decisions

Monitoring NMA-informed therapy:
— Track the outcome the NMA prioritized (e.g., HbA1c, ACR50, PASI75, MACE) at evidence-based intervals
— Capture safety endpoints that may differ between agents (eGFR for SGLT2 inhibitors, lipid panel for JAK inhibitors, LFTs for some biologics such as tocilizumab)
— Reassess at intervals matched to trial follow-up: drugs ranked well at 12 weeks may not retain benefit at 52 weeks

When to reconsider the choice:
— A definitive head-to-head RCT is published (it may overturn rankings based on indirect comparison)
— An updated NMA changes the league table for your patient's profile
— The patient experiences a clinically important adverse event (consider switching, often within class)
— A new comorbidity places the patient outside the populations studied in the original NMA, limiting its applicability

Patient communication strategies:
— Explain that "we chose this medication because, across many studies, it had the best balance of benefit and side effects for people similar to you"
— Acknowledge uncertainty: "Some of the comparisons were indirect, meaning we estimated them by combining studies, so there's some uncertainty"
— Discuss alternatives: treatments ranked #2 and #3 by SUCRA are often clinically equivalent to the top-ranked option

Counseling on guideline updates:
— Guidelines citing NMAs change as networks expand; reassure patients that updates reflect better evidence, not previous error

Health-system integration:
— EHR clinical decision support (CDS) increasingly references NMA-derived rankings
— Know which NMA the CDS is built on and whether it matches current evidence
Board pearl: Patient adherence is a major effect modifier that lies outside the network. A SUCRA #2 drug taken consistently can outperform a SUCRA #1 drug taken sporadically.
Step 3 management: Schedule explicit follow-up to assess both efficacy and tolerability of NMA-informed choices; document patient-reported outcomes alongside lab/clinical metrics.
CCS pearl: In a longitudinal CCS scenario, "schedule follow-up in 6–12 weeks" with appropriate labs is often the next step after starting an NMA-informed first-line therapy — match monitoring intervals to evidence base.
Key distinction: Initial drug choice is a statistical inference; ongoing monitoring is a clinical inference about your patient.
Ethical, Legal, and Patient Safety Considerations in NMA-Based Care

Informed consent and indirect evidence:
— When recommending a therapy whose advantage over alternatives rests on indirect comparison, ethical practice supports disclosing this uncertainty during shared decision-making
— Example: recommending apixaban over rivaroxaban based on an NMA (no head-to-head RCT); the patient should understand that the comparative evidence is indirect
— Failure to disclose material uncertainty can compromise consent quality

Conflicts of interest:
— Industry-funded NMAs may selectively choose comparators or trials that favor sponsor products
— Guideline panel members with relevant COI should recuse themselves from NMA-based recommendations (per IOM/NAM standards)
— Disclose COI when communicating NMA-based recommendations to patients

Equity and justice:
— Populations excluded from trials (pregnancy, pediatrics, racial/ethnic minorities, elderly with multimorbidity) inherit the bias of trial enrollment when NMA conclusions are extrapolated to them
— Health systems applying NMA-based formulary decisions without subgroup considerations may exacerbate disparities

Transition-of-care risk:
— When patients move between systems with different NMA-based formularies, within-class therapeutic substitutions occur; most are reasonable when the comparative estimates are similar, but some are not (e.g., dabigatran, which is largely renally cleared, vs apixaban in CKD)
— Medication reconciliation must consider why the original drug was chosen

Mandatory reporting and transparency:
— Systematic reviews and NMAs should be prospectively registered (PROSPERO) and reported per PRISMA-NMA
— Publication bias undermines NMA validity; FDAAA and EU clinical trial registry mandates support transparency

Patient safety incident analog:
— Acting on an NMA whose inconsistency was not tested, or whose CrIs were ignored, is analogous to acting on an uncalibrated diagnostic test: a latent safety hazard in evidence-based practice
Board pearl: Always disclose to patients when a comparative claim rests on indirect comparison rather than a head-to-head trial; regulators such as FDA generally expect adequate head-to-head evidence before comparative claims appear in labeling or promotion.
Step 3 management: Document in your note the evidence basis (direct trial vs NMA), the discussion of uncertainty, and the patient's preferences — this is both ethically sound and medico-legally protective.
Key distinction: Statistical certainty ≠ ethical sufficiency; high-quality consent requires translating CrIs into plain language.
High-Yield Associations and Rapid-Fire Facts
Network meta-analysis = NMA = MTC = mixed treatment comparison — synonyms you must recognize
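Bucher method: simplest anchored indirect comparison; log(OR_AC) = log(OR_AB) − log(OR_CB), with both ORs estimated against the common comparator B; variances add: Var(log OR_AC) = Var(log OR_AB) + Var(log OR_CB)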
Transitivity: clinical/conceptual assumption that trials are exchangeable on effect modifiers
Consistency: the statistical manifestation of transitivity (agreement between direct and indirect estimates); tested by node-splitting or design-by-treatment interaction models
Heterogeneity ≠ inconsistency: within-comparison variability vs between-direct-and-indirect disagreement
SUCRA = Surface Under the Cumulative Ranking curve; ranges 0 to 1 (or 0–100%), where 1 means a treatment is certain to be best and 0 certain to be worst; summarizes the ranking probabilities
Rankogram: probability of each rank for each treatment
League table: triangular matrix of all pairwise comparisons
Star network: all comparisons share a single common comparator (often placebo); cannot test consistency
Closed-loop network: triangles allow consistency testing
Anchored vs unanchored: common comparator preserves randomization; unanchored breaks it
MAIC = matching-adjusted indirect comparison (population-adjusted, requires IPD from one trial)
STC = simulated treatment comparison (regression-based population-adjusted indirect comparison, PAIC)
CINeMA = Confidence in Network Meta-Analysis (NMA-specific GRADE framework)
PRISMA-NMA: reporting checklist extension for NMA
PROSPERO: prospective registry for systematic reviews and NMAs
Bayesian NMA → credible intervals, probability statements (WinBUGS, gemtc)
Frequentist NMA → confidence intervals, p-values (netmeta)
Cipriani 2018 Lancet NMA → 21 antidepressants — canonical published example
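WHO PPH NMA 2018 → uterotonics for PPH prevention; ranked carbetocin among the most effective agents (heat-stable carbetocin was then tested directly in the CHAMPION trial)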
RECOVERY platform trial → direct evidence reducing need for indirect comparison in COVID-19
Industry-funded NMAs: scrutinize comparator selection and inclusion criteria
Three NMA assumptions: homogeneity, transitivity, consistency
Effect modifier balance is what transitivity requires — not just similar populations broadly
Multi-arm trials: correlation between arms must be modeled
Living NMAs: continuously updated; used in fast-moving fields
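The Bucher calculation can be checked numerically. The sketch below is a minimal illustration, not a full NMA: it back-calculates log-scale standard errors from 95% CIs and combines two placebo-anchored trials. It reuses the point estimates from stem pattern 1 (OR 0.60 and 0.75), but the confidence intervals and the helper names `se_from_ci` and `bucher` are hypothetical, added only for illustration.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def se_from_ci(lower: float, upper: float) -> float:
    """Back-calculate the standard error of a log odds ratio from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def bucher(or_a_ref: float, se_a_ref: float,
           or_c_ref: float, se_c_ref: float) -> tuple:
    """Anchored indirect OR for A vs C, both estimated against a common comparator (ref).

    se_* are standard errors on the log-OR scale. Returns (OR_AC, lower95, upper95).
    """
    # log(OR_AC) = log(OR_A,ref) - log(OR_C,ref)
    log_ac = math.log(or_a_ref) - math.log(or_c_ref)
    # Independent trials, so variances add on the log scale and the CI widens.
    se_ac = math.sqrt(se_a_ref ** 2 + se_c_ref ** 2)
    return (math.exp(log_ac),
            math.exp(log_ac - Z_95 * se_ac),
            math.exp(log_ac + Z_95 * se_ac))


# Stem pattern 1 point estimates; the 95% CIs are HYPOTHETICAL, for illustration only.
se_a = se_from_ci(0.45, 0.80)  # drug A vs placebo, OR 0.60
se_b = se_from_ci(0.58, 0.97)  # drug B vs placebo, OR 0.75
or_ab, lower, upper = bucher(0.60, se_a, 0.75, se_b)
print(f"Indirect OR, drug A vs drug B: {or_ab:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these hypothetical inputs the indirect OR is 0.80 with a 95% CI of roughly 0.54 to 1.18. Both drugs beat placebo, yet the indirect A-vs-B interval spans 1 because the two variances add.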
Board pearl: When in doubt about SUCRA, look at the league table CrIs — they always trump rank order.
Step 3 management: Match the evidence synthesis method to the question complexity; escalate methodologic uncertainty to a statistician.
Key distinction: Indirect ≠ inferior, but indirect requires transitivity to be valid; without transitivity, the indirect estimate is biased even if precise.
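SUCRA can be computed directly from a rankogram. For each treatment, average the cumulative probabilities of being ranked in the top k over k = 1 to a − 1, where a is the number of treatments; this equals (a − mean rank)/(a − 1). The sketch below uses hypothetical rank probabilities for three treatments, and the function name `sucra` is illustrative. In practice the rank probabilities typically come from the posterior samples of a Bayesian NMA, while frequentist analyses often report the analogous P-score instead.

```python
from typing import Dict, List


def sucra(rank_probs: List[float]) -> float:
    """SUCRA for one treatment from its rankogram.

    rank_probs[k] is the probability the treatment is ranked k + 1 (rank 1 = best).
    SUCRA is the mean of the cumulative ranking probabilities over ranks 1..a-1.
    """
    a = len(rank_probs)
    if a < 2:
        raise ValueError("need at least two treatments")
    if abs(sum(rank_probs) - 1.0) > 1e-6:
        raise ValueError("rank probabilities must sum to 1")
    cumulative, total = 0.0, 0.0
    for p in rank_probs[:-1]:  # the final cumulative value is always 1, so it is excluded
        cumulative += p
        total += cumulative
    return total / (a - 1)


# HYPOTHETICAL rankograms: each row and each rank column sums to 1.
rankograms: Dict[str, List[float]] = {
    "Drug A": [0.60, 0.30, 0.10],
    "Drug B": [0.35, 0.50, 0.15],
    "Placebo": [0.05, 0.20, 0.75],
}
for name, probs in rankograms.items():
    print(f"{name}: SUCRA = {sucra(probs):.2f}")
```

Here drug A (SUCRA 0.75) outranks drug B (0.60), but the ranking alone says nothing about whether the A-vs-B difference is clinically meaningful or whether its league-table CrI excludes the null.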
Board Question Stem Patterns

Stem pattern 1 — Bucher calculation:
— "Trial 1: drug A vs placebo, OR for stroke = 0.60. Trial 2: drug B vs placebo, OR = 0.75. What is the indirect estimate of OR for A vs B?"
— Answer logic: OR_AB = OR_A,placebo / OR_B,placebo = 0.60/0.75 = 0.80 (drug A reduces stroke odds by 20% vs B)
— Recognize that variances add on the log scale, so the indirect CI is wider than either direct CI

Stem pattern 2 — Network diagram interpretation:
— Figure shows nodes for 5 drugs with edges of varying thickness
— Question: "Which comparison has the most direct evidence?" → thickest edge
— Or: "Which comparison relies most on indirect evidence?" → thin or absent edge with the longest indirect path

Stem pattern 3 — SUCRA misinterpretation:
— "Drug A has SUCRA 0.82 for efficacy and 0.30 for safety; drug B has 0.75 for efficacy and 0.78 for safety. Which is preferred?"
— Correct answer: drug B for most patients (better efficacy-safety balance); rote "pick SUCRA #1" reasoning is wrong

Stem pattern 4 — Transitivity violation:
— "An NMA combines 1990s trials of drug A vs placebo with 2020 trials of drug B vs placebo. What is the main threat to validity?"
— Answer: transitivity violation due to changes in background care/standard of care over time

Stem pattern 5 — Inconsistency detection:
— "Node-splitting reveals direct OR 0.70 (95% CI 0.50–0.95) and indirect OR 1.10 (95% CI 0.80–1.50) for the same comparison. What does this indicate?"
— Answer: inconsistency (the direct and indirect estimates disagree); the pooled network estimate for this comparison should not be relied on until the source of disagreement (often a transitivity violation) is explored

Stem pattern 6 — Formulary decision:
— Asked to advise a P&T committee on NMA-based selection; the correct answer integrates SUCRA + effect size + safety + cost + CINeMA confidence, not SUCRA alone

Stem pattern 7 — Special population extrapolation:
— NMA in adults <65, asked about an 82-year-old with CKD; the correct answer flags indirectness and recommends caution + shared decision-making

Stem pattern 8 — Anchored vs unanchored:
— Manufacturer presents a single-arm trial of a new drug vs historical data on standard care; question asks about validity → recognize an unanchored indirect comparison with high risk of bias
Board pearl: When two answer choices both seem statistically correct, the right one usually incorporates clinical context (effect modifiers, patient population, MCID) — the exam rewards integrated reasoning.
Step 3 management: Practice reading league tables and network diagrams; they appear with increasing frequency on biostatistics/EBM-themed Step 3 items.
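The disagreement in stem pattern 5 can be quantified with a simple Bucher-style test. Take the difference between the direct and indirect log ORs, divide by the standard error of that difference, and compare the result with a standard normal distribution. This is a simplified sketch that assumes the direct and indirect sources are independent; it is not a full node-splitting model, which re-fits the whole network. The function names `log_or_se` and `direct_vs_indirect` are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_95 = STD_NORMAL.inv_cdf(0.975)


def log_or_se(point: float, lower: float, upper: float) -> tuple:
    """Convert an OR and its 95% CI to (log OR, SE of log OR)."""
    return math.log(point), (math.log(upper) - math.log(lower)) / (2 * Z_95)


def direct_vs_indirect(direct: tuple, indirect: tuple) -> tuple:
    """Bucher-style check of disagreement between direct and indirect evidence.

    direct / indirect: (OR, lower95, upper95) for the same comparison.
    Assumes the two sources are independent (they come from different trials).
    Returns (ratio of ORs, z statistic, two-sided p value).
    """
    d_dir, se_dir = log_or_se(*direct)
    d_ind, se_ind = log_or_se(*indirect)
    diff = d_dir - d_ind  # log of the ratio of odds ratios
    se_diff = math.sqrt(se_dir ** 2 + se_ind ** 2)
    z = diff / se_diff
    p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return math.exp(diff), z, p


ror, z, p = direct_vs_indirect((0.70, 0.50, 0.95), (1.10, 0.80, 1.50))
print(f"Ratio of ORs = {ror:.2f}, z = {z:.2f}, two-sided p = {p:.3f}")
```

With these numbers z is about −1.97 and p is about 0.049, so the disagreement sits only just beyond the conventional threshold. Because such tests have low power, a non-significant result would not prove consistency either.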
One-Line Recap
Network meta-analysis simultaneously compares multiple treatments using direct and indirect randomized evidence anchored through common comparators — valid only when transitivity holds, consistency is verified, and rankings (SUCRA) are interpreted alongside effect sizes, credible intervals, CINeMA confidence, and individual patient context.
Recap bullet 1 — The math: Indirect comparison uses the Bucher logic: log(OR_AC) = log(OR_AB) − log(OR_CB), with both ORs estimated against the common comparator B and their variances adding. NMA generalizes this across an entire network of treatments, preserving randomization within each trial while borrowing strength across trials.
Recap bullet 2 — The three assumptions: Homogeneity (within-comparison similarity), transitivity (between-comparison exchangeability on effect modifiers — the keystone assumption), and consistency (statistical agreement between direct and indirect estimates — testable via node-splitting and design-by-treatment interaction). Violation of any one undermines the entire network.
Recap bullet 3 — Interpretation discipline: SUCRA rankings without effect sizes and credible intervals are misleading. If the league-table CrI for a pairwise comparison includes the null, the two treatments are not clearly distinguishable regardless of rank; overlap of the treatments' separate intervals is not the right test. CINeMA grades certainty separately for each pairwise comparison, so never treat a network's overall quality as uniform.
Recap bullet 4 — Clinical application: Use NMA to inform — not dictate — therapeutic decisions; integrate with patient-specific effect modifiers (age, renal/hepatic function, pregnancy), safety profiles, cost, adherence factors, and patient preferences via shared decision-making, and disclose when comparative claims rest on indirect rather than direct evidence.
Board pearl: On Step 3, NMA questions test whether you can read a network diagram, compute a Bucher estimate, identify transitivity threats, distinguish heterogeneity from inconsistency, and translate rankings into individualized care — not whether you can derive the Bayesian likelihood.
Step 3 management: When citing an NMA, name the comparison, the effect estimate with CrI, the CINeMA confidence, and the applicability to your patient — then document the shared decision.
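Reading a network diagram comes down to a few graph questions. Is the network connected? Which pairs share a direct edge, and how thick is it? How long is the shortest indirect path for pairs without one? The sketch below answers these questions with a breadth-first search over a hypothetical five-treatment network. The treatments, trial counts, and helper names (`adjacency`, `shortest_path`) are all illustrative assumptions.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# HYPOTHETICAL 5-node network: (treatment, treatment) -> number of direct trials,
# i.e., the edge thickness a network diagram would draw.
EDGES: Dict[Tuple[str, str], int] = {
    ("Placebo", "A"): 8,
    ("Placebo", "B"): 5,
    ("Placebo", "C"): 3,
    ("A", "B"): 2,
    ("C", "D"): 1,
}


def adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets built from the direct comparisons."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(adj: Dict[str, Set[str]], start: str, goal: str) -> List[str]:
    """Breadth-first search; returns the node sequence, or [] if goal is unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(adj[node]):  # sorted for reproducible paths
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return []


adj = adjacency(EDGES)
nodes = sorted(adj)
# Connected: every treatment is reachable from an arbitrary starting treatment.
connected = all(shortest_path(adj, nodes[0], n) for n in nodes)
print("Network connected:", connected)
for u, v in combinations(nodes, 2):
    n_trials = EDGES.get((u, v)) or EDGES.get((v, u))
    if n_trials:
        print(f"{u} vs {v}: direct evidence ({n_trials} trials)")
    else:
        path = shortest_path(adj, u, v)
        print(f"{u} vs {v}: indirect only, {len(path) - 1}-step path {' - '.join(path)}")
```

In this example Placebo vs A is the thickest edge, while A vs D and B vs D rely on the longest (three-step) indirect paths. The Placebo-A-B triangle is the only closed loop, so it is the only place where direct and indirect evidence could be checked for consistency.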