Biostatistics & Population Health
Network meta-analysis and indirect comparisons
— Direct evidence: Trials directly comparing A vs B
— Indirect evidence: Inferring A vs C using A vs B and B vs C trials, anchored through the common comparator B
— Mixed evidence: Pooled direct + indirect estimate (the defining feature of NMA)
— Stem mentions "indirect comparison," "common comparator," "ranking probabilities," "SUCRA," or "league table"
— A new drug was approved without head-to-head trials vs the current standard
— A guideline committee must rank multiple therapies for a value-based formulary decision
— Forest plot shows >2 treatments compared to placebo with derived A vs B estimates

— A figure shows a network diagram: nodes = treatments, edges = head-to-head trials, edge thickness = number of trials or patients
— A league table displays all pairwise comparisons (often odds ratios or hazard ratios with 95% credible intervals) in a triangular matrix
— SUCRA values or rankograms order treatments from best to worst on each outcome
— Were trials selected with a prespecified PICO and registered protocol (PROSPERO)?
— Is the comparator network closed (loops present, allowing consistency checks) or star-shaped (all trials vs a single comparator like placebo — no loops, no consistency testing possible)?
— Frequentist vs Bayesian framework: Bayesian NMAs report credible intervals and posterior probabilities of being best; frequentist NMAs report confidence intervals
— Was heterogeneity (τ²) reported across the network, and was inconsistency formally tested (node-splitting, design-by-treatment interaction)?
— Only one trial supports a key edge → fragile indirect estimate
— Older trials (1990s) mixed with modern trials → transitivity violation from changes in background therapy
— Different outcome definitions across trials (e.g., ACR20 vs ACR50)
— Industry-sponsored trials concentrated on one node
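The closed-versus-star question above can be checked mechanically once the network is written down as a list of comparisons. The following is a minimal Python sketch rather than a validated tool: the function name `describe_network` and the input format are hypothetical, and it assumes a non-empty list in which every edge joins two distinct treatments. Edge density is taken here as observed comparisons divided by all possible pairwise comparisons.

```python
from collections import deque


def describe_network(edges):
    """Summarize the geometry of a treatment network.

    edges: iterable of (treatment, treatment) pairs, one per comparison
    that has at least one head-to-head trial (hypothetical input format).
    """
    unique_edges = {frozenset(edge) for edge in edges}
    nodes = sorted({t for edge in unique_edges for t in edge})
    neighbours = {node: set() for node in nodes}
    for edge in unique_edges:
        a, b = sorted(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Count connected components with a breadth-first search.
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current] - seen:
                seen.add(nxt)
                queue.append(nxt)

    n = len(nodes)
    connected = components == 1
    # A graph with more edges than a forest (n - components) contains a loop.
    has_loop = len(unique_edges) > n - components
    # Star: one hub compared with every other node, and no other comparisons.
    star_shaped = (
        connected
        and len(unique_edges) == n - 1
        and any(len(nb) == n - 1 for nb in neighbours.values())
    )
    return {
        "connected": connected,
        "density": len(unique_edges) / (n * (n - 1) // 2),
        "has_loop": has_loop,
        "star_shaped": star_shaped,
    }


print(describe_network([("placebo", "A"), ("placebo", "B"), ("placebo", "C")]))
```

A loop found this way is only a necessary condition for consistency testing: a loop formed entirely by one multi-arm trial is consistent by construction and adds no independent check.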

— Connectedness: Every treatment must connect through some chain of comparisons; isolated nodes cannot be compared
— Density: Ratio of observed edges to possible edges; sparse networks yield wide credible intervals
— Loops/closed triangles: Required to test consistency; star networks (all vs placebo) cannot test consistency at all
— Co-occurrence patterns: Are certain treatments only compared with industry-favored comparators? (publication/comparator bias)
— Homogeneity (within pairwise comparisons): trials comparing A vs B should be similar enough to pool. Assessed by I² and τ².
— Transitivity (across comparisons): trials comparing A vs B must be clinically similar to trials comparing B vs C with respect to effect modifiers (age, severity, dose, follow-up, background therapy). This is a clinical judgment, not a statistical test.
— Consistency (statistical manifestation of transitivity): direct and indirect estimates of the same comparison should agree. Tested by node-splitting, loop-specific inconsistency, or design-by-treatment interaction model.
— A network is "unstable" if a single trial dominates a key edge, if loops show inconsistency (p<0.05), or if τ² is large across the network
— Sensitivity analyses removing outlier trials should not change rankings substantially
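The I² and τ² statistics used to judge homogeneity within one pairwise comparison can be computed from trial-level effects and standard errors. The sketch below assumes the DerSimonian-Laird moment estimator for τ², which is one common choice and not necessarily what a given NMA used; the function name and the example numbers are hypothetical.

```python
def heterogeneity(effects, standard_errors):
    """Cochran's Q, DerSimonian-Laird tau^2, and I^2 for one comparison.

    effects: trial-level estimates on an additive scale (e.g., log OR).
    standard_errors: matching standard errors.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_w = sum(weights)
    # Fixed-effect (inverse-variance) pooled estimate.
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Q: weighted squared deviations of each trial from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"Q": q, "tau2": tau2, "I2_percent": i2}


# Three hypothetical A-vs-B trials (log odds ratios, equal precision).
print(heterogeneity([-0.5, -0.1, -0.3], [0.1, 0.1, 0.1]))
```

With these illustrative inputs Q is 8 on 2 degrees of freedom, giving τ² of 0.03 and I² of 75%, a level at which pooling the three trials would deserve scrutiny.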

— Network plot: Visualizes node sizes (sample size per treatment) and edge weights (number of trials per comparison)
— Forest plot of all comparisons vs reference: Each treatment plotted against a common reference (often placebo), with pooled effect estimates from the NMA model
— League table: Triangular matrix showing every pairwise comparison; read row vs column. The diagonal contains treatment names; off-diagonal cells contain OR/RR/HR with 95% CI or CrI.
— Rankogram: Probability of each rank (1st, 2nd, 3rd…) for each treatment across MCMC samples
— SUCRA (Surface Under the Cumulative Ranking curve): Single number from 0 to 1 (or 0–100%) summarizing rank performance; 1 = always best, 0 = always worst
— Binary outcomes: odds ratio or risk ratio
— Continuous outcomes: mean difference or standardized mean difference
— Time-to-event: hazard ratio (requires log-HR and SE from each trial)
— Frequentist NMA (e.g., `netmeta` in R): produces point estimates, 95% CIs, p-values for inconsistency
— Bayesian NMA (e.g., WinBUGS, `gemtc`): produces posterior distributions, 95% credible intervals, direct probability statements ("probability drug A is best = 0.72")
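The link between posterior samples, the rankogram, and SUCRA can be made concrete with a small sketch. It assumes posterior draws are already available as one dictionary per MCMC sample (a hypothetical format; real packages store them differently), that lower values are better (as for a log odds ratio of a harmful outcome), and that there are no ties. The function names are illustrative.

```python
def rank_probabilities(draws, lower_is_better=True):
    """Rankogram: P(treatment holds rank k), estimated across posterior draws.

    draws: list of {treatment: effect} dictionaries, one per sample.
    """
    treatments = sorted(draws[0])
    counts = {t: [0] * len(treatments) for t in treatments}
    for draw in draws:
        ordered = sorted(treatments, key=lambda t: draw[t],
                         reverse=not lower_is_better)
        for rank_index, t in enumerate(ordered):
            counts[t][rank_index] += 1
    return {t: [c / len(draws) for c in counts[t]] for t in treatments}


def sucra(rank_probs):
    """SUCRA: mean of the cumulative rank probabilities over ranks 1..a-1."""
    n_treatments = len(rank_probs)
    result = {}
    for t, probs in rank_probs.items():
        cumulative, area = 0.0, 0.0
        for p in probs[:-1]:          # the last cumulative value is always 1
            cumulative += p
            area += cumulative
        result[t] = area / (n_treatments - 1)
    return result


# Four hypothetical posterior draws of log OR vs placebo for three drugs.
draws = [
    {"A": -0.50, "B": -0.30, "C": 0.00},
    {"A": -0.40, "B": -0.45, "C": -0.10},
    {"A": -0.60, "B": -0.20, "C": -0.10},
    {"A": -0.30, "B": -0.10, "C": -0.20},
]
print(sucra(rank_probabilities(draws)))
```

Here drug A is ranked first in three of four draws and second in one, so its SUCRA is (0.75 + 1.0) / 2 = 0.875; B and C come out at 0.5 and 0.125. SUCRA values always average to 0.5 across treatments, which is a quick sanity check on any reported table.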

— Node-splitting (Dias method): For each comparison informed by both direct and indirect evidence (i.e., lying on a closed loop), "split" the evidence and compare the direct estimate with the indirect one. A statistically significant difference (p<0.05) flags local inconsistency.
— Design-by-treatment interaction model (Higgins/White): Tests global inconsistency across the entire network simultaneously by allowing treatment effects to differ between designs (the set of treatments compared within a trial), which accommodates multi-arm trials.
— Loop-specific inconsistency: Calculates an inconsistency factor (IF) for each closed loop; a 95% CI for |IF| that excludes zero suggests inconsistency in that loop.
— Component NMA: Used when interventions are combinations (e.g., A+B, A+C, B+C); decomposes effects into additive components. Relevant for behavioral, surgical, or multi-drug regimens.
— Network meta-regression: Adjusts for trial-level covariates (mean age, baseline risk, dose) to address transitivity violations and explain heterogeneity. Caution: ecological fallacy if individual-level inference is drawn.
— Individual patient data (IPD) NMA: Gold standard; uses raw patient data rather than trial-level summary statistics, enabling subgroup analysis and reducing ecological bias. Rarely feasible due to data-sharing limits.
— Comparison-adjusted funnel plot: Detects small-study effects/publication bias across a network, with comparisons reordered so newer treatments are on one side.
— Threshold analysis: Quantifies how much the evidence base would need to change before rankings flip; a measure of robustness.
— Key distinction: Node-splitting = local (one comparison at a time); design-by-treatment interaction = global (whole network). A globally consistent network can still have one inconsistent loop, so both should be reported.
— Board pearl: Adjusted indirect comparison (Bucher method) is the simplest indirect comparison: log(OR_AC) = log(OR_AB) + log(OR_BC), or equivalently log(OR_AB) − log(OR_CB) when both trials are expressed against the common comparator B, with variances summed. It is a special case of NMA limited to a single triangle and assumes transitivity. Step 3 stems may show this calculation explicitly.
— Step 3 management: When an NMA reports significant inconsistency, do not rely on the pooled estimate; revert to direct evidence only, or downgrade certainty substantially in GRADE/CINeMA.
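The local inconsistency checks above reduce, in their simplest form, to a z-test on the difference between the direct and indirect log estimates. The sketch below is a simplified illustration, not the full Dias node-splitting model: it assumes the two estimates are independent and approximately normal on the log scale, and the function name and input numbers are hypothetical.

```python
import math


def inconsistency_test(log_direct, se_direct, log_indirect, se_indirect):
    """Compare direct and indirect estimates of the same comparison.

    Returns the difference (inconsistency factor on the log scale),
    its 95% CI, the z statistic, and a two-sided p-value.
    """
    diff = log_direct - log_indirect
    # Variances add because the two sources are assumed independent.
    se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    # Two-sided normal p-value: erfc(|z| / sqrt(2)) = 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "difference": diff,
        "ci_95": (diff - 1.96 * se_diff, diff + 1.96 * se_diff),
        "z": z,
        "p_value": p_value,
    }


# Hypothetical loop: direct log OR -0.60 (SE 0.15), indirect -0.05 (SE 0.20).
print(inconsistency_test(-0.60, 0.15, -0.05, 0.20))
```

In this illustrative loop the difference is −0.55 with a standard error of 0.25, so the 95% CI excludes zero and the loop would be flagged as locally inconsistent.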

— Within-study bias: Cochrane RoB 2 applied to contributing trials, weighted by their contribution to each comparison
— Reporting bias: Comparison-adjusted funnel plot, search comprehensiveness
— Indirectness: Do the trial populations/interventions match the review question?
— Imprecision: Credible/confidence interval width relative to a clinically important threshold
— Heterogeneity: τ² compared to empirical predictive distributions
— Incoherence: Statistical test of consistency (node-splitting or global)
— High SUCRA + narrow CrI + high CINeMA confidence → actionable ranking
— High SUCRA + wide CrI → ranking is unstable; do not over-interpret
— Multiple treatments with overlapping CrIs → effectively tied; choose based on safety, cost, patient preference
— Step 1: Is the network connected and is the question's comparison in the network?
— Step 2: What is the point estimate and 95% CrI for the relevant comparison?
— Step 3: Is the effect clinically meaningful (exceeds MCID)?
— Step 4: What is the CINeMA confidence for that specific comparison?
— Step 5: Apply to patient considering effect modifiers

— DOACs for AF: Apixaban, rivaroxaban, dabigatran, edoxaban — only some head-to-head trials exist; NMA informs comparative efficacy/bleeding
— Biologics for RA/IBD/psoriasis: Multiple TNF inhibitors, IL-17, IL-23, JAK inhibitors; NMAs rank by ACR/PASI/clinical remission
— SGLT2 inhibitors vs GLP-1 receptor agonists for T2DM cardiorenal outcomes: Cross-class NMAs inform ADA/KDIGO recommendations
— Antidepressants: The Cipriani 2018 Lancet NMA of 21 antidepressants is the canonical example
— Antihypertensives: ALLHAT-era and post-hoc NMAs informing thiazide vs ACEi vs CCB first-line
— Identify the efficacy outcome SUCRA AND the safety/tolerability SUCRA — best efficacy drug may have worst discontinuation rate
— Apply patient-specific modifiers: renal function, drug interactions, cost/coverage, pregnancy plans
— Consider acceptability (often modeled as all-cause discontinuation in psychiatry NMAs)
— NMAs of HbA1c reduction may rank differently than NMAs of MACE or mortality
— Always check which outcome the ranking applies to
— NMAs often pool multiple doses per drug; if a single dose is non-standard, transitivity is shaky

— Setup: A vs B trial yields log(OR_AB) ± SE_AB; B vs C trial yields log(OR_BC) ± SE_BC
— Indirect estimate: log(OR_AC) = log(OR_AB) + log(OR_BC); if the second trial is instead reported as C vs B, subtract: log(OR_AC) = log(OR_AB) − log(OR_CB)
— Variance: Var(log OR_AC) = Var(log OR_AB) + Var(log OR_BC) (variances add because estimates are independent)
— SE: square root of summed variances
— 95% CI: log(OR_AC) ± 1.96 × SE
— Exponentiate to get OR_AC and its CI
— Trials with ≥3 arms (e.g., A vs B vs C) provide direct evidence on multiple comparisons but the estimates are correlated (share a control group)
— NMA models must account for this correlation — failure to do so underestimates variance and produces falsely narrow CrIs
— Methods: multivariate normal likelihood (Bayesian) or augmented variance-covariance matrix (frequentist)
— Anchored: A common comparator exists (e.g., placebo) — preserves randomization → preferred
— Unanchored: No common comparator (single-arm trials) — must adjust for all prognostic factors and effect modifiers; very high risk of bias (MAIC, STC methods)
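The anchored Bucher calculation described in the setup-through-exponentiation steps above can be sketched in a few lines. The sign convention matters: with OR_AB defined as A relative to B and OR_BC as B relative to C, the log odds ratios add along the chain; when both trials are reported against the common comparator (A vs B and C vs B), the same calculation becomes a subtraction. The function name and the input values below are hypothetical.

```python
import math


def bucher_indirect(log_or_ab, se_ab, log_or_bc, se_bc, z=1.96):
    """Anchored indirect comparison of A vs C through common comparator B.

    log_or_ab: log odds ratio of A relative to B.
    log_or_bc: log odds ratio of B relative to C.
    """
    log_or_ac = log_or_ab + log_or_bc
    # Independent trials: variances add, so the indirect SE is always
    # larger than either contributing SE.
    se_ac = math.sqrt(se_ab ** 2 + se_bc ** 2)
    lower = math.exp(log_or_ac - z * se_ac)
    upper = math.exp(log_or_ac + z * se_ac)
    return {"or_ac": math.exp(log_or_ac), "ci_95": (lower, upper), "se_log": se_ac}


# Hypothetical inputs: OR_AB = 0.80 (SE 0.10), OR_BC = 0.75 (SE 0.12).
print(bucher_indirect(math.log(0.80), 0.10, math.log(0.75), 0.12))
```

With these illustrative inputs the indirect odds ratio is 0.80 × 0.75 = 0.60, and its log-scale standard error (about 0.156) exceeds both input standard errors, which is why indirect estimates carry wider intervals than the direct evidence they are built from.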

— Pharmacokinetic differences (reduced clearance) shift efficacy/toxicity balance
— Competing risks of mortality blunt long-term outcome differences
— Polypharmacy interactions absent from trial populations
— NMA SUCRA rankings for bleeding (DOACs) often diverge in patients older than 75; apixaban tends to retain its favorable ranking
— Most cardiovascular NMAs draw on trials that excluded patients with eGFR <30; SGLT2/GLP-1 cardiorenal NMAs increasingly stratify by baseline eGFR
— Dabigatran: heavily renally cleared; bleeding risk rises disproportionately in CKD, so rankings derived from the overall population may not hold, and can invert, in this subgroup
— Trials typically exclude Child-Pugh B/C; NMA conclusions about hepatically metabolized drugs (e.g., rivaroxaban, statins) cannot be extrapolated
— Stratify the network by age, sex, baseline severity, or geography
— Requires sufficient trials within each stratum — often underpowered
— IPD NMA is the only rigorous solution but is rare

— Pregnant patients are systematically excluded from most therapeutic RCTs, so NMA networks for chronic conditions (RA, IBD, MS, depression) have minimal pregnancy data
— Recent pregnancy-specific NMAs exist for gestational diabetes treatment, postpartum hemorrhage prevention (oxytocin vs carbetocin vs misoprostol — WHO 2018 NMA), preeclampsia prophylaxis
— Indirect comparisons in pregnancy carry higher transitivity risk because gestational age, parity, and baseline risk vary widely
— Pediatric ADHD: methylphenidate vs amphetamines vs atomoxetine vs guanfacine — Cochrane NMAs guide first-line choice
— Pediatric asthma: ICS vs ICS+LABA vs LTRA NMAs inform stepwise GINA recommendations
— Pediatric trials are scarce; networks are sparse → wide CrIs and low CINeMA confidence
— Small trials, single-arm studies, and historical controls dominate
— Often require unanchored indirect comparisons (MAIC: matching-adjusted indirect comparison; STC: simulated treatment comparison)
— Regulators accept these only with extensive sensitivity analyses
— MAIC: Reweights individual patient data from one trial to match aggregate characteristics of the comparator trial
— STC: Fits a regression model in one trial and predicts outcomes in the comparator population
— Both reduce bias from unbalanced effect modifiers but require IPD from at least one trial
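The reweighting idea behind MAIC can be illustrated for a single covariate. The sketch below is a deliberately reduced version of the method-of-moments approach: it finds exponential weights so that the weighted mean of one IPD covariate equals the comparator trial's published mean, using bisection. Real analyses match several effect modifiers at once with a numerical optimizer; the function names, search bounds, and example ages are all hypothetical.

```python
import math


def maic_weights(covariate, target_mean, bound=50.0, iterations=200):
    """Weights w_i = exp(alpha * z_i) whose weighted covariate mean hits the target.

    covariate: individual patient values from the trial with IPD.
    target_mean: aggregate mean reported by the comparator trial.
    """
    centred = [x - target_mean for x in covariate]
    if not (min(centred) < 0 < max(centred)):
        raise ValueError("target mean must lie strictly inside the IPD range")
    scale = max(abs(c) for c in centred)
    z = [c / scale for c in centred]      # rescale so every |z| <= 1

    def gradient(alpha):
        # Derivative of sum(exp(alpha * z)); zero when the weighted mean of z is 0.
        return sum(zi * math.exp(alpha * zi) for zi in z)

    lo, hi = -bound, bound
    if gradient(lo) > 0 or gradient(hi) < 0:
        raise ValueError("no solution inside the search bounds")
    for _ in range(iterations):           # bisection: gradient increases with alpha
        mid = (lo + hi) / 2
        if gradient(mid) > 0:
            hi = mid
        else:
            lo = mid
    alpha = (lo + hi) / 2
    return [math.exp(alpha * zi) for zi in z]


def effective_sample_size(weights):
    """Kish effective sample size after weighting."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


ages = [50, 55, 60, 65, 70]               # hypothetical IPD, mean age 60
weights = maic_weights(ages, target_mean=63)
print(sum(w * a for w, a in zip(weights, ages)) / sum(weights))
print(effective_sample_size(weights))
```

The effective sample size falls below the original five patients because older patients are upweighted to reach the target mean; a large drop in effective sample size is a practical warning that the two trial populations overlap poorly.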

— Transitivity violation: Combining trials with different background therapies, eras, or populations through a common comparator that is no longer "common" in effect
— Ignored inconsistency: Reporting pooled estimates when node-splitting reveals significant direct/indirect disagreement
— Selective comparator choice: Industry-sponsored NMAs that exclude inconvenient comparators or trials
— Multiple-arm correlation ignored: Treating arms of a 3-arm trial as independent — falsely narrow CrIs
— Overreliance on SUCRA: Reporting rankings without effect sizes or CrIs, creating false confidence
— Formulary decisions favoring SUCRA #1 drugs that are statistically indistinguishable from cheaper alternatives → wasted spend
— Guideline recommendations extrapolated to populations outside the network → harm in excluded subgroups
— Manufacturer marketing claims based on indirect comparisons inflating perceived superiority
— Pairwise meta-analysis publication bias propagates through indirect estimates and can distort multiple rankings simultaneously
— Comparison-adjusted funnel plots detect but rarely correct this
— Removing a single trial can flip rankings — robustness must be tested via leave-one-out sensitivity analysis
— Clinicians treat 95% CrI as if it were a confidence interval; Bayesian intervals depend on prior choice (informative vs vague priors can shift conclusions)
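A leave-one-out check of the kind mentioned above can be sketched for a single comparison. This is a simplification: it uses fixed-effect inverse-variance pooling on one edge of the network, whereas a full NMA sensitivity analysis would refit the whole network model and recompute rankings after each omission. The function names, trial labels, and numbers are hypothetical.

```python
import math


def pool_fixed(trials):
    """Inverse-variance fixed-effect pooled estimate and its standard error.

    trials: {trial_name: (log_effect, standard_error)}.
    """
    weights = {name: 1.0 / se ** 2 for name, (_, se) in trials.items()}
    total = sum(weights.values())
    pooled = sum(weights[name] * est for name, (est, _) in trials.items()) / total
    return pooled, math.sqrt(1.0 / total)


def leave_one_out(trials):
    """Re-pool the comparison once per trial, omitting that trial each time."""
    results = {}
    for omitted in trials:
        subset = {k: v for k, v in trials.items() if k != omitted}
        results[omitted] = pool_fixed(subset)
    return results


# Hypothetical edge dominated by one large, strongly positive trial (T3).
trials = {"T1": (-0.40, 0.20), "T2": (-0.35, 0.25), "T3": (-0.90, 0.10)}
print(pool_fixed(trials))
for omitted, (estimate, se) in leave_one_out(trials).items():
    print(f"without {omitted}: {estimate:.3f} (SE {se:.3f})")
```

In this illustrative example the pooled log effect is about −0.75 with all trials but about −0.38 once T3 is removed, the pattern of a single dominant trial that should make a ranking suspect.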

— You are writing a guideline or formulary policy and the NMA's CINeMA confidence is low/very-low for the relevant comparison
— Your patient population (pregnancy, dialysis, pediatric, geriatric with multimorbidity) is unrepresented in the network
— The NMA shows significant inconsistency but reports pooled estimates anyway
— Conflicting NMAs on the same question reach different rankings — common in psoriasis biologics, antidepressants
— A manufacturer submits an unanchored indirect comparison for regulatory or coverage purposes
— Clinical epidemiologist or biostatistician with NMA experience
— HTA agency methodologists (NICE, ICER) for coverage decisions
— Cochrane editorial team for systematic review questions
— GRADE/CINeMA working group resources for certainty grading
— Re-run sensitivity analyses with your population in mind
— Conduct meta-regression on key effect modifiers
— Update the network with newer trials
— Adjudicate conflicting NMAs by comparing inclusion criteria, network geometry, and statistical models
— Just as you escalate a deteriorating patient to ICU, escalate methodologic uncertainty to expert consultation rather than guessing
— Document the consultation and its impact on your decision
— Large health systems' P&T committees should have biostatistical input for NMA-based decisions
— Value-based contracts increasingly rely on NMA-derived comparative effectiveness — getting it wrong has financial and clinical consequences

— Pools only direct evidence on a single comparison (A vs B)
— Simpler, less assumption-heavy, but cannot rank multiple treatments
— Use when you have a focused two-arm question with sufficient direct trials
— Simplest case of indirect comparison — single triangle (A vs B, B vs C → A vs C)
— Preserves randomization within trials; assumes transitivity
— Use when only two pairwise meta-analyses exist and you need to compare A vs C
— Mixed treatment comparison (MTC): older term for NMA, particularly in the Bayesian framework
— Functionally equivalent to NMA
— For interventions that share components (combination therapies, complex behavioral interventions)
— Decomposes effects into additive component contributions
— Use in surgical bundles, multi-drug regimens, psychotherapy components
— Uses raw patient data from each trial
— Enables subgroup analysis, addresses ecological bias, handles missing data better
— Use when raw data are accessible (often via consortia)
— Continuously updated as new trials publish
— Used in fast-moving fields (COVID-19 therapeutics — WHO living NMA)
— Use when evidence base is rapidly evolving
— Adjusts for cross-trial differences in effect modifiers using IPD from one trial
— Use when anchored NMA is impossible or transitivity is violated

— Highest internal validity for the specific comparison
— Limited to two (or few) arms; cannot generalize to untested treatments
— Use when a definitive A vs B trial exists and matches your patient
— Test multiple treatments concurrently within a single trial infrastructure
— Produce direct head-to-head comparisons that strengthen NMA networks
— Often the gold standard for resolving NMA inconsistencies
— Registry, claims, EHR data with target trial emulation, propensity scores
— Captures pregnancy, geriatric, multimorbid patients excluded from RCTs
— Subject to confounding by indication despite statistical adjustment
— Combines NMA efficacy/safety outputs with cost and utility data
— Produces ICERs (incremental cost-effectiveness ratios) for formulary/coverage decisions
— NMAs feed into the efficacy parameters of CEA models
— Umbrella reviews (overviews of reviews): synthesize multiple systematic reviews/meta-analyses on a topic
— Different from NMA — do not pool indirect evidence statistically
— Used when evidence is too sparse or conflicting for quantitative synthesis
— Lower in evidence hierarchy but important when NMAs cannot be performed

— For chronic disease (HF, DM, RA), NMA-informed first-line choice is just the start — durability, adherence, and long-term safety drive sustained benefit
— Switch decisions (e.g., second-line biologic after TNF failure) often rely on post-failure NMAs, which are sparser and less certain
— Post-MI: aspirin + P2Y12 inhibitor — NMA of ticagrelor vs prasugrel vs clopidogrel informs DAPT choice (ISAR-REACT 5 added direct evidence)
— Post-stroke: DOAC choice for AF-related cardioembolic stroke — NMAs guide apixaban preference in high-bleeding-risk patients
— Post-DVT/PE: anticoagulant choice — NMA supports DOACs over warfarin for most patients
— Cumulative toxicity (e.g., long-term steroid exposure, JAK inhibitor cardiovascular signal)
— Tachyphylaxis and durability (biologic immunogenicity)
— Adherence patterns (once-daily vs multiple daily dosing)
— Drug interactions accumulating over time
— Increasingly used for rapidly evolving fields (oncology, biologics, antivirals)
— Allow guideline updates without waiting for definitive head-to-head trials
— Communicate that the drug choice is based on average evidence, not individual prediction
— Frame in shared decision-making: efficacy, safety, cost, lifestyle fit

— Track the outcome the NMA prioritized (e.g., HbA1c, ACR50, PASI75, MACE) at evidence-based intervals
— Capture safety endpoints that may differ between agents (eGFR for SGLT2 inhibitors, lipid panel for JAK inhibitors, LFTs for biologics)
— Reassess at intervals matched to trial follow-up — drugs ranked well at 12 weeks may not retain benefit at 52 weeks
— New definitive head-to-head RCT publishes (may overturn indirect-comparison-based rankings)
— Updated NMA changes the league table for your patient's profile
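— Patient experiences an adverse event for which the NMA ranked the current agent unfavorably (consider switching within class)
— New comorbidity emerges that moves the patient outside the populations studied in the network, so the original NMA no longer applies directly (indirectness)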
— Explain that "we chose this medication because, across many studies, it had the best balance of benefit and side effects for people similar to you"
— Acknowledge uncertainty: "Some of the comparisons were indirect — meaning we estimate them by combining studies — so there's some uncertainty"
— Discuss alternatives: SUCRA #2 and #3 are often clinically equivalent
— Guidelines citing NMAs change as networks expand; reassure patients that updates reflect better evidence, not previous error
— EHR clinical decision support increasingly references NMA-derived rankings
— Be aware of which NMA the CDS is built on and whether it matches current evidence

— When recommending a therapy whose advantage over alternatives rests on indirect comparison, ethical practice supports disclosing this uncertainty during shared decision-making
— Example: Recommending apixaban over rivaroxaban based on NMA (no head-to-head RCT) — patient should understand the comparative evidence is indirect
— Failure to disclose material uncertainty can compromise consent quality
— Industry-funded NMAs may selectively choose comparators or trials favoring sponsor products
— Guideline panel members with relevant COI should recuse from NMA-based recommendations (per IOM/NAM standards)
— Disclose COI when communicating NMA-based recommendations to patients
— Excluded populations (pregnancy, pediatrics, racial/ethnic minorities, elderly with multimorbidity) inherit the bias of trial enrollment when NMA conclusions are extrapolated
— Health systems applying NMA-based formulary decisions without subgroup considerations may exacerbate disparities
— When patients move between systems with different NMA-based formularies, therapeutic substitutions within class occur — most are safe (CrIs overlap) but some are not (e.g., dabigatran vs apixaban in CKD)
— Medication reconciliation must consider why the original drug was chosen
— Systematic reviews and NMAs should be prospectively registered (PROSPERO) and report per PRISMA-NMA
— Publication bias undermines NMA validity; FDAAA and EU clinical trial registry mandates support transparency
— Acting on an NMA whose inconsistency was not tested, or whose CrIs were ignored, is analogous to acting on an uncalibrated diagnostic test — a latent safety hazard in evidence-based practice


— "Trial 1: drug A vs placebo, OR for stroke = 0.60. Trial 2: drug B vs placebo, OR = 0.75. What is the indirect estimate of OR for A vs B?"
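— Answer logic: OR_AB = OR_A,placebo / OR_B,placebo = 0.60/0.75 = 0.80 (point estimate: drug A lowers stroke odds by 20% relative to B; the CI around this indirect estimate is needed before claiming superiority)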
— Recognize that variances add on the log scale and CIs widen
— Figure shows nodes for 5 drugs with edges of varying thickness
— Question: "Which comparison has the most direct evidence?" → thickest edge
— Or: "Which comparison relies most on indirect evidence?" → thin or absent edge with longest indirect path
— "Drug A has SUCRA 0.82 for efficacy and 0.30 for safety; drug B has 0.75 for efficacy and 0.78 for safety. Which is preferred?"
— Correct answer: drug B for most patients (better efficacy-safety balance); rote SUCRA-#1 reasoning is wrong
— "An NMA combines 1990s trials of drug A vs placebo with 2020 trials of drug B vs placebo. What is the main threat to validity?"
— Answer: transitivity violation due to changes in background care/standard of care over time
— "Node-splitting reveals direct OR 0.70 (95% CI 0.50–0.95) and indirect OR 1.10 (95% CI 0.80–1.50) for the same comparison. What does this indicate?"
— Answer: inconsistency (the network's direct and indirect estimates disagree); the pooled estimate should not be used
— Asked to advise a P&T committee on NMA-based selection; correct answer integrates SUCRA + effect size + safety + cost + CINeMA confidence, not SUCRA alone
— NMA in adults <65, asked about an 82-year-old with CKD — correct answer flags indirectness and recommends caution + shared decision-making
— Manufacturer presents single-arm trial of new drug vs historical data on standard care; question asks about validity → recognize unanchored indirect comparison with high bias risk
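For the node-splitting vignette above (direct OR 0.70, 95% CI 0.50–0.95; indirect OR 1.10, 95% CI 0.80–1.50), the two intervals overlap, so it is worth seeing why the answer is still "inconsistency." The worked calculation below is a sketch that assumes the direct and indirect estimates are independent and normally distributed on the log scale, with standard errors recovered from the reported interval widths.

```latex
\begin{align*}
\mathrm{SE}_{\text{dir}} &= \frac{\ln 0.95 - \ln 0.50}{2 \times 1.96} \approx 0.164,
&
\mathrm{SE}_{\text{ind}} &= \frac{\ln 1.50 - \ln 0.80}{2 \times 1.96} \approx 0.160 \\[4pt]
z &= \frac{\ln 0.70 - \ln 1.10}{\sqrt{0.164^{2} + 0.160^{2}}}
   \approx \frac{-0.452}{0.229} \approx -1.97,
&
p &\approx 0.049 \ \text{(two-sided)}
\end{align*}
```

The test is on the difference between the two estimates, not on whether their intervals overlap, and here it falls just below the conventional 0.05 threshold. A result this close to the boundary is better read as a reason to investigate effect modifiers than as a definitive verdict.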


