Biostatistics & Population Health
Fixed-effect vs random-effects meta-analysis models
— Fixed-effect model is appropriate when studies are functionally identical: same population, same intervention dose, same outcome definition, same follow-up.
— Classic example: pooling identically designed multicenter trial sites or pre-specified subgroups of one protocol.
— Random-effects model is appropriate when studies differ in population, dose, setting, era, or outcome ascertainment, i.e., almost all real-world clinical meta-analyses.
— Random-effects yields wider confidence intervals because τ² (between-study variance) is added to within-study variance.
— Forest plot shows non-overlapping CIs across trials.
— I² > 50% or Cochran Q p < 0.10.
— Trials span different decades, geographies, or comparator regimens.
— One very large trial dominates a fixed-effect pool while the smaller trials point to a different effect.

— Forest plot with diamond at the bottom representing the pooled estimate.
— Summary statement in a guideline: "Pooled RR 0.78 (95% CI 0.65–0.94)."
— Conflicting trials scenario: two large RCTs disagree; a meta-analysis is proposed.
— Studies span multiple countries or decades (background risk differs).
— Different drug doses, formulations, or comparators.
— Variable outcome definitions (e.g., "MI" defined by different troponin assays).
— Mixed inpatient and outpatient populations.
— Use of individual patient data (IPD) from disparate cohorts.
— Pre-planned pooled analysis of identical-protocol trials (e.g., CHARM-Added, CHARM-Alternative, CHARM-Preserved pooled).
— Genetic association meta-analyses of identical SNPs in similar ancestry groups.
— Diagnostic test meta-analysis of a single assay against one reference standard.
— If the question states that the pooled estimate sits close to the largest trial under fixed-effect but shifts toward the average of all trials under random-effects, the cause is the weighting difference: fixed-effect weights by inverse within-study variance, so large trials dominate, whereas random-effects adds τ² to every study's variance and gives smaller studies relatively more weight.

— Overlapping CIs across studies → consistent effects → fixed-effect may be defensible.
— Non-overlapping CIs or effects on opposite sides of the null → substantial heterogeneity → random-effects.
— A single huge square dominating others suggests one mega-trial drives a fixed-effect pool.
— Fixed-effect: weight = 1/(within-study variance). Large trials get disproportionate influence.
— Random-effects: weight = 1/(within-study variance + τ²). Adding τ² shrinks the weight of large trials and relatively elevates small trials.
— Practical consequence: random-effects pooled estimates are often pulled toward smaller, possibly biased trials — a known limitation.
— Cochran Q: chi-square test; p < 0.10 suggests heterogeneity (low power with few studies).
— I²: % of total variation due to heterogeneity rather than chance. Rough bands: 0–40% low, 30–60% moderate, 50–90% substantial, 75–100% considerable.
— τ² (tau-squared): absolute between-study variance; reported in random-effects models.
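
The weighting contrast above can be made concrete with a minimal sketch. The variances and the τ² value below are hypothetical numbers chosen for illustration, and `relative_weights` is an illustrative helper, not a function from any library.

```python
def relative_weights(variances, tau2=0.0):
    """Return each study's share of the total inverse-variance weight.

    tau2 = 0 gives fixed-effect weights 1/v_i;
    tau2 > 0 gives random-effects weights 1/(v_i + tau2).
    """
    raw = [1.0 / (v + tau2) for v in variances]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical within-study variances: one large trial, two small ones.
variances = [0.01, 0.09, 0.16]

fixed = relative_weights(variances)           # tau2 = 0
random_ = relative_weights(variances, 0.05)   # assumed tau2 = 0.05

for v, f, r in zip(variances, fixed, random_):
    print(f"v={v:.2f}  fixed={f:.1%}  random={r:.1%}")
```

With these assumed inputs the large trial's share falls from roughly 85% under fixed-effect to roughly 58% under random-effects, while the two small trials gain weight.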

— Cochran's Q statistic: sum of weighted squared deviations of each study from the pooled estimate. Distributed as χ² with k−1 df (k = number of studies).
– p < 0.10 (rather than 0.05) is the conventional threshold for flagging heterogeneity because the test has low power when studies are few.
— I² statistic: I² = max(0, (Q − df)/Q) × 100%. Interpreted as proportion of total variance attributable to between-study heterogeneity.
— τ² (tau-squared): estimated variance of the underlying distribution of true effects; it is on the squared scale of the effect size, so its square root τ is the quantity in the same units as the effect size (log OR, log RR, mean difference).
— DerSimonian-Laird (DL): classic, fast, can underestimate τ² with few studies.
— REML (restricted maximum likelihood): now preferred default for continuous outcomes.
— Paule-Mandel: alternative iterative τ² estimator. Hartung-Knapp-Sidik-Jonkman (HKSJ): not a τ² estimator but a CI adjustment that widens CIs appropriately when the study count is small (< 10).
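
As a rough illustration of the statistics above, the sketch below computes Cochran's Q, I², and the DerSimonian-Laird τ² from study-level effects and variances. The function name `heterogeneity` and the two-study input are assumptions made for the example; the code assumes at least two studies.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, I² (percent), and the DerSimonian-Laird tau².

    Assumes at least two studies, with effects on an additive scale
    (e.g. log OR, log RR, mean difference).
    """
    weights = [1.0 / v for v in variances]            # fixed-effect weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Q: weighted squared deviations from the fixed-effect pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird scaling constant
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2


# Hypothetical example: two equally precise studies that disagree.
q, i2, tau2 = heterogeneity([0.0, 1.0], [0.1, 0.1])
print(f"Q = {q:.2f}, I2 = {i2:.0f}%, tau2 = {tau2:.2f}")
```

For this toy input the values work out to Q = 5, I² = 80%, and τ² = 0.4 on 1 degree of freedom.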

— Re-run analysis excluding the largest trial, the smallest trial, or high-risk-of-bias trials.
— Re-run with alternative model (fixed vs random) and compare.
— If the conclusion flips, the meta-analysis is fragile.
— Stratify by population (age, sex, comorbidity), intervention (dose, duration), or methodology (blinding, allocation concealment).
— Test for subgroup interaction (the "test for subgroup differences") — significant interaction means effect varies meaningfully across strata.
— RoB 2 for RCTs (Cochrane).
— ROBINS-I for non-randomized studies.
— GRADE framework rates certainty of evidence (high/moderate/low/very low) factoring in risk of bias, inconsistency, indirectness, imprecision, publication bias.
— Funnel plot: scatter of effect vs precision; asymmetry suggests missing small negative studies.
— Egger's test: regression-based test for funnel asymmetry (k ≥ 10 studies).
— Trim-and-fill: estimates impact of hypothetically missing studies.
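
The "re-run excluding one trial" sensitivity check above can be sketched as a leave-one-out loop. This is a minimal illustration using fixed-effect inverse-variance pooling; the helper names and the log risk ratios are hypothetical.

```python
def pool_fixed(effects, variances):
    """Fixed-effect (inverse-variance) pooled estimate."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def leave_one_out(effects, variances):
    """Pooled estimate with each study omitted in turn."""
    results = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        results.append(pool_fixed(rest_effects, rest_variances))
    return results


# Hypothetical log risk ratios; study 2 is a large trial with a small effect.
effects = [-0.40, -0.05, -0.35, -0.30]
variances = [0.04, 0.005, 0.06, 0.05]

full = pool_fixed(effects, variances)
for i, est in enumerate(leave_one_out(effects, variances), start=1):
    print(f"omit study {i}: pooled log RR = {est:.3f} (all studies: {full:.3f})")
```

If omitting a single study moves the pooled estimate across the null or changes the conclusion, that is the fragility the notes warn about.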

— Step 1: Are studies clinically and methodologically near-identical? If yes → fixed-effect defensible. If no → random-effects.
— Step 2: Is I² < 25% and Q-test p > 0.10 and small number of studies (< 5)? → fixed-effect acceptable but report both.
— Step 3: Is I² > 50% or τ² substantial? → random-effects mandatory; consider whether pooling is appropriate at all.
— Step 4: Is I² > 75% with clinical heterogeneity? → consider not pooling; instead present narrative synthesis or stratified estimates.
— Using fixed-effect when heterogeneity exists → falsely narrow CI, overstated precision, inflated type I error.
— Using random-effects when truly homogeneous → slightly wider CI (conservative), minimal harm.
— Asymmetry of risk favors random-effects as the safer default.

— Weight for study i: wᵢ = 1/vᵢ, where vᵢ = within-study variance.
— Pooled effect: θ̂ = Σ(wᵢ × θᵢ) / Σwᵢ.
— Variance of pooled effect: 1/Σwᵢ.
— 95% CI: θ̂ ± 1.96 × √(1/Σwᵢ).
— Weight: wᵢ* = 1/(vᵢ + τ²).
— Pooled effect: θ̂_RE = Σ(wᵢ* × θᵢ) / Σwᵢ*.
— Variance: 1/Σwᵢ*.
— When τ² = 0, the random-effects estimate collapses to the fixed-effect estimate.
— Binary outcomes: odds ratio (OR), risk ratio (RR), risk difference (RD) — usually analyzed on log scale.
— Continuous outcomes: mean difference (MD) when units identical; standardized mean difference (SMD, Hedges' g) when scales differ.
— Time-to-event: hazard ratio (HR) from log HR and SE.
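
The pooling formulas above translate almost line for line into code. The sketch below assumes effects are already on the log scale and takes τ² as a given input rather than estimating it; the function name `pool` and all numbers are illustrative.

```python
import math


def pool(effects, variances, tau2=0.0):
    """Inverse-variance pooled estimate with a 95% CI.

    tau2 = 0 reproduces the fixed-effect formulas;
    tau2 > 0 uses random-effects weights 1/(v_i + tau2).
    """
    weights = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum_w
    se = math.sqrt(1.0 / sum_w)               # sqrt of pooled variance
    return estimate, estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical log risk ratios and within-study variances.
log_rr = [-0.40, -0.05, -0.35, -0.30]
var = [0.04, 0.005, 0.06, 0.05]

for label, tau2 in [("fixed-effect", 0.0), ("random-effects", 0.02)]:
    est, lo, hi = pool(log_rr, var, tau2)
    # Back-transform from the log scale to the risk-ratio scale.
    print(label, [round(math.exp(x), 2) for x in (est, lo, hi)])
```

Setting `tau2=0.0` shows the collapse to the fixed-effect estimate noted above; any positive τ² widens the interval.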

— Produces relative rankings (e.g., SUCRA — surface under the cumulative ranking curve).
— Requires the transitivity assumption: trials are similar enough that indirect comparisons are valid.
— Consistency check: direct and indirect estimates for the same comparison should agree (node-splitting, design-by-treatment interaction).
— Gold standard; allows uniform outcome definitions, subgroup analyses, and time-to-event modeling.
— Two-stage approach: analyze each trial separately, then pool effect estimates (akin to standard meta-analysis).
— One-stage approach: pool all patients into a single hierarchical model with study as a random effect — more statistically efficient.

— τ² is poorly estimated with few studies; standard DerSimonian-Laird underestimates uncertainty.
— Hartung-Knapp-Sidik-Jonkman (HKSJ) adjustment is recommended — uses a t-distribution and inflates CI appropriately. Now the Cochrane-preferred approach for random-effects with few studies.
— With k = 2–3 studies, meta-analysis is statistically fragile; consider narrative synthesis or Bayesian methods with informative priors.
— Inverse-variance methods fail when many studies have zero events.
— Mantel-Haenszel without continuity correction preferred for sparse binary data.
— Peto OR valid when events are rare (< 1%) and arms balanced; biased otherwise.
— Exact methods (e.g., beta-binomial) increasingly used for rare adverse events.
— Continuity corrections (adding 0.5 to zero cells) distort estimates and should be avoided when better methods exist.
— Use logit or arcsine transformation to stabilize variance.
— Random-effects almost always appropriate due to between-cohort heterogeneity.
— Use standardized mean difference (Hedges' g) with small-sample correction.
— Interpretation thresholds (Cohen): 0.2 small, 0.5 medium, 0.8 large.
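
For the standardized mean difference with small-sample correction mentioned above, a minimal sketch follows. It uses the common approximation J = 1 − 3/(4·df − 1) for the correction factor and a standard large-sample variance formula; the function name and the summary statistics are hypothetical.

```python
import math


def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Return Hedges' g and its approximate variance for two groups."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (mean1 - mean2) / pooled_sd           # Cohen's d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)          # small-sample correction
    g = j * d
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2.0 * (n1 + n2))
    return g, j**2 * var_d


# Hypothetical trial: treatment mean 10, control mean 8, SD 2, 20 per arm.
g, var_g = hedges_g(10, 8, 2, 2, 20, 20)
print(f"g = {g:.3f}, variance = {var_g:.4f}")
```

The resulting g and its variance can then be fed into inverse-variance pooling like any other effect size.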

— Bivariate random-effects model: jointly models logit-sensitivity and logit-specificity with their correlation.
— HSROC (hierarchical summary ROC) model: estimates a summary ROC curve accounting for between-study threshold variation.
— Simple separate pooling of sensitivity and specificity is incorrect and outdated.
— Pool adjusted hazard ratios when available; never pool unadjusted with adjusted estimates without stratification.
— Watch for inconsistent adjustment sets across studies — a major source of heterogeneity that random-effects cannot fix.
— Use logit-transformed C-statistic for pooling.
— Heterogeneity often substantial because case-mix differs.
— Often defensible with fixed-effect when ancestry and assay are similar.
— Use random-effects when combining across ancestries or platforms.
— Frequently rely on observational data because RCTs are scarce.
— Random-effects mandatory; transparency about study designs is essential.

— Smaller studies often show larger effects due to publication bias, methodologic issues, or selective reporting.
— Random-effects gives small studies more weight, so it can be more affected by small-study bias than fixed-effect.
— Funnel plot asymmetry and Egger's test should be reported.
— I² = 0% does not prove homogeneity, especially with few or imprecise studies.
— High I² with effects all in the same direction (e.g., all favor treatment) still allows meaningful pooling — heterogeneity is in magnitude, not direction.
— Studies selectively reporting favorable outcomes inflate pooled effects. Trial registries (e.g., ClinicalTrials.gov) allow detection by comparing registered with reported outcomes; PROSPERO does the same for systematic review protocols.
— Pooling studies with different P, I, C, or O than the question of interest dilutes applicability.
— Study-level associations (e.g., mean age vs effect) do not imply individual-level associations. IPD meta-analysis avoids this.
— Updated meta-analyses that selectively add favorable trials can drift toward false-positive conclusions.

— Clinical heterogeneity is overwhelming: different diseases, very different interventions, incompatible outcomes.
— I² > 75% with no identifiable explanatory subgroup.
— Fewer than 2 studies with extractable comparable data.
— Conflicting study designs without justification for combining.
— Narrative (qualitative) synthesis following SWiM (Synthesis Without Meta-analysis) guidelines.
— Structured tabular comparison with risk-of-bias annotations.
— Vote counting based on direction of effect (last resort, weakest method).
— Network meta-analysis.
— Bayesian or hierarchical modeling.
— IPD meta-analysis.
— Rare-events analysis.
— Diagnostic test accuracy with bivariate/HSROC modeling.
— Inconsistency (unexplained heterogeneity) → downgrade.
— Imprecision (CI crosses clinical decision threshold) → downgrade.
— Publication bias (funnel asymmetry) → downgrade.
— Indirectness (PICO mismatch) → downgrade.
— Risk of bias in included studies → downgrade.

— Fixed-effect (singular) meta-analysis = common-effect model assuming one true effect.
— Fixed-effects (plural) regression = econometrics term for a model that estimates separate intercepts for each unit (e.g., panel data); different concept entirely.
— Boards may exploit this confusion — read carefully.
— Pooled analysis: combines raw individual-level data (essentially one-stage IPD).
— Meta-analysis: combines summary statistics across studies.
— Aggregate uses published effect estimates; IPD uses patient-level data — IPD is gold standard but resource-intensive.
— Systematic review: structured literature synthesis (may or may not pool).
— Meta-analysis: quantitative pooling step within (or independent of) a systematic review.
— Frequentist: CI, p-values, DerSimonian-Laird or REML.
— Bayesian: credible intervals, posterior probabilities; handles few studies and rare events better with informative priors.
— Subgroup: categorical stratification.
— Meta-regression: continuous covariate modeling at the study level.
— Cumulative shows pooled estimate evolving over time.
— Trial sequential analysis (TSA) adjusts for repeated testing across accumulating trials, controlling type I error inflation — analogous to interim analyses in single RCTs.

— Observational data analyzed with causal inference methods (propensity scores, instrumental variables, g-methods) to emulate an RCT.
— Increasingly used in regulatory decisions; should not be pooled naively with RCTs in meta-analyses.

— PRISMA 2020: 27-item checklist for systematic reviews and meta-analyses.
— PRISMA-NMA: extension for network meta-analyses.
— PRISMA-IPD: extension for individual patient data.
— MOOSE: for meta-analyses of observational studies.
— PROSPERO: international prospective registry for protocols — pre-registration reduces selective reporting.
— Magnitude of effect (relative and absolute).
— Certainty of evidence (high/moderate/low/very low).
— Values and preferences.
— Resource use and cost-effectiveness.
— Equity, acceptability, feasibility.
— Translate pooled relative effects (RR, OR, HR) into absolute risk differences and NNT/NNH using baseline risk for the patient population.
— Acknowledge prediction interval when generalizing to a new setting.
— Avoid overclaiming based on borderline pooled significance with high heterogeneity.
— High-impact reviews: update every 2–3 years or after major new trial.
— Living reviews: continuous updating with version-controlled releases.
— Guideline panels (ACC/AHA, USPSTF, IDSA) cite meta-analyses as evidence base; the strength of recommendation depends on both effect magnitude and certainty.
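
The translation from a pooled relative effect to absolute terms, described above, can be sketched as follows. The baseline risks are hypothetical, the function name is illustrative, and the pooled RR of 0.78 is borrowed from the guideline-statement example earlier in these notes.

```python
import math


def absolute_effect(rr, baseline_risk):
    """Convert a pooled risk ratio into absolute risk reduction and NNT.

    baseline_risk is the untreated event risk in the target population.
    A positive ARR means benefit; NNT is returned unrounded.
    """
    treated_risk = baseline_risk * rr
    arr = baseline_risk - treated_risk
    nnt = math.inf if arr == 0 else 1.0 / abs(arr)
    return arr, nnt


# Same pooled RR applied to a low-risk and a high-risk population.
for baseline in (0.05, 0.20):
    arr, nnt = absolute_effect(0.78, baseline)
    # NNT is conventionally rounded up to the next whole patient.
    print(f"baseline {baseline:.0%}: ARR = {arr:.3f}, NNT = {math.ceil(nnt)}")
```

The same relative effect gives a much larger NNT at low baseline risk, which is why the baseline risk of the patient population matters when applying a pooled estimate.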

— Define search update frequency (monthly, quarterly).
— Define trigger criteria for re-pooling (new trial with ≥ X events, new intervention arm).
— Maintain version-controlled publication with change logs.
— Sets monitoring boundaries analogous to alpha-spending functions in RCTs.
— Flags when the required information size has been reached, indicating further trials may be unnecessary.
— Prevents premature conclusions from cumulative meta-analysis when repeated testing inflates type I error.
— A previously homogeneous body of evidence can become heterogeneous as new populations, doses, or comparators are studied.
— Re-estimate I² and τ² with each update; switch from fixed-effect to random-effects if heterogeneity emerges.
— Watch for late-emerging subgroup effects (e.g., age, sex, ancestry) that may change clinical recommendations.
— Industry-sponsored trials systematically report larger effects; sensitivity analyses by funding source should be updated.
— Distinguish high-quality (PRISMA-compliant, pre-registered, IPD or rigorous aggregate) from low-quality reviews.
— Recognize predatory or redundant meta-analyses that flood the literature.

— A fixed-effect model misapplied to heterogeneous data produces falsely narrow confidence intervals, leading guideline panels to issue stronger recommendations than the evidence warrants — patients may receive marginally beneficial or harmful interventions with overconfidence.
— Historical example: a retrospective cumulative meta-analysis (Lau et al., 1992) showed that trials of streptokinase in MI already demonstrated a statistically significant mortality benefit by 1973, yet placebo-controlled trials continued for over 15 years. Hundreds of patients in placebo arms died while effective therapy was withheld. The ethical duty to synthesize evidence in real time is now embedded in trial design (DSMBs reviewing cumulative evidence).
— Industry-sponsored trials systematically report more favorable results; meta-analyses inheriting this bias mislead clinicians.
— Selective outcome reporting in primary trials propagates into meta-analyses; mandatory trial registration (FDAAA, ClinicalTrials.gov) is an ethical safeguard.
— When discussing therapies supported only by meta-analyses with high heterogeneity, disclose uncertainty — patients have the right to know that pooled estimates may not apply to their specific clinical context.
— Use absolute risk reductions and NNT rather than relative effects to avoid framing bias.
— Conducting redundant meta-analyses when an adequately updated one exists wastes resources and clutters literature — an ethical concern flagged by COMET and Cochrane.
— Predatory journals publish low-quality, unregistered meta-analyses — be skeptical of unindexed sources.
— Outdated meta-analyses embedded in decision-support tools can perpetuate obsolete recommendations across institutions. EHR-integrated guidelines should reference living evidence sources and last-update dates.
— FDA increasingly accepts meta-analyses as supportive evidence for label changes (e.g., cardiovascular safety of antidiabetics). Methodologic rigor has medicolegal weight.


— Stem describes 15 RCTs of an antihypertensive from 1990–2020 with diverse populations and I² = 67%.
— Answer: Random-effects model.
— Forest plot shows fixed-effect OR 0.85 (CI 0.80–0.90), random-effects OR 0.78 (CI 0.65–0.93).
— Answer: Between-study heterogeneity; random-effects gives more weight to smaller studies and incorporates τ² into the variance.
— Answer (stem reports I² = 72%): Approximately 72% of the total variability in effect estimates is due to between-study heterogeneity rather than chance.
— Stem: high heterogeneity in a meta-analysis.
— Answer: Subgroup analysis or meta-regression to explore sources of heterogeneity; do not simply report the pooled estimate.
— Answer: Publication bias / small-study effects; consider trim-and-fill and downgrade GRADE certainty.
— Answer: Fixed-effect estimate is essentially the mega-trial; consider whether pooling adds information.
— Answer: Favor the high-quality pragmatic RCT when methods are sound; update GRADE certainty.
— Answer: Focus on pairwise effect estimates and CIs; SUCRA rankings are descriptive, not definitive.
— Critique: Should use bivariate or HSROC model to account for threshold effects.
— Critique: Use HKSJ adjustment; τ² is underestimated with few studies.

Use the fixed-effect model only when included studies plausibly share one true effect; choose the random-effects model whenever clinical or statistical heterogeneity exists. Report I², τ², and a prediction interval, and explore sources of variation before applying the pooled estimate to your patient.

