Biostatistics & Population Health
Non-inferiority and equivalence trial design
— New therapy is expected to have similar efficacy but offers a secondary advantage: lower cost, fewer side effects, easier dosing, better adherence, no monitoring (e.g., DOACs vs warfarin), or improved safety profile.
— Placebo control is unethical because an effective standard of care already exists (e.g., new antibiotic for bacterial pneumonia, new antihypertensive in established HTN).
— Generic, biosimilar, or device-substitution studies.

— Numbers are deliberately specific because the answer hinges on whether the upper bound of the CI falls below the margin.
— Anticoagulation: DOACs vs warfarin (RE-LY, ROCKET-AF, ARISTOTLE) — classic NI designs.
— Antibiotics: new agent vs standard (e.g., ceftaroline vs vancomycin); FDA generally requires NI for serious bacterial infection trials.
— Oncology: shorter chemo course vs standard; less-toxic regimen vs full-dose.
— Cardiology: TAVR vs SAVR (started NI, later showed superiority in some risk strata).
— Endocrine: new insulin analogs vs glargine; semaglutide cardiovascular safety (initially NI for CV outcomes per FDA 2008 guidance).
— Stem mentions a pre-specified margin Δ.
— Stem emphasizes the new drug's non-efficacy advantages (oral vs IV, no INR, once-daily, cheaper, fewer drug interactions).
— Outcome is reported with a one-sided 97.5% CI (one-sided α = 0.025), whose bound coincides with the corresponding bound of a two-sided 95% CI.

— Step 1 (M1): Estimate the effect of the active comparator versus placebo from historical placebo-controlled trials. Take the 95% CI bound closest to no effect (the smallest plausible benefit); this conservative value, M1, is the entire effect the standard drug is assumed to provide.
— Step 2 (M2): Choose Δ as a clinically acceptable fraction of M1 — typically 50%. The new drug must preserve at least half the established benefit.
— The current trial must be similar enough to the historical placebo trials (same population, endpoint definitions, adherence rates) that the comparator's effect would still hold today.
— Threats: improved background care, better diagnostics, or healthier modern enrollees can shrink the comparator's true effect → margin too generous → "biocreep."
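The two-step margin derivation above can be sketched as follows, assuming a hazard-ratio outcome handled on the log scale. The function name, its parameters, and the historical bound of 0.60 are illustrative assumptions, not values from any real trial.

```python
import math

def ni_margin_hr(hist_ci_upper: float, preserve_fraction: float = 0.5) -> tuple[float, float]:
    """Derive M1 and the NI margin (M2) for a hazard ratio.

    hist_ci_upper: upper 95% CI bound of the historical comparator-vs-placebo HR,
                   i.e. the bound closest to 1 (most conservative benefit).
    preserve_fraction: fraction of M1 the new drug must retain (0.5 = 50%).
    Returns (M1, delta), both as new-drug-vs-comparator HR limits (> 1).
    """
    if not 0 < hist_ci_upper < 1:
        raise ValueError("comparator must have beaten placebo (upper bound < 1)")
    if not 0 <= preserve_fraction < 1:
        raise ValueError("preserve_fraction must be in [0, 1)")
    # M1: the whole comparator effect, flipped to 'placebo vs comparator'.
    m1 = 1.0 / hist_ci_upper
    # M2: permit loss of (1 - preserve_fraction) of M1, measured on the log scale.
    delta = math.exp((1.0 - preserve_fraction) * math.log(m1))
    return m1, delta

# Hypothetical historical result: comparator vs placebo HR upper bound of 0.60.
m1, delta = ni_margin_hr(0.60, preserve_fraction=0.5)
print(f"M1 = {m1:.2f}, delta = {delta:.2f}")
```

Raising `preserve_fraction` (for example to 0.67) pulls Δ closer to 1, which is the "tighter margin" case used when the comparator's benefit is large.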

— For ratio outcomes (HR, RR, OR): null = 1.0. If new drug is "bad" (more events), HR > 1.
— For difference outcomes (risk difference, mean difference): null = 0.
— NI margin Δ is placed on the "worse" side of the null (e.g., HR 1.38, risk difference +10%).
— (A) Superior: Entire CI lies left of null (HR < 1). New drug beats comparator. NI automatically met.
— (B) Non-inferior and not superior: CI crosses null but upper bound is below Δ. NI demonstrated; superiority not demonstrated.
— (C) Inconclusive: CI crosses Δ. Cannot conclude non-inferiority; cannot conclude inferiority. Typically underpowered.
— (D) Statistically inferior but within margin: Entire CI lies right of null but below Δ. A rare result: NI is formally met even though the new drug is significantly worse than the comparator. Most regulators still accept NI here, but clinicians often won't.
— (E) Frankly inferior: Entire CI lies right of Δ. New drug is worse than acceptable.
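The five zones (A) through (E) can be expressed as a small classifier. This is a minimal sketch assuming a two-sided 95% CI on a scale where larger values mean the new drug is worse; the function name and label strings are illustrative.

```python
def classify_ni(ci_lower: float, ci_upper: float, margin: float, null: float = 1.0) -> str:
    """Place a CI into zones A-E relative to the null value and the NI margin.

    Use null=1.0 for ratios (HR, RR, OR) and null=0.0 for differences.
    """
    if not (ci_lower <= ci_upper and null < margin):
        raise ValueError("expected ci_lower <= ci_upper and null < margin")
    if ci_lower > margin:
        return "E: frankly inferior"
    if ci_upper >= margin:
        return "C: inconclusive"          # CI reaches or crosses the margin
    if ci_upper < null:
        return "A: superior"
    if ci_lower > null:
        return "D: statistically inferior but within margin"
    return "B: non-inferior, not superior"

print(classify_ni(0.88, 1.20, margin=1.30))               # ratio scale
print(classify_ni(-0.02, 0.06, margin=0.10, null=0.0))    # difference scale
```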

— ITT preserves randomization and balance of confounders.
— In superiority trials, ITT is conservative: non-adherence and dropouts bias results toward the null, making it harder to show a difference.
— In NI trials, that same bias toward the null favors an NI conclusion, so PP has traditionally been considered the more conservative analysis: it excludes non-adherent patients, so between-group differences are not artificially shrunk.
— Sample size: NI trials usually need larger N than superiority trials because they must rule out small differences. Sample size ∝ 1/Δ².
— Endpoint choice: Must match the historical placebo-controlled trials that justified the margin (constancy).
— Missing data handling: Pre-specified; multiple imputation or tipping-point analyses are common sensitivity analyses.
— Switching to superiority: Pre-specified hierarchical testing allows a positive NI trial to then test superiority without alpha penalty — but not the reverse.
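To see why ITT can favor a non-inferiority conclusion, here is a deliberately simplified expected-value sketch. It assumes that a fraction of the new-drug arm switches to the comparator, takes on the comparator's event risk, and that switching is unrelated to prognosis. It compares expected point estimates with the margin rather than CI bounds, and every number is hypothetical.

```python
def expected_risk_differences(p_comparator: float, p_new: float,
                              crossover_fraction: float) -> tuple[float, float]:
    """Return (ITT, PP) expected risk differences, new drug minus comparator."""
    if not 0.0 <= crossover_fraction <= 1.0:
        raise ValueError("crossover_fraction must be between 0 and 1")
    # ITT analyzes everyone as randomized, so switchers dilute the new-drug arm.
    itt_new_risk = (1 - crossover_fraction) * p_new + crossover_fraction * p_comparator
    itt_diff = itt_new_risk - p_comparator
    # PP keeps only adherent patients, so the full difference remains visible.
    pp_diff = p_new - p_comparator
    return itt_diff, pp_diff

# Hypothetical: new drug truly 5 points worse, margin 4 points, 30% crossover.
itt, pp = expected_risk_differences(0.10, 0.15, 0.30)
margin = 0.04
print(f"ITT diff = {itt:.3f} (below margin: {itt < margin})")
print(f"PP diff  = {pp:.3f} (below margin: {pp < margin})")
```

Under these assumptions the ITT estimate slips under the margin while the PP estimate does not, which is the discordance that reviewers look for.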

— Is there an effective standard of care for this condition?
— No → placebo-controlled superiority trial is appropriate and ethical.
— Yes → placebo is usually unethical → use active-comparator design (NI, equivalence, or active-control superiority).
— Among active-comparator designs:
— Does the new intervention offer a non-efficacy advantage (safety, cost, convenience)? → Non-inferiority.
— Are you comparing bioequivalent formulations (generic, biosimilar, same drug different route)? → Equivalence (bioequivalence: 90% CI of AUC and Cmax ratio within 0.80–1.25).
— Do you have strong reason to believe the new drug is better? → Superiority vs active control.
— No reliable historical estimate of comparator's effect vs placebo (constancy/assay sensitivity fails).
— Outcome is rare or highly variable → margin cannot be defined precisely.
— Comparator effect is small → preserving 50% leaves a clinically trivial benefit.
— Margin tightness (smaller Δ = stronger claim).
— Concordance of ITT and PP analyses.
— Adherence rates (high adherence → less ITT/PP divergence).
— Modernity of the historical comparator-vs-placebo data (older trials = more biocreep risk).
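The design-selection questions above can be written as a small decision function. This is only a sketch: the function and argument names are hypothetical, and the order in which the active-comparator questions are checked (and the fallback string) is an assumption made for illustration, not a rule from the source.

```python
def choose_design(effective_standard_exists: bool,
                  bioequivalent_formulation: bool = False,
                  expected_better: bool = False,
                  non_efficacy_advantage: bool = False) -> str:
    """Map the screening questions to a trial design label."""
    if not effective_standard_exists:
        # No proven therapy is withheld, so placebo control is ethical.
        return "placebo-controlled superiority"
    # An effective standard exists: choose among active-comparator designs.
    if bioequivalent_formulation:
        return "equivalence"
    if expected_better:
        return "superiority vs active control"
    if non_efficacy_advantage:
        return "non-inferiority"
    return "unclear: revisit trial rationale"  # assumed fallback

print(choose_design(True, non_efficacy_advantage=True))
print(choose_design(True, bioequivalent_formulation=True))
```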

— Expected event rate (or mean) in the comparator group (p₀).
— Expected event rate in the new-drug group (p₁) — usually assumed equal to p₀ if you truly expect no difference.
— Non-inferiority margin Δ — smaller Δ requires larger N (inversely with Δ²).
— α (one-sided, usually 0.025) and β (usually 0.10 or 0.20).
— Wide CI → upper bound crosses Δ → inconclusive trial, not a negative trial.
— Common reason NI trials "fail": insufficient enrollment, higher-than-expected dropout, lower-than-expected event rate.
— Margin Δ and its justification.
— Primary analysis population (ITT vs PP — usually both, co-primary).
— Handling of missing data.
— Hierarchical testing plan (NI first, then superiority).
— Subgroup analyses (exploratory only).
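The sample-size inputs listed above (p₀, p₁, Δ, α, β) can be combined in the usual normal-approximation formula for a binary endpoint compared on the risk-difference scale. This is a rough sketch assuming 1:1 allocation and no dropout inflation; the function name and the example rates are illustrative.

```python
import math
from statistics import NormalDist

def ni_sample_size_per_arm(p0: float, p1: float, delta: float,
                           alpha_one_sided: float = 0.025,
                           power: float = 0.90) -> int:
    """Approximate patients per arm for an NI trial on a risk difference.

    p0, p1: expected event rates in the comparator and new-drug arms.
    delta:  NI margin as an absolute risk difference (new minus comparator).
    power:  1 - beta.
    """
    if delta <= p1 - p0:
        raise ValueError("margin must exceed the expected true difference")
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    variance = p0 * (1 - p0) + p1 * (1 - p1)
    n = (z_alpha + z_beta) ** 2 * variance / (delta - (p1 - p0)) ** 2
    return math.ceil(n)

# Hypothetical: 10% event rate in both arms; compare a 5-point and a 2.5-point margin.
n_wide = ni_sample_size_per_arm(0.10, 0.10, delta=0.05)
n_tight = ni_sample_size_per_arm(0.10, 0.10, delta=0.025)
print(n_wide, n_tight, round(n_tight / n_wide, 2))
```

Halving Δ roughly quadruples the required N, which is the 1/Δ² relationship in numerical form.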

— Pre-specify: test NI first at one-sided α = 0.025.
— If NI is met, proceed to test superiority at two-sided α = 0.05 using the same data — no alpha adjustment needed because the hierarchy controls family-wise error.
— If superiority is met → label drug "superior."
— If only NI is met → label drug "non-inferior."
— 1. Identify the design: look for "non-inferiority margin" or "equivalence margin."
— 2. Note the comparator and confirm it represents true standard of care.
— 3. Locate the point estimate and its CI (one-sided 97.5% or two-sided 95%).
— 4. Compare the upper bound of the CI to Δ:
— Upper bound < Δ → NI met.
— Upper bound ≥ Δ → NI not met.
— 5. Compare the CI to null (1.0 or 0):
— Entire CI on favorable side of null → superiority met.
— CI crosses null → no superiority.
— 6. Confirm ITT and PP agree.
— 7. Check for assay sensitivity threats (modern care, biocreep).
— RE-LY (dabigatran vs warfarin in AF) — NI met, then superiority met for 150 mg dose.
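— ROCKET-AF (rivaroxaban vs warfarin in AF): NI met in both the per-protocol and ITT analyses; superiority was not shown in ITT.
— PALLAS (dronedarone vs placebo in permanent AF): stopped early for harm. It was placebo-controlled rather than an NI design, but it illustrates how interim monitoring can reveal that a new drug is worse.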
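The fixed-sequence plan (NI first, then superiority) and the stepwise CI reading above can be sketched with a normal approximation on the log hazard ratio. The standard error is back-calculated from the reported 95% CI, which is an approximation; the function name, return keys, and example numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist

def fixed_sequence_test(hr: float, ci_lower: float, ci_upper: float,
                        margin: float, alpha_one_sided: float = 0.025) -> dict:
    """Test NI at one-sided alpha; only if met, test superiority at two-sided 2*alpha."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.975)
    # Recover the SE of log(HR) from the width of the two-sided 95% CI.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    # Step 1: H0 is log(HR) >= log(margin); small p-value rejects it.
    p_ni = std_normal.cdf((math.log(hr) - math.log(margin)) / se)
    if p_ni >= alpha_one_sided:
        return {"p_ni": p_ni, "p_sup": None, "claim": "non-inferiority not demonstrated"}
    # Step 2: superiority on the same data, no extra alpha penalty.
    p_sup = 2 * std_normal.cdf(-abs(math.log(hr) / se))
    if p_sup < 2 * alpha_one_sided and hr < 1:
        return {"p_ni": p_ni, "p_sup": p_sup, "claim": "superior"}
    return {"p_ni": p_ni, "p_sup": p_sup, "claim": "non-inferior"}

print(fixed_sequence_test(1.02, 0.88, 1.20, margin=1.30)["claim"])
print(fixed_sequence_test(0.80, 0.68, 0.94, margin=1.30)["claim"])
print(fixed_sequence_test(1.20, 1.01, 1.42, margin=1.30)["claim"])
```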

— When the comparator provides a large mortality benefit (e.g., antibiotics in sepsis, anticoagulation in massive PE), even a small loss of efficacy is unacceptable. Δ must be tight, often preserving >67% of M1 rather than the standard 50%.
— FDA antibiotic NI margins for serious infections have been progressively tightened (often Δ = 10% absolute risk difference, reduced from historical 15–20%) after concerns that older trials approved inferior agents.
— Pivotal NI trials often exclude patients with eGFR < 30 or Child-Pugh B/C, limiting external validity.
— Step 3 might ask: "A new DOAC was shown non-inferior to warfarin in patients with CrCl > 50. Is it appropriate to prescribe in CrCl 25?" → No, because the population studied does not match the patient; NI claim does not generalize.
— Margins are typically expressed as HR upper bounds (e.g., 1.15–1.25) for OS or PFS.
— De-escalation trials (shorter chemo duration, omitting a drug) increasingly use NI design with tight margins because losing survival to save toxicity must be carefully bounded.
— FDA's 2008 guidance originally required the upper bound of the 95% CI for the MACE hazard ratio (typically vs placebo) to fall below 1.3, which is essentially a safety NI requirement.
— Several agents (empagliflozin, liraglutide, semaglutide) exceeded NI and demonstrated superiority for CV outcomes.

— NI trials in pregnant patients are rare; pregnancy is typically excluded from pivotal RCTs.
— Extrapolation of NI claims to pregnancy requires separate PK and safety bridging studies.
— Example: NI trials of LMWH vs UFH in VTE prophylaxis often exclude pregnancy; obstetric VTE management relies on observational data and consensus guidelines.
— FDA permits extrapolation of efficacy from adult NI/superiority trials to pediatric populations when disease pathophysiology and drug response are similar, provided pediatric PK and safety data are also available.
— Pediatric NI trials are often underpowered because of low enrollment; margins are necessarily wider, raising biocreep concerns.
— Biosimilar approval (FDA Purple Book) requires demonstrating "no clinically meaningful differences" in efficacy, safety, immunogenicity vs the reference biologic.
— Statistical test: typically equivalence with a symmetric margin (often ±12–15% for response rates), using two-sided 90% CI (equivalent to two one-sided tests at α = 0.05).
— Indication extrapolation: a biosimilar approved for RA may be approved for IBD without separate trials if mechanism justifies it.
— Pharmacokinetic equivalence: 90% CI of geometric mean ratio (test/reference) for AUC and Cmax must fall within 0.80–1.25.
— Narrow-therapeutic-index drugs (warfarin, levothyroxine, phenytoin, lithium, digoxin) have tighter bioequivalence margins (0.90–1.11).
— Underrepresentation of women and minorities in pivotal trials limits NI claim generalizability.
— FDA Drug Trials Snapshots disclose demographic distributions for new approvals.
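The equivalence checks described above (a 90% CI lying entirely inside pre-set limits) can be sketched as follows. The Wald interval for a difference in response rates is a large-sample approximation chosen here for brevity, and the counts, the ±15-point margin, and the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

def wald_ci_risk_difference(x_test: int, n_test: int, x_ref: int, n_ref: int,
                            level: float = 0.90) -> tuple[float, float]:
    """Two-sided Wald CI for the response-rate difference (test minus reference)."""
    p_t, p_r = x_test / n_test, x_ref / n_ref
    se = math.sqrt(p_t * (1 - p_t) / n_test + p_r * (1 - p_r) / n_ref)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_t - p_r
    return diff - z * se, diff + z * se

def ci_within(ci: tuple[float, float], limits: tuple[float, float]) -> bool:
    """True when the whole CI sits inside the equivalence limits (TOST logic)."""
    return limits[0] <= ci[0] and ci[1] <= limits[1]

# PK bioequivalence: 90% CI of the geometric mean ratio against 0.80-1.25.
print(ci_within((0.82, 1.18), (0.80, 1.25)))
# The same CI against the tighter narrow-therapeutic-index limits 0.90-1.11.
print(ci_within((0.82, 1.18), (0.90, 1.11)))
# Hypothetical biosimilar response rates: 180/300 vs 186/300, margin of 15 points.
ci = wald_ci_risk_difference(180, 300, 186, 300)
print(ci_within(ci, (-0.15, 0.15)))
```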

— Causes:
— Margin Δ set too wide (preserving < 50% of M1).
— Sloppy ITT-only analysis with high crossover diluting differences.
— Comparator underdosed.
— Constancy/assay sensitivity violations (modern background care improved → comparator's effect smaller than historical).
— Solution: Always re-anchor Δ to original placebo-controlled M1, not to the most recent comparator.
— Detection: examine event rates in current vs historical comparator arms. If event rate is much lower today, comparator's effect is likely smaller, and the original margin is too generous.
— High dropout/crossover (>10–20%) undermines PP analyses.
— Site heterogeneity, especially in global multicenter trials, can dilute effects.
— Open-label NI trials (no blinding) introduce ascertainment bias.
— Trials with discordant ITT/PP results may receive a complete response letter rather than approval.
— Sponsors may attempt post-hoc margin revision — universally rejected.
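Biocreep is easiest to see as worst-case arithmetic. The sketch below assumes that every new agent sits exactly at its margin, that each becomes the comparator for the next trial, and that each margin is re-derived from the latest comparator rather than from the original M1. The function name and the 20-point starting effect are hypothetical.

```python
def worst_case_retained_effect(original_effect: float,
                               preserve_fraction: float = 0.5,
                               generations: int = 3) -> list[float]:
    """Effect vs placebo guaranteed after each successive NI 'winner'."""
    effects = [original_effect]
    for _ in range(generations):
        # Each generation only has to keep a fraction of the previous effect.
        effects.append(effects[-1] * preserve_fraction)
    return effects

# Hypothetical: original comparator gives a 20-point absolute risk reduction.
for generation, effect in enumerate(worst_case_retained_effect(0.20)):
    print(f"generation {generation}: guaranteed effect vs placebo = {effect:.3f}")
```

Re-anchoring every margin to the original placebo-controlled M1 stops this geometric decay, which is why it is the recommended fix.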

— Stopping for harm: If new drug causes excess events in the prespecified safety endpoint, trial halts (e.g., PALLAS, dronedarone in permanent AF).
— Stopping for futility: If interim analysis suggests the trial cannot demonstrate NI even with full enrollment, DSMB may recommend halting to spare patients.
— Stopping for early NI success: Rare in NI trials because the design requires precision. Early stopping leaves fewer events and a wider CI, and repeated interim looks risk a false NI declaration.
— Margin justification is debated.
— ITT and PP results diverge.
— Constancy is questioned.
— Safety signals emerge alongside efficacy NI.
— NDA/BLA for new drugs/biologics; NI design is acceptable if margin is pre-specified and justified.
— ANDA for generics: bioequivalence only, no efficacy trial.
— 351(k) for biosimilars: equivalence statistics + analytical similarity.
— Real-world evidence (claims, registries, pragmatic trials) suggesting worse outcomes.
— FDA MedWatch reports of unexpected adverse events.
— Post-marketing required studies (Phase IV) showing efficacy gap.

— Null hypothesis: no difference between arms.
— Reject null when CI does not cross null value (1.0 for ratios, 0 for differences).
— Asks: "Is the new drug better?"
— Two-sided α = 0.05 conventional.
— Null hypothesis: new drug is worse by more than Δ.
— Reject null when CI upper bound is below Δ.
— Asks: "Is the new drug not unacceptably worse?"
— One-sided α = 0.025 conventional.
— Two null hypotheses: new drug differs from comparator by more than ±Δ in either direction.
— Reject both when entire CI lies within (−Δ, +Δ) — two one-sided tests (TOST).
— Asks: "Are the two drugs essentially the same?"
— Pre-specified rules for modifying sample size, arm allocation, or hypotheses based on interim data.
— Can be adaptive within NI framework (e.g., sample size re-estimation).
— Explanatory (efficacy): tightly controlled, ideal conditions — what NI trials usually are.
— Pragmatic (effectiveness): real-world conditions, broad inclusion — NI claims may not survive pragmatic re-testing.
— Units of randomization are sites/practices, not individuals. NI claims possible but require adjustment for intraclass correlation.
— Each patient receives both interventions in sequence. Useful for stable chronic conditions. NI analysis must account for period and carryover effects.
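For the cluster-randomized case above, the adjustment for intraclass correlation is commonly made with the design effect, 1 + (m − 1) × ICC, where m is the average cluster size. A minimal sketch, assuming equal cluster sizes; the function name and inputs are illustrative.

```python
import math

def cluster_adjusted_n(n_individual: int, cluster_size: int, icc: float) -> tuple[int, float]:
    """Inflate an individually randomized sample size by the design effect."""
    if cluster_size < 1 or not 0.0 <= icc <= 1.0:
        raise ValueError("need cluster_size >= 1 and 0 <= icc <= 1")
    design_effect = 1 + (cluster_size - 1) * icc
    return math.ceil(n_individual * design_effect), design_effect

# Hypothetical: 757 per arm if individually randomized, 20 patients per site, ICC 0.05.
n_cluster, deff = cluster_adjusted_n(757, cluster_size=20, icc=0.05)
print(n_cluster, round(deff, 2))
```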

— Cohort or registry analyses comparing treatments using real-world data.
— Cannot pre-specify an NI margin in the same regulatory sense; subject to confounding by indication.
— Useful for post-marketing NI confirmation (target trial emulation, propensity scoring).
— Pool data across trials, including NI trials, to estimate comparative effects.
— NMA can produce indirect NI comparisons when head-to-head trials are absent — but only if comparators link a connected network.
— Use prior probability distributions and update with trial data.
— Posterior probability that "new drug's effect is within margin Δ" replaces frequentist CI bounds.
— Increasingly accepted by FDA for medical devices and rare diseases.
— FDA may use RWE to support new indications for already-approved drugs.
— RWE-based NI comparisons remain methodologically immature for primary approval but are accepted for label expansions.
— Used in rare diseases and in oncology settings with unmet need.
— Cannot test NI formally; framed as response-rate benchmarks against historical comparator data.
— Interrupted time series, regression discontinuity — not used for drug NI claims.
— PCORI-funded examples comparing strategies in real practice.
— May use NI framing for treatment de-escalation studies.
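The Bayesian reading described above (the posterior probability that the new drug's effect lies within Δ) can be sketched with a normal prior on the log hazard ratio and a normal approximation to the trial result. The conjugate-normal model, the prior settings, and the function name are assumptions chosen for illustration; real submissions specify and justify their own models.

```python
import math
from statistics import NormalDist

def posterior_prob_within_margin(hr: float, ci_lower: float, ci_upper: float,
                                 margin: float, prior_mean: float = 0.0,
                                 prior_sd: float = 10.0) -> float:
    """P(true HR < margin | data) under a normal prior on log(HR)."""
    z = NormalDist().inv_cdf(0.975)
    log_hr = math.log(hr)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)  # from the 95% CI
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / se ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * log_hr)
    return NormalDist(post_mean, math.sqrt(post_var)).cdf(math.log(margin))

# Vague prior, then a skeptical prior centered on the margin itself.
p_vague = posterior_prob_within_margin(1.02, 0.88, 1.20, margin=1.30)
p_skeptic = posterior_prob_within_margin(1.02, 0.88, 1.20, margin=1.30,
                                         prior_mean=math.log(1.30), prior_sd=0.1)
print(round(p_vague, 4), round(p_skeptic, 4))
```

A skeptical prior lowers the posterior probability for the same data, which makes the sensitivity of the conclusion to prior choice explicit.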

— Convenience (once-daily DOAC for a non-adherent patient on warfarin).
— Cost (generic switch when copay is prohibitive).
— Safety (lower bleeding risk DOAC in an elderly patient with high INR variability).
— Drug interactions (DOAC over warfarin in a patient on multiple CYP inhibitors).
— Patient is well-controlled on the comparator with no incentive to switch.
— Patient population differs from trial enrollees (severe CKD, mechanical valve — note DOACs are contraindicated in mechanical valves despite NI vs warfarin in AF).
— Cost of switching exceeds benefit (transient insurance, brand-only).
— Explain that "non-inferior" means similar, not necessarily better.
— Discuss the magnitude of margin Δ (e.g., "could be up to 25% less effective at preventing stroke, but on average similar").
— Discuss the secondary benefits driving substitution (no INR monitoring, faster onset).
— Document patient preference, especially for cost-driven switches.
— DOACs are now first-line for non-valvular AF (2023 ACC/AHA/ACCP/HRS guideline), having moved from NI-based approval to preferred status because of safety advantages.
— PCSK9-inhibitor vs statin de-escalation studies use NI framing for lipid endpoints.
— Shorter DAPT duration (1–3 months) after PCI in selected patients — NI vs 12 months for ischemic endpoints, superior for bleeding.

— DOACs for AF (after warfarin): Renal function annually (more often if CrCl 30–50 or age >75), CBC and LFTs annually, bleeding assessment at every visit. No routine drug-level monitoring.
— Biosimilars: Clinical response monitoring at usual disease-specific intervals; immunogenicity testing only if loss of response.
— Generic substitution of narrow-therapeutic-index drugs: Recheck TSH 6 weeks after levothyroxine generic switch; recheck INR 1 week after warfarin generic switch; recheck phenytoin level 2 weeks after switch.
— "This medication has been shown to work similarly to the previous one, with the advantage of [X]."
— Avoid "just as good" if the margin was wide — say "comparable within an accepted range."
— Set expectations for adverse effects, which may differ from comparator (e.g., GI bleed risk profile of dabigatran vs warfarin).
— NI trials often have higher adherence than real-world use; emphasize that real-world effectiveness depends on adherence — particularly important for once-daily vs twice-daily DOAC choice (rivaroxaban once daily, apixaban twice daily).
— Hospital discharge: new drug substituted without primary care notification → duplicate therapy.
— Insurance formulary changes: patient switched mid-therapy without lab recheck.
— Specialty handoff: anticoagulant chosen by cardiology may not match neurology's preference in stroke prevention.

— When an effective therapy exists, randomizing patients to placebo violates clinical equipoise and the Declaration of Helsinki. Active-comparator NI design preserves ethical standards while permitting new drug evaluation.
— Must explain that the new drug may be less effective than standard care, by an amount up to Δ.
— Must explain the rationale for accepting potential lesser efficacy (safety, convenience, cost).
— Patients in NI trials are not guaranteed equivalent treatment — they accept this risk.
— Special vulnerability when comparator's effect is large (e.g., anticoagulation in mechanical valves: the RE-ALIGN trial of dabigatran vs warfarin in mechanical valves was stopped early for harm, showing that efficacy established in one population cannot be assumed in another).
— Post-marketing surveillance is mandatory. FDA may require Phase IV studies, REMS programs, or registries.
— Underpowered safety analyses in NI trials are a recognized safety risk — clinicians must report adverse events to MedWatch.
— A patient on stable warfarin admitted to the hospital and switched to a DOAC without checking renal function, weight, age, or drug interactions → bleeding or thrombosis on discharge.
— Mitigation: medication reconciliation at every transition, with explicit indication, renal dose, and follow-up plan.
— Off-label use of an NI-approved drug in a population not enrolled in trials (e.g., DOACs in mechanical valves) is contraindicated based on subsequent harm trials.
— Pharmacist substitution of biosimilars requires "interchangeable" designation; otherwise prescriber consent is required.


"A trial of drug X vs drug Y reports HR 1.02 (95% CI 0.88–1.20) for the primary endpoint with a pre-specified non-inferiority margin of 1.30. Which is the most accurate conclusion?"
— Answer: Non-inferiority demonstrated; superiority not demonstrated (CI crosses 1.0 but upper bound 1.20 < 1.30).
"In an NI trial of a new antibiotic for community-acquired pneumonia, the margin was set at 10% absolute risk difference. This margin reflects:"
— Answer: The maximum clinically acceptable loss of efficacy relative to standard care, set to preserve at least 50% of the comparator's effect (M1) as estimated from historical placebo-controlled data.
"In a non-inferiority trial, which analysis is generally preferred to avoid falsely concluding non-inferiority?"
— Answer: Per-protocol analysis (ITT can falsely support NI when non-adherence dilutes differences).
"Why was the trial designed with an active comparator rather than placebo?"
— Answer: Because withholding effective therapy violates clinical equipoise / informed consent / Helsinki principles.
"The 95% CI upper bound (1.42) exceeded the prespecified margin (1.30). The most accurate conclusion is:"
— Answer: Non-inferiority was not demonstrated. (NOT: drug is inferior — that requires CI lower bound > null.)
"A trial pre-specified as superiority failed to demonstrate superiority but the CI fell within a margin of 1.25. Can the sponsor claim NI?"
— Answer: No, unless an NI margin was pre-specified before unblinding.
"A generic drug's 90% CI for AUC ratio is 0.82–1.18. The reference is 0.80–1.25."
— Answer: Bioequivalence demonstrated.
"Each successive NI trial used the previous winner as comparator. The accumulating risk is:"
— Answer: Biocreep — drift toward placebo-equivalent efficacy.

Non-inferiority and equivalence trials test whether a new intervention is, respectively, "not unacceptably worse" or "essentially the same" as an active comparator, by comparing the confidence interval of the effect estimate to a pre-specified clinically justified margin (Δ).

