Biostatistics & Population Health

Sensitivity analysis in cost-effectiveness studies

Clinical Overview and When to Suspect Fragile Cost-Effectiveness Estimates

Sensitivity analysis (SA) is the formal method used in cost-effectiveness analyses (CEA) to test how robust a conclusion — typically an incremental cost-effectiveness ratio (ICER) — is to uncertainty in the underlying inputs.

A CEA produces an ICER: (Cost_A − Cost_B) / (QALY_A − QALY_B), expressed as $/QALY gained. In the US, willingness-to-pay (WTP) thresholds of $50,000–$150,000/QALY are commonly cited benchmarks.

Every CEA input — drug cost, adherence rate, transition probability, utility weight, discount rate, time horizon — carries uncertainty. Without SA, a single point-estimate ICER is essentially unvalidated.

When to "suspect" that SA is essential:

— Inputs derived from small trials, expert opinion, or extrapolated long-term outcomes

— ICER falls near a decision threshold (e.g., $95k/QALY when WTP is $100k)

— Model relies on surrogate endpoints translated into QALYs

— Industry-sponsored analyses where parameter selection may favor a product

— Long time horizons (lifetime models) where discount rate choice dominates

Board pearl: A cost-effectiveness conclusion that does not survive a reasonable one-way or probabilistic sensitivity analysis should be considered hypothesis-generating only — analogous to a non-significant trend in a clinical trial. SA is the CEA equivalent of asking, "Would my decision change if my assumptions were wrong?"

Step 3 management equivalent: treat SA the way you treat a confidence interval around a hazard ratio — if the CI crosses 1, the result is fragile. If SA shows the ICER crossing the WTP threshold under plausible inputs, the cost-effectiveness conclusion is fragile.

Common Step 3 framing: an exam vignette describes a new biologic costing $80,000/year with a base-case ICER of $90,000/QALY. The question then asks what additional analysis best informs the payer decision — the answer is probabilistic sensitivity analysis, not another RCT.
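As a minimal sketch of the ICER formula and WTP comparison described in this section, the Python below computes an ICER and labels it against a threshold. The cost and QALY totals are hypothetical, chosen only so the ratio matches the $90,000/QALY vignette, and the function names are illustrative rather than taken from any package.

```python
def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Return (delta_cost, delta_qaly, ratio); ratio is None when delta_qaly is 0."""
    d_cost = cost_new - cost_old
    d_qaly = qaly_new - qaly_old
    ratio = None if d_qaly == 0 else d_cost / d_qaly
    return d_cost, d_qaly, ratio


def interpret(d_cost, d_qaly, ratio, wtp):
    """Label a comparison against a willingness-to-pay threshold ($/QALY)."""
    if d_qaly == 0:
        return "equal effect: compare costs only"
    if d_qaly > 0 and d_cost <= 0:
        return "dominant (more effective, not more costly)"
    if d_qaly < 0 and d_cost >= 0:
        return "dominated (less effective, not cheaper)"
    if d_qaly > 0:
        # North-east quadrant: more costly and more effective
        return "cost-effective" if ratio <= wtp else "not cost-effective"
    # South-west quadrant: cheaper but less effective, so the rule reverses
    return "acceptable trade-off" if ratio >= wtp else "savings too small"


# Hypothetical lifetime totals per patient (assumed values, not from a real study)
d_cost, d_qaly, ratio = icer(250_000, 160_000, 9.0, 8.0)
label = interpret(d_cost, d_qaly, ratio, wtp=100_000)
print(f"ICER = ${ratio:,.0f}/QALY -> {label}")
```

The point estimate clears a $100,000/QALY threshold by only $10,000/QALY, which is exactly the situation in which the section argues SA is needed.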

Presentation Patterns and Key History — Types of Uncertainty

CEA uncertainty falls into four categories, and Step 3 questions often hinge on naming which type a given scenario represents.

Parameter uncertainty — imprecision in input values (e.g., 95% CI around a relative risk, range of drug prices). This is the type SA most directly addresses.

Structural (model) uncertainty — uncertainty about model architecture itself (Markov vs decision tree, number of health states, inclusion of indirect costs). Addressed by scenario analysis, not classical SA.

Methodological uncertainty — choice of perspective (payer vs societal), discount rate, time horizon, outcome measure (QALY vs DALY vs life-years). Addressed by reference-case analysis per ISPOR/Second Panel recommendations.

Stochastic (first-order) uncertainty — random variation between individuals in a microsimulation. Distinct from parameter uncertainty.

Key distinction: Parameter uncertainty asks "What if the input number is wrong?" Structural uncertainty asks "What if the model itself is wrong?" Probabilistic SA handles the first; scenario analyses handle the second.

History clues in a vignette:

— "The authors varied the drug cost from $40k to $120k" → one-way SA of parameter uncertainty

— "They re-ran the analysis assuming a 3-state vs 5-state Markov model" → structural/scenario analysis

— "They changed the perspective from healthcare sector to societal" → methodological analysis

— "10,000 Monte Carlo iterations sampling from input distributions" → probabilistic SA

Board pearl: If the stem mentions "Monte Carlo," "Dirichlet/beta/gamma distributions," or "cost-effectiveness acceptability curve," the technique is probabilistic sensitivity analysis (PSA). If it mentions "tornado diagram," it is deterministic one-way SA. These visual cues are the fastest identifiers on exam day.

Second Panel on Cost-Effectiveness in Health and Medicine (2016) recommends reporting both healthcare sector and societal perspectives as a reference case — a frequent Step 3-style methodological question.

Physical Exam Findings — Recognizing SA Outputs on Sight

Step 3 management: When shown a tornado diagram with drug cost as the widest bar, the appropriate next step is often price negotiation or a value-based contract, not additional clinical trials — because clinical efficacy is not the dominant uncertainty. Recognizing the driver of uncertainty directs the right downstream action.

Just as you recognize a murmur by its location and timing, recognize SA by its characteristic graphical and tabular outputs.

Tornado diagram — horizontal bar chart, widest bars on top, each bar showing how the ICER changes when one parameter is varied across its plausible range while others remain fixed. The widest bar = most influential parameter. This is the signature output of one-way deterministic SA.

Two-way sensitivity analysis — typically a 2D plot with two parameters on the axes and shaded regions indicating which strategy is cost-effective in each combination. Used when two inputs interact (e.g., drug cost × treatment duration).

Cost-effectiveness plane — scatterplot with incremental cost on the y-axis and incremental QALYs on the x-axis. Each dot is one PSA iteration. Clustering in the NE quadrant = more costly + more effective (typical for new therapies); the WTP threshold appears as a diagonal line.

Cost-effectiveness acceptability curve (CEAC) — x-axis = WTP threshold ($/QALY), y-axis = probability that the intervention is cost-effective. A CEAC that reaches 80% at $100k/QALY means: under PSA, 80% of simulations found the intervention cost-effective at that threshold.

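Net monetary benefit (NMB) plot — incremental NMB = (ΔQALYs × WTP) − ΔCost, where Δ denotes intervention minus comparator. Positive incremental NMB favors the intervention. Because NMB is linear in WTP, plotting NMB vs WTP linearizes the decision and avoids the ratio problems of the ICER.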

Hemodynamic analog: the tornado diagram is your "vital signs at a glance" — it tells you immediately which parameter is driving instability in the model.
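The net monetary benefit output described in this section can be illustrated with a few lines of Python. This is a sketch under assumed inputs: the $90,000 incremental cost and 1.0 incremental QALY are hypothetical, and `incremental_nmb` is an illustrative name.

```python
def incremental_nmb(d_cost, d_qaly, wtp):
    """Incremental net monetary benefit in dollars: wtp * delta_QALY - delta_cost."""
    return wtp * d_qaly - d_cost


# Hypothetical increments (assumed for illustration only)
D_COST, D_QALY = 90_000, 1.0

# NMB evaluated across a small grid of WTP thresholds
nmb_by_wtp = {wtp: incremental_nmb(D_COST, D_QALY, wtp)
              for wtp in (50_000, 100_000, 150_000)}

# The WTP at which NMB crosses zero equals the ICER
break_even_wtp = D_COST / D_QALY

for wtp, nmb in nmb_by_wtp.items():
    verdict = "favors intervention" if nmb > 0 else "favors comparator"
    print(f"WTP ${wtp:,}: incremental NMB ${nmb:,.0f} ({verdict})")
```

Because NMB is a straight line in WTP, the sign change at the break-even WTP carries the same information as comparing the ICER with the threshold, but it stays well defined when ΔQALY is zero or negative.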

Diagnostic Workup — One-Way and Multi-Way Deterministic SA

One-way (univariate) deterministic SA — vary one parameter at a time across a plausible range (often 95% CI or ±25%) while holding all others at base-case. Outputs: tornado diagram, threshold values.

Threshold analysis — a special one-way SA asking: "At what value of this parameter does the ICER cross the WTP threshold?" Example: "The intervention becomes cost-effective at $100k/QALY when annual drug cost falls below $62,400." This is directly actionable for payers.

Two-way SA — vary two parameters simultaneously; useful when parameters covary clinically (e.g., adherence and efficacy).

Multi-way / scenario SA — vary several parameters together to represent a plausible "scenario" (best-case, worst-case, real-world adherence scenario). Distinct from PSA because it uses fixed combinations, not random sampling.

Strengths of deterministic SA:

— Transparent, easy to communicate to non-statisticians

— Identifies which parameters most influence the result

— Quickly reveals threshold values for negotiation

Limitations:

— Ignores joint uncertainty (correlations between parameters)

— Cannot produce probability statements ("80% chance of being cost-effective")

— Best-case/worst-case scenarios may be implausibly extreme

Board pearl: If a question asks which parameter the analyst should target with a future trial to reduce decision uncertainty, the answer comes from the tornado diagram's widest bar: that is the parameter where additional evidence has the most decision-relevant value (a heuristic that value-of-information analysis formalizes as per-parameter EVPI, or EVPPI).

Key distinction: Deterministic SA answers "How sensitive is the ICER to each input?" Probabilistic SA answers "What is the probability the intervention is cost-effective given all uncertainty simultaneously?" Both are typically reported; modern guidelines (ISPOR, NICE, ICER-US) require PSA for major submissions.
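One-way SA and threshold analysis, as described in this section, can be sketched on a toy model. Everything here is an assumption for illustration: the three-parameter ICER function, the base-case values, and the ranges (the $40k to $120k drug-cost range echoes the vignette wording used earlier). The threshold search uses bisection and assumes the ICER is monotone in the parameter over the range searched.

```python
# Hypothetical base case and plausible ranges (assumed values)
BASE = {"drug_cost": 80_000.0, "cost_offset": 10_000.0, "qaly_gain": 0.8}
RANGES = {
    "drug_cost": (40_000.0, 120_000.0),
    "cost_offset": (5_000.0, 15_000.0),
    "qaly_gain": (0.6, 1.0),
}


def icer(p):
    """Toy model: net incremental cost divided by incremental QALYs."""
    return (p["drug_cost"] - p["cost_offset"]) / p["qaly_gain"]


def one_way(base, ranges):
    """Vary one parameter at a time; return rows sorted widest bar first."""
    rows = []
    for name, (low, high) in ranges.items():
        at_low = icer({**base, name: low})
        at_high = icer({**base, name: high})
        rows.append((name, min(at_low, at_high), max(at_low, at_high)))
    return sorted(rows, key=lambda r: r[2] - r[1], reverse=True)


def threshold_value(base, name, low, high, wtp, tol=1.0):
    """Bisection for the parameter value at which ICER == wtp (None if no crossing)."""
    def gap(value):
        return icer({**base, name: value}) - wtp

    if gap(low) * gap(high) > 0:
        return None
    while high - low > tol:
        mid = (low + high) / 2
        if gap(low) * gap(mid) <= 0:
            high = mid
        else:
            low = mid
    return (low + high) / 2


tornado = one_way(BASE, RANGES)
price_threshold = threshold_value(BASE, "drug_cost", 40_000.0, 120_000.0, wtp=100_000)

for name, lo, hi in tornado:
    print(f"{name:12s} ICER range ${lo:,.0f} to ${hi:,.0f}")
print(f"Drug cost at which ICER = $100k/QALY: about ${price_threshold:,.0f}")
```

The sorted rows are the data behind a tornado diagram; in this toy setup drug cost produces the widest bar, so the threshold price is the number a payer would bring to a negotiation.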

Diagnostic Workup — Probabilistic Sensitivity Analysis (PSA)

PSA (second-order Monte Carlo simulation) assigns a probability distribution to each uncertain parameter, then randomly samples from all distributions simultaneously across thousands of iterations (typically 1,000–10,000), generating a distribution of ICERs.

Standard distribution choices (these appear on exams):

— Beta distribution — probabilities and utilities (bounded 0–1)

— Gamma or log-normal distribution — costs and resource use (right-skewed, ≥0)

— Log-normal — relative risks, hazard ratios, odds ratios

— Dirichlet distribution — transition probabilities across multiple Markov states (multivariate extension of beta)

— Normal — log-transformed regression coefficients

Workflow: define distributions → sample once per parameter per iteration → compute ICER → repeat → summarize as cost-effectiveness plane, CEAC, and mean NMB.

Outputs:

— Mean and 95% credible interval for the ICER

— Probability cost-effective at multiple WTP thresholds

— CEAC for the full WTP range

Board pearl: PSA does not address structural uncertainty — if the underlying Markov model is wrong (e.g., wrong health states), PSA will produce precise-looking results that are still biased. PSA is precision around a possibly-misspecified mean.

Correlations matter: if cost and effectiveness are correlated (often they are — sicker patients cost more and gain more QALYs from effective therapy), PSA should preserve that correlation via joint sampling (e.g., Cholesky decomposition) or bootstrapping of patient-level data.

Step 3 management: A vignette describing "10,000 simulations drawing drug efficacy from a beta distribution and costs from a gamma distribution, plotted on a cost-effectiveness plane" is unambiguously PSA. Expected answer: this analysis quantifies joint parameter uncertainty and supports probabilistic decision statements.
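A minimal PSA sketch, assuming a deliberately simple model, shows how the distribution choices in this section map onto code. All distribution parameters, the 10-year duration, and the hospitalization cost are hypothetical, and the draws are sampled independently, so the correlation point made above is not handled here. Only the Python standard library is used.

```python
import math
import random

YEARS = 10            # assumed duration of benefit
HOSP_COST = 30_000.0  # assumed cost of hospitalizations under the comparator


def run_psa(n_iter=5000, seed=1):
    """Return a list of (delta_cost, delta_qaly) pairs, one per iteration."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_iter):
        p_response = rng.betavariate(45, 55)              # probability -> beta
        utility_gain = rng.betavariate(20, 80)            # utility increment -> beta
        drug_cost = rng.gammavariate(100, 800)            # cost -> gamma(shape, scale)
        rr_hosp = rng.lognormvariate(math.log(0.7), 0.1)  # relative risk -> log-normal
        d_qaly = p_response * utility_gain * YEARS
        d_cost = drug_cost - (1 - rr_hosp) * HOSP_COST
        draws.append((d_cost, d_qaly))
    return draws


draws = run_psa()
mean_d_cost = sum(c for c, _ in draws) / len(draws)
mean_d_qaly = sum(q for _, q in draws) / len(draws)

# Summarize with the ratio of means, not the mean of per-iteration ratios
icer_of_means = mean_d_cost / mean_d_qaly

WTP = 100_000
prob_cost_effective = sum(1 for c, q in draws if WTP * q - c > 0) / len(draws)

print(f"ICER of means: ${icer_of_means:,.0f}/QALY")
print(f"P(cost-effective at ${WTP:,}/QALY): {prob_cost_effective:.2f}")
```

Each `(delta_cost, delta_qaly)` pair is one dot on the cost-effectiveness plane, and the share of dots with positive net monetary benefit is one point on the CEAC.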

Risk Stratification — Interpreting CEACs and Decision Thresholds

A cost-effectiveness acceptability curve (CEAC) is the most exam-tested PSA output. Read it as follows:

— x-axis: willingness-to-pay threshold ($/QALY)

— y-axis: probability the intervention is cost-effective at that threshold

— At WTP = $0, the curve shows P(intervention is cost-saving)

— At very high WTP, the curve approaches the probability the intervention is simply more effective

Decision rule: at the relevant WTP (e.g., $100,000/QALY in the US), if the CEAC is above ~80–95%, the intervention is considered confidently cost-effective. Between 50–80%, decision uncertainty is meaningful — additional research may be warranted.

Cost-effectiveness acceptability frontier (CEAF) — among multiple competing strategies, shows the probability that the optimal strategy (highest expected NMB) is cost-effective at each WTP. Distinct from CEAC, which can mislead when >2 strategies compete.

Expected value of perfect information (EVPI) — monetary value of eliminating all parameter uncertainty. High EVPI at the relevant WTP = further research is worth funding. Per-parameter EVPI (EVPPI) identifies which specific inputs most warrant new studies.

Risk stratification of CEA conclusions:

— Strong: dominant strategy (less costly + more effective) or CEAC >95% at WTP

— Moderate: ICER well below WTP, CEAC 70–95%

— Fragile: ICER near WTP, CEAC 40–70%, wide credible interval

— Reject or revise: dominated strategy or CEAC <40%

Key distinction: A favorable point-estimate ICER with a flat, low CEAC = the result is statistically fragile. A less favorable ICER with a steep CEAC reaching 95% quickly = robust conclusion. Always favor robustness over point-estimate optimism — analogous to preferring a tight CI to a flashy effect size.
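The CEAC reading rules in this section can be checked on a tiny hand-made example. The four (ΔCost, ΔQALY) pairs below are hypothetical stand-ins for PSA iterations, chosen so the arithmetic can be verified by eye; `ceac` is an illustrative helper, not a library function.

```python
def ceac(draws, wtp_grid):
    """Map each WTP to the share of draws with positive incremental NMB."""
    curve = {}
    for wtp in wtp_grid:
        favorable = sum(1 for d_cost, d_qaly in draws if wtp * d_qaly - d_cost > 0)
        curve[wtp] = favorable / len(draws)
    return curve


# Hypothetical PSA iterations as (delta_cost, delta_qaly)
DRAWS = [
    (50_000, 1.0),    # costlier, more effective, ICER $50k/QALY
    (120_000, 1.0),   # costlier, more effective, ICER $120k/QALY
    (-5_000, 0.2),    # cost-saving and more effective (dominant)
    (30_000, -0.1),   # costlier and less effective (dominated)
]

curve = ceac(DRAWS, [0, 100_000, 10**9])
for wtp, prob in curve.items():
    print(f"WTP ${wtp:,}: P(cost-effective) = {prob:.2f}")
```

At WTP = $0 only the cost-saving draw counts (0.25), at $100,000/QALY two of four draws are favorable (0.50), and at an extremely high WTP the curve reaches the share of draws with a QALY gain (0.75), matching the two endpoint rules listed for reading a CEAC.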

Pharmacotherapy Analog — Choosing the Right SA "Regimen"

Just as drug regimens are layered, modern CEAs layer multiple SA techniques. The "first-line regimen" per ISPOR Good Research Practices and the Second Panel:

— Deterministic one-way SA on all key parameters → tornado diagram

— Scenario analyses for structural/methodological choices (perspective, time horizon, discount rate)

— Probabilistic SA with ≥1,000 iterations → CEAC + cost-effectiveness plane

— Threshold analyses for policy-relevant inputs (drug price, adherence)

Discount rate convention (US): 3% per year for both costs and QALYs in the reference case, with sensitivity at 0% and 5% (or 1.5% for very long horizons per some European bodies). NICE uses 3.5%.

Time horizon SA: lifetime horizon is standard for chronic disease; sensitivity analyses at 5, 10, and 20 years assess how much of the benefit accrues late.

Perspective SA: reference-case analysis reports healthcare sector and societal perspectives separately (Impact Inventory per Second Panel) — productivity losses and informal caregiving included only in societal view.

Half-cycle correction in Markov models — a structural detail that should be tested in SA, because omitting it biases short-horizon ICERs.

Step 3 management: When evaluating a published CEA for a clinical guideline committee, demand:

— Tornado diagram (parameter influence)

— PSA with CEAC (probabilistic robustness)

— Scenario for societal perspective (policy relevance)

— Discount rate sensitivity (0%, 3%, 5%)

— Disclosure of funding source and model availability

Board pearl: A CEA that reports only a point-estimate ICER without PSA should be treated like a clinical trial reporting only point-estimate efficacy without confidence intervals — methodologically incomplete and not suitable for guideline-grade evidence.
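The discount rate sensitivity (0%, 3%, 5%) called for in this section can be sketched as follows. The cost and QALY streams are hypothetical and describe a preventive therapy with up-front costs and late benefits; year 0 is left undiscounted, which is one common convention and an assumption here.

```python
def present_value(stream, rate):
    """Discount a yearly stream (index 0 = year 0) at a constant annual rate."""
    return sum(x / (1 + rate) ** t for t, x in enumerate(stream))


# Hypothetical incremental streams (assumed values)
COSTS = [10_000.0] * 5 + [0.0] * 25   # $10k/year in years 0-4
QALYS = [0.0] * 10 + [0.1] * 20       # 0.1 QALY/year in years 10-29


def icer_at(rate_cost, rate_qaly):
    """ICER with costs and QALYs discounted at (possibly different) rates."""
    return present_value(COSTS, rate_cost) / present_value(QALYS, rate_qaly)


# Reference-case style sweep: same rate for costs and QALYs
results = {rate: icer_at(rate, rate) for rate in (0.0, 0.03, 0.05)}

# Differential discounting scenario (3% costs, 1.5% QALYs)
differential = icer_at(0.03, 0.015)

for rate, value in results.items():
    print(f"discount {rate:.0%}: ICER ${value:,.0f}/QALY")
print(f"differential (3% costs, 1.5% QALYs): ICER ${differential:,.0f}/QALY")
```

Because the benefits arrive later than the costs, raising the discount rate shrinks the QALY denominator faster than the cost numerator, so the ICER rises with the rate. That is why long-horizon and preventive models are so sensitive to this single methodological choice.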

Procedures — Building a Tornado Diagram and Running PSA Step-by-Step

Procedural workflow (CCS-style "what do you order, then what"):

— Step 1: List all model parameters with point estimate, plausible range, and distribution. Cost inputs → gamma; probabilities → beta; HRs → log-normal.

— Step 2: For one-way SA, vary each parameter across its 95% CI (or ±25–50% if no CI available), recompute ICER, record min/max.

— Step 3: Rank parameters by absolute change in ICER → plot horizontal bars sorted widest-to-narrowest → tornado diagram.

— Step 4: For threshold analysis on key parameters (e.g., drug cost), solve for the value at which ICER = WTP.

— Step 5: For PSA, sample once from every parameter's distribution per iteration; run 1,000–10,000 iterations. Preserve correlations where biologically/economically plausible.

— Step 6: Plot all (ΔCost, ΔQALY) pairs on the cost-effectiveness plane. Overlay WTP line.

— Step 7: Compute probability the intervention is cost-effective at each WTP → CEAC.

— Step 8: Calculate EVPI; if high, consider EVPPI to identify which parameters warrant further research.

Quality checks during the procedure:

— Convergence: do mean ICER and CEAC stabilize as iterations increase?

— Face validity: do sampled inputs produce clinically plausible outcomes?

— Internal consistency: do probabilities sum to 1 within each Markov cycle?

CCS pearl: If a question asks the next step after observing that the tornado diagram is dominated by one parameter, the answer is usually probabilistic SA with focused attention on that parameter's distribution, or value-of-information analysis (EVPPI) to decide whether a new study is warranted before adopting the intervention.

Documentation: report assumptions, distributions, software (TreeAge, R, Excel), and model code per CHEERS 2022 checklist — the cost-effectiveness analog of CONSORT.
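Two of the quality checks named in this section, internal consistency of sampled transition probabilities and convergence of the PSA mean, can be sketched with the standard library. The Dirichlet draw is built from normalized gamma draws (a standard construction); the alpha counts, the simulated values, and the 1% tolerance are assumptions for illustration.

```python
import random


def sample_dirichlet(alphas, rng):
    """One Dirichlet draw via normalized gamma draws; the result sums to 1."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def running_means(values, checkpoints):
    """Mean of the first n values for each n in checkpoints."""
    return [sum(values[:n]) / n for n in checkpoints]


def has_converged(means, rel_tol=0.01):
    """True if the last two checkpoint means differ by at most rel_tol (relative).

    Assumes the final mean is not zero.
    """
    prev, last = means[-2], means[-1]
    return abs(last - prev) <= rel_tol * abs(last)


rng = random.Random(42)

# Hypothetical transition counts out of one Markov state: stay, progress, die
row = sample_dirichlet([80, 15, 5], rng)
row_is_consistent = abs(sum(row) - 1.0) < 1e-9 and all(p >= 0 for p in row)

# Hypothetical per-iteration incremental NMB values standing in for PSA output
nmb_values = [rng.gauss(10_000, 20_000) for _ in range(10_000)]
means = running_means(nmb_values, [1_000, 5_000, 10_000])

print("transition row:", [round(p, 3) for p in row], "consistent:", row_is_consistent)
print("running means:", [round(m) for m in means], "converged:", has_converged(means))
```

If the running mean is still moving between the last checkpoints, the usual remedy is more iterations before reporting the CEAC.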

Special Populations — Subgroup and Heterogeneity Analyses

Heterogeneity ≠ uncertainty. Heterogeneity is real, explainable variation between patient subgroups (age, comorbidity, genotype). Uncertainty is imprecision in our knowledge. SA addresses uncertainty; subgroup analysis addresses heterogeneity.

Why this matters in CEA:

— An intervention may be cost-effective on average but not in low-risk subgroups (low baseline event rate → small absolute benefit → high ICER).

— Conversely, it may be highly cost-effective in high-risk subgroups even when the population average is borderline.

— Example: statins for primary prevention — cost-effective at 10-year ASCVD risk ≥7.5%, less so below.

Key distinction: Pooling subgroups into a single ICER can mask clinically and ethically important variation. A vignette in which a therapy is "cost-effective overall but ICER is $400k/QALY in patients ≥75" should trigger age-stratified policy, not blanket adoption — a recurring Step 3 systems-of-care theme.

Approach: stratify the model by clinically meaningful subgroups, re-run full SA in each. Report subgroup-specific ICERs and CEACs.
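The link between baseline risk and subgroup ICERs can be made concrete with a deliberately crude sketch. All inputs are hypothetical (a 25% relative risk reduction, 2.0 QALYs lost per event, $5,000 total treatment cost), and the model ignores cost offsets from averted events, so the numbers illustrate the direction of the effect only.

```python
def subgroup_icer(baseline_risk, rrr, qaly_loss_per_event, treatment_cost):
    """Crude ICER: treatment cost divided by expected QALYs gained.

    Expected QALY gain = baseline risk x relative risk reduction x QALYs lost per event.
    """
    qaly_gain = baseline_risk * rrr * qaly_loss_per_event
    return treatment_cost / qaly_gain


# Hypothetical subgroups defined by 10-year baseline event risk
SUBGROUPS = {"low risk": 0.05, "intermediate risk": 0.10, "high risk": 0.20}

icers = {
    name: subgroup_icer(risk, rrr=0.25, qaly_loss_per_event=2.0, treatment_cost=5_000.0)
    for name, risk in SUBGROUPS.items()
}

for name, value in icers.items():
    print(f"{name:18s} ICER ${value:,.0f}/QALY")
```

With identical relative efficacy and cost, the ICER falls fourfold between the low-risk and high-risk groups, which is the mechanism behind the first two bullets under "Why this matters in CEA."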

Renal/hepatic impairment analog: drug clearance affects dosing and toxicity, which alter both cost and QALY inputs. A CEA in the general population may not apply to CKD stage 4–5 patients; sensitivity analysis around renal-specific transition probabilities is essential.

Elderly populations: lifetime horizon shrinks → discounting matters less, but utility decrements from competing comorbidities matter more. SA should vary background mortality and age-specific utility weights.

Equity considerations: the Second Panel recommends distributional cost-effectiveness analysis (DCEA) as a sensitivity-style extension that examines how costs and benefits accrue across socioeconomic or racial subgroups.

Special Populations — Pregnancy, Pediatrics, and Long-Horizon SA Challenges

Pediatric CEAs pose unique SA challenges:

— Lifetime horizons of 70+ years amplify the impact of the discount rate — differential discounting (e.g., 3% costs, 1.5% QALYs) is sometimes used and should be tested in SA.

— Pediatric utility weights are difficult to elicit; sensitivity around proxy-reported utilities is essential.

— Long-term effects of childhood interventions (vaccines, screening) compound — small per-cycle errors balloon over decades.

Pregnancy and perinatal CEAs:

— Dual-patient framework: maternal and fetal/neonatal outcomes must both feed the QALY calculation. SA should include scenarios with maternal-only and combined outcomes.

— Short time horizons for acute interventions (e.g., antenatal corticosteroids) reduce discount-rate sensitivity but increase sensitivity to event probability inputs.

Vaccine CEAs — classic long-horizon model. Major SA targets:

— Herd immunity assumptions (structural)

— Vaccine efficacy waning curve (parameter)

— Discount rate (methodological)

— Disease incidence (parameter, often from surveillance with wide CIs)

Rare disease / orphan drug CEAs:

— Small trials → wide parameter CIs → PSA credible intervals span orders of magnitude.

— Some payers apply modified WTP thresholds for ultra-rare conditions (e.g., NICE's higher threshold for end-of-life and rare disease therapies); SA should test multiple thresholds.

Step 3 management: For an orphan drug CEA with a base-case ICER of $500,000/QALY but PSA showing 30% probability of being cost-effective at a $300,000/QALY rare-disease threshold, the appropriate framing is "decision under substantial uncertainty" — supporting conditional coverage, coverage with evidence development, or value-based pricing, rather than outright adoption or rejection.

Complications — Common Errors and Misleading SA

Board pearl: When a CEA's conclusion changes under reasonable variation of a single non-clinical parameter (e.g., assumed drug price), the conclusion is not robust — treat it like a clinical trial whose result depends on excluding one outlier patient.

Spurious precision: reporting an ICER of $94,372/QALY suggests false accuracy when PSA credible intervals span $40k–$210k. Always report ICER with its credible interval.

Cherry-picked ranges: narrowing parameter ranges in one-way SA to artificially stabilize the ICER. Detect by comparing reported ranges to published 95% CIs.

Ignoring correlation in PSA: sampling cost and effectiveness independently when they covary inflates the apparent uncertainty in some quadrants and falsely narrows it in others.

Wrong distribution choice: assigning a normal distribution to a probability (sampled values can fall below 0 or above 1) or to a cost (sampled values can be negative). Always use beta for probabilities, gamma/log-normal for costs.

Structural rigidity: running PSA on parameters but not testing alternate model structures — leaving structural uncertainty unaddressed.

Perspective mismatch: reporting only the manufacturer-favorable perspective (e.g., societal when productivity gains help, healthcare sector when they don't). Reference case requires both.

Time-horizon truncation: using a 5-year horizon for a chronic disease intervention to avoid discount-rate scrutiny, which biases against preventive therapies whose benefits accrue late.

"Switching" without dominance check: in multi-strategy analyses, failing to apply extended dominance (eliminating strategies dominated by a mix of other strategies) distorts the reported ICERs, because each remaining strategy ends up compared against a comparator that should have been removed.

Publication and funding bias: industry-sponsored CEAs are 2–4× more likely to report favorable ICERs than independent analyses (multiple meta-analyses). SA cannot fully correct for this, but transparent reporting per CHEERS and independent model replication are mitigations.
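The dominance check listed among the errors above can be sketched as an efficiency-frontier routine: sort by cost, drop strongly dominated strategies, then repeatedly drop any strategy whose ICER exceeds that of the next more effective one (extended dominance). The four strategies and their costs and QALYs are hypothetical, and `efficiency_frontier` is an illustrative name.

```python
def efficiency_frontier(strategies):
    """strategies: dict name -> (cost, qaly).

    Returns [(name, icer_vs_previous_frontier_strategy)], with None for the
    cheapest strategy.
    """
    # Sort by cost; at equal cost, put the more effective strategy first
    ordered = sorted(strategies.items(), key=lambda kv: (kv[1][0], -kv[1][1]))

    # Strong dominance: costs at least as much and is no more effective
    frontier = []
    for name, (cost, qaly) in ordered:
        if frontier and qaly <= frontier[-1][2]:
            continue
        frontier.append((name, cost, qaly))

    def step_icer(a, b):
        return (b[1] - a[1]) / (b[2] - a[2])

    # Extended dominance: remove a strategy whose ICER exceeds the next one's
    changed = True
    while changed:
        changed = False
        for i in range(1, len(frontier) - 1):
            if step_icer(frontier[i - 1], frontier[i]) > step_icer(frontier[i], frontier[i + 1]):
                del frontier[i]
                changed = True
                break

    result = [(frontier[0][0], None)]
    for prev, cur in zip(frontier, frontier[1:]):
        result.append((cur[0], step_icer(prev, cur)))
    return result


# Hypothetical strategies: (lifetime cost in $, QALYs)
STRATEGIES = {
    "standard": (10_000, 5.0),
    "B": (30_000, 5.2),
    "C": (40_000, 6.0),
    "D": (50_000, 5.9),
}

frontier = efficiency_frontier(STRATEGIES)
for name, value in frontier:
    print(name, "reference" if value is None else f"ICER ${value:,.0f}/QALY")
```

In this example D is strongly dominated by C, and B is removed by extended dominance, so C is compared with standard care at $30,000/QALY. Skipping the check would have reported C against B at $12,500/QALY, a misleadingly favorable figure.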

When to Escalate — Value of Information Analysis

Value of information (VOI) analysis is the formal escalation pathway when SA reveals substantial decision uncertainty.

Expected value of perfect information (EVPI) — population-level monetary value of eliminating all uncertainty. Calculated as the expected loss from making the wrong decision under current uncertainty, multiplied by affected population and time horizon.

— If EVPI > cost of a definitive trial → the trial is potentially worth conducting (EVPI is an upper bound on what any study can deliver, so this is a necessary condition, not a sufficient one).

— If EVPI < trial cost → adopt current best estimate; further research not value-generating.

Expected value of partial perfect information (EVPPI) — VOI for individual parameters or groups. Identifies which parameters most warrant additional research.

Expected value of sample information (EVSI) — VOI for a specific proposed study design (e.g., n=500 RCT). Most actionable: directly informs trial sizing.

Escalation pathway (CCS-style):

— SA shows decision uncertainty → calculate EVPI

— EVPI > expected trial cost → calculate EVPPI to localize uncertainty

— EVPPI identifies key parameter(s) → calculate EVSI for proposed study

— Design and fund the study accordingly

Step 3 management: When asked the most appropriate next step after a CEA reveals a 50/50 probability of cost-effectiveness at the WTP threshold, the right answer is rarely "adopt" or "reject" — it is typically conduct a value-of-information analysis to determine whether additional research is justified, or conditional coverage with mandated data collection.

Policy escalation: when decision uncertainty is high but immediate access is clinically important (e.g., rare cancer therapies), agencies may use coverage with evidence development (CED), meaning conditional reimbursement contingent on prospective data collection. In the US, CMS applies CED under the National Coverage Determination process.
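The EVPI definition in this section can be written directly from PSA output: per-person EVPI is the mean of the per-iteration best NMB minus the best mean NMB. The NMB values, the population size, and the trial cost below are hypothetical, and population discounting is omitted for brevity.

```python
def evpi_per_person(nmb_draws):
    """nmb_draws: dict strategy -> list of NMB values, one per PSA iteration.

    Lists must be aligned so index i refers to the same iteration for every strategy.
    """
    names = list(nmb_draws)
    n_iter = len(nmb_draws[names[0]])

    # Value with current information: pick the strategy with the best mean NMB
    best_mean = max(sum(values) / n_iter for values in nmb_draws.values())

    # Value with perfect information: pick the best strategy in each iteration
    mean_of_best = sum(max(nmb_draws[s][i] for s in names) for i in range(n_iter)) / n_iter

    return mean_of_best - best_mean


# Hypothetical NMB draws ($ thousands) for two strategies over four iterations
NMB = {
    "new": [10.0, 20.0, 30.0, 40.0],
    "old": [25.0, 15.0, 35.0, 5.0],
}

per_person = evpi_per_person(NMB)        # in $ thousands
AFFECTED_POPULATION = 10_000             # assumed number of patients over the horizon
population_evpi = per_person * AFFECTED_POPULATION

TRIAL_COST = 20_000.0                    # assumed, in $ thousands
research_potentially_worthwhile = population_evpi > TRIAL_COST

print(f"EVPI per person: ${per_person:.1f}k; population EVPI: ${population_evpi:,.0f}k")
```

Here the "new" strategy has the higher mean NMB, but "old" wins in two of the four iterations, so resolving uncertainty has positive value. EVPPI and EVSI follow the same logic with the expectation restricted to selected parameters or to the information a specific study design would provide.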

Key Differentials — Same-Category Economic Analyses

Board pearl: When a vignette describes Monte Carlo sampling and QALYs, think CUA with PSA. When it describes 5-year total spending projections with uptake scenarios, think BIA with scenario analysis. The SA tools differ accordingly.

Several analytic frameworks sit alongside CEA; each uses sensitivity analysis differently.

Cost-minimization analysis (CMA) — assumes equivalent effectiveness; compares costs only. SA focuses on cost inputs. Caveat: valid only when effectiveness equivalence is genuinely established (often it is not, making CMA frequently inappropriate).

Cost-effectiveness analysis (CEA) — outcomes in natural units (life-years gained, cases prevented). SA on costs and on the natural outcome.

Cost-utility analysis (CUA) — outcomes in QALYs; the most common type, and the one usually meant when people say "CEA" loosely. SA additionally addresses utility weights, often via beta distributions.

Cost-benefit analysis (CBA) — both costs and outcomes in dollars (via willingness-to-pay valuations of health). SA on monetization assumptions, which are often the dominant uncertainty.

Budget impact analysis (BIA) — projects total spending over 1–5 years given adoption. Complementary to CEA. SA focuses on uptake rate, market share, and unit costs — short-horizon, finance-driven inputs.

Distributional cost-effectiveness analysis (DCEA) — incorporates equity weights. SA on the equity weight itself (how strongly society values health gains in disadvantaged groups).

Key distinction: CEA/CUA answer "Is it worth it?" BIA answers "Can we afford it?" Both are usually required for payer decisions. A drug can be cost-effective ($90k/QALY) yet have a prohibitive budget impact (e.g., hepatitis C DAAs in 2014). SA must address both axes.
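The "Can we afford it?" side can be sketched as a budget impact projection with uptake scenarios, the SA tool this section associates with BIA. The eligible population, net cost per treated patient, and uptake curves are all hypothetical.

```python
def budget_impact(eligible, uptake_by_year, net_cost_per_patient):
    """Yearly net spending: eligible patients x uptake share x net cost per patient."""
    return [eligible * uptake * net_cost_per_patient for uptake in uptake_by_year]


ELIGIBLE = 100_000             # assumed eligible patients in the plan
NET_COST = 50_000.0            # assumed net cost per treated patient ($)

# Scenario analysis on uptake (share of eligible patients treated in years 1-3)
UPTAKE_SCENARIOS = {
    "low": [0.05, 0.10, 0.15],
    "base": [0.10, 0.20, 0.30],
    "high": [0.20, 0.40, 0.60],
}

impact = {name: budget_impact(ELIGIBLE, uptake, NET_COST)
          for name, uptake in UPTAKE_SCENARIOS.items()}

for name, yearly in impact.items():
    total = sum(yearly)
    print(f"{name:5s} 3-year budget impact: ${total / 1e9:.1f} billion")
```

The cost per patient, and therefore the ICER, is identical in every scenario; only uptake changes the total. That is the sense in which a therapy can be cost-effective per QALY and still unaffordable in aggregate.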

Key Differentials — Other-Category Concepts Confused with SA

Sensitivity (epidemiologic) — true positive rate of a diagnostic test. Completely unrelated to sensitivity analysis despite the shared name; exam questions exploit this confusion.

Confidence interval analysis — quantifies sampling uncertainty around a single estimate (e.g., a hazard ratio). PSA generalizes this concept across many parameters simultaneously.

Subgroup analysis — explores heterogeneity, not uncertainty. Pre-specified subgroup analyses in RCTs are analogous to heterogeneity analyses in CEA.

Meta-analysis / Bayesian updating — synthesizes evidence; the posterior distribution from a meta-analysis is often the input to PSA.

Robustness checks in regression — analogous in spirit (varying model specifications) but applied to statistical models, not decision models.

Power analysis — sample size calculation for trials. Connects to SA via EVSI, which essentially asks "what sample size produces enough information value to justify the study?"

Falsifiability / replicability — model code sharing and independent replication are the CEA analog of trial registration and data sharing.

Calibration vs validation:

— Calibration = tuning model parameters so predicted outcomes match observed data

— Validation = checking model predictions against an external dataset

— Neither is sensitivity analysis, but both are prerequisites for credible SA.

Key distinction: "Sensitivity" in "sensitivity and specificity" describes test performance; "sensitivity" in "sensitivity analysis" describes model robustness. On Step 3, the context (diagnostic vignette vs cost-effectiveness vignette) disambiguates — but the trap is real.

Real-world evidence (RWE) — increasingly used to populate CEA inputs; SA should test whether RWE-derived estimates produce different ICERs than RCT-derived estimates.

Long-Term Plan — Reporting Standards and Reproducibility

CHEERS 2022 (Consolidated Health Economic Evaluation Reporting Standards) is the CONSORT-equivalent checklist for CEAs. Required elements directly relevant to SA:

— Item 18: Characterizing heterogeneity (subgroup analyses)

— Item 19: Characterizing distributional effects (DCEA)

— Item 20: Characterizing uncertainty (deterministic and probabilistic SA)

— Item 21: Approach to engagement with patients and others affected by the study

ISPOR Good Research Practices — series of task force reports detailing best practices for model conceptualization, parameter estimation, and SA (including PSA, VOI).

Second Panel on Cost-Effectiveness in Health and Medicine (2016) — reference-case requirements:

— Healthcare sector AND societal perspectives

— Lifetime time horizon for chronic conditions

— 3% discount rate (with 0% and 5% in SA)

— Impact inventory specifying included/excluded costs

— PSA with CEAC

Reproducibility expectations:

— Full parameter table with distributions

— Model structure diagram

— Software and version

— Ideally, public code repository (GitHub, OSF)

Long-term policy plan when SA reveals uncertainty:

— Conditional coverage / managed entry agreements

— Risk-sharing contracts (manufacturer refunds if real-world outcomes underperform)

— Outcome-based pricing

— Mandated post-marketing surveillance

— Periodic re-analysis as new evidence emerges

Step 3 management: For a value-based contract built around a CEA with substantial PSA uncertainty, the appropriate long-term plan includes pre-specified re-analysis triggers (e.g., re-run CEA after 3 years of registry data) — mirroring the post-marketing surveillance pathway for drug safety.

Follow-Up and Monitoring — Re-running SA Over Time

CEA results are not static. Inputs change as new trials are published, drug prices renegotiate, and real-world outcomes diverge from trial estimates. Periodic re-analysis is the SA analog of clinical follow-up.

Monitoring parameters:

— Drug acquisition cost (often falls with biosimilars/generics — a 30–80% drop can flip a non-cost-effective therapy to cost-saving)

— Real-world adherence (typically lower than trial adherence → reduces real-world cost-effectiveness)

— Long-term effectiveness (durability of treatment effect beyond trial duration)

— Comparator landscape (a new competitor changes the relevant ICER)

— Quality-of-life evidence (more mature utility data may shift QALY estimates)

Re-analysis cadence:

— Major drug class: every 3–5 years or upon biosimilar entry

— Rapidly evolving fields (oncology, gene therapy): every 1–2 years

— Stable interventions (vaccines, screening): every 5–10 years or upon major epidemiologic change

Counseling and communication:

— Translate ICERs and CEACs into plain language for clinicians and policymakers ("at $100k/QALY, 78% likely to be cost-effective")

— Distinguish economic from clinical recommendations — a non-cost-effective therapy may still be clinically appropriate; the CEA informs resource allocation, not patient-level decisions in most US contexts.

CCS pearl: When biosimilar entry is announced for a high-cost biologic, the appropriate "next order" is to re-run the cost-effectiveness model with updated price inputs and full PSA — frequently flipping a borderline therapy into a cost-effective range and triggering coverage policy updates within months.

Quality improvement loop: track decisions informed by CEAs and outcomes downstream — the systems analog of clinical audit.

Ethical, Legal, and Patient Safety Considerations

QALY-based decisions raise equity concerns. QALYs may systematically undervalue interventions for:

— Disabled patients (lower baseline utility → smaller absolute QALY gains)

— Elderly patients (shorter remaining life-years)

— Patients with severe chronic illness

— In response, Section 1182 of the Social Security Act (added by the Affordable Care Act) bars PCORI from developing a dollars-per-QALY threshold and bars Medicare from using such a measure as a threshold to determine coverage, a uniquely US legal constraint that distinguishes our system from NICE (UK) or CADTH (Canada). SA on alternative metrics (equal-value life-years, healthy-years equivalents) is increasingly required for US policy use.

Board pearl: A Step 3 stem describing a US Medicare coverage decision invoking a $/QALY threshold should trigger recognition that Medicare is barred by statute from using a QALY-based threshold to determine coverage, a frequent ethics/health-policy test point.

Conflict of interest: industry-sponsored CEAs routinely report more favorable ICERs. Mandatory disclosure per CHEERS and journal policy is the minimum; independent replication is the gold standard.

Informed consent edge case: when a CEA influences which therapy is offered (e.g., formulary restriction), patients should be informed that economic considerations entered the decision, especially when a more effective but non-cost-effective option exists. This is a transition-of-care risk: patients moving between insurers may discover coverage differences not previously disclosed.

Patient safety: suppression of negative CEA results or selective reporting of favorable scenarios is an integrity issue analogous to selective outcome reporting in trials. Pre-registration of CEA protocols (analogous to ClinicalTrials.gov) is an emerging norm.

Data privacy: patient-level data feeding CEAs (claims, EHR) must comply with HIPAA; aggregated cost data from health systems may carry proprietary restrictions limiting transparency.

Equity-informed SA: report ICERs by socioeconomic and racial subgroup, and include distributional cost-effectiveness analysis (DCEA) as a sensitivity scenario.

High-Yield Associations and Rapid-Fire Clinical Facts

Key distinction: Cost-effective ≠ affordable ≠ clinically appropriate. Each axis requires its own analysis, and SA applies to each.

WTP thresholds: US informal $50k–$150k/QALY; UK NICE £20k–£30k/QALY (roughly $25k–$38k); WHO historical 1× and 3× GDP per capita (now deprecated).

Discount rate: US reference case 3% for both costs and QALYs; SA at 0% and 5%.

Standard distributions: beta (probabilities, utilities), gamma/log-normal (costs), log-normal (relative risks), Dirichlet (multi-state transition probabilities).

Tornado diagram = one-way deterministic SA visualization; CEAC = PSA visualization across WTP range.

PSA iterations: typically 1,000–10,000; convergence should be demonstrated.

Dominance:

— Strong dominance: one strategy is more effective AND less costly than another → adopt the dominant strategy, eliminate the dominated one

— Weak/extended dominance: a strategy is dominated by a linear combination of two other strategies (its ICER exceeds that of the next more effective strategy) → eliminate
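Both dominance checks can be sketched in a few lines of Python. This is a minimal illustration, not a validated implementation: the function name `efficiency_frontier` and the five strategies with their costs and QALYs are hypothetical values chosen only to trigger each rule once.

```python
def efficiency_frontier(strategies):
    """strategies: list of (name, cost, qalys) tuples.

    Returns (frontier, strongly_dominated, extendedly_dominated) as lists of names.
    """
    # Order by cost; among equal costs, the more effective strategy comes first.
    ordered = sorted(strategies, key=lambda s: (s[1], -s[2]))

    # Strong dominance: costs at least as much as a kept strategy
    # but yields no more QALYs.
    kept, strong = [], []
    for s in ordered:
        if kept and s[2] <= kept[-1][2]:
            strong.append(s[0])
        else:
            kept.append(s)

    # Extended dominance: ICERs along the frontier must be increasing.
    # Drop any strategy whose ICER exceeds that of the next more effective one.
    extended = []
    changed = True
    while changed:
        changed = False
        for i in range(1, len(kept) - 1):
            icer_prev = (kept[i][1] - kept[i - 1][1]) / (kept[i][2] - kept[i - 1][2])
            icer_next = (kept[i + 1][1] - kept[i][1]) / (kept[i + 1][2] - kept[i][2])
            if icer_prev > icer_next:
                extended.append(kept[i][0])
                del kept[i]
                changed = True
                break

    return [s[0] for s in kept], strong, extended


# Hypothetical strategies: (name, lifetime cost in $, QALYs)
strategies = [
    ("A", 0, 10.0),
    ("B", 20_000, 10.2),
    ("C", 30_000, 10.1),  # costlier and less effective than B
    ("D", 50_000, 11.0),
    ("E", 90_000, 11.2),
]
frontier, strong, extended = efficiency_frontier(strategies)
```

In this toy set, C is strongly dominated by B, and B falls to extended dominance because its ICER versus A (about $100k/QALY) is higher than D's ICER versus B ($37.5k/QALY). ICERs are then reported only between adjacent strategies remaining on the frontier.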

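ICER interpretation pitfalls: the ICER is undefined when ΔQALY = 0. In the southwest quadrant (less costly, less effective) the ratio represents savings per QALY forgone, so the decision rule reverses relative to the northeast quadrant: the intervention is acceptable only when the ICER exceeds the WTP threshold. Negative ICERs are not interpretable on their own, because dominant and dominated strategies both produce them.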
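One way to sidestep these pitfalls is to decide on incremental net monetary benefit (NMB = WTP × ΔQALY − ΔCost) rather than on the ratio. The sketch below is illustrative only; `classify_increment` and the example numbers are hypothetical.

```python
def classify_increment(d_cost, d_qaly, wtp):
    """Return (quadrant, icer, cost_effective) for one incremental comparison.

    icer is None when d_qaly == 0, because the ratio is undefined there.
    cost_effective is decided on net monetary benefit, which works in every quadrant.
    """
    nmb = wtp * d_qaly - d_cost
    icer = None if d_qaly == 0 else d_cost / d_qaly

    if d_qaly == 0:
        quadrant = "no QALY difference"
    elif d_qaly > 0:
        quadrant = "NE" if d_cost > 0 else "SE"  # SE = dominant
    else:
        quadrant = "NW" if d_cost > 0 else "SW"  # NW = dominated

    return quadrant, icer, nmb > 0


# Same ICER of $90k/QALY, opposite decisions at WTP = $100k/QALY
north_east = classify_increment(45_000, 0.5, 100_000)
south_west = classify_increment(-45_000, -0.5, 100_000)
undefined = classify_increment(1_000, 0, 100_000)
```

The northeast and southwest examples share the same ratio, yet only the northeast case has a positive NMB, which is why a bare ICER should never be read without its quadrant.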

EVPI quantifies the value of eliminating all uncertainty; EVPPI localizes to specific parameters; EVSI sizes a specific study.
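As a rough sketch of how EVPI falls out of PSA output: per-person EVPI is the mean of the per-iteration best NMB minus the best mean NMB. The function name, the three-iteration toy matrix, and the population of 100,000 are assumptions for illustration; a real analysis would use thousands of PSA draws and a discounted population over the decision's lifetime.

```python
from statistics import fmean


def evpi_per_person(nmb_samples):
    """nmb_samples: one row per PSA iteration, one NMB value per strategy."""
    n_strategies = len(nmb_samples[0])

    # Value with current information: commit to the strategy with the best mean NMB.
    mean_by_strategy = [
        fmean([row[j] for row in nmb_samples]) for j in range(n_strategies)
    ]
    value_current = max(mean_by_strategy)

    # Value with perfect information: pick the best strategy in every iteration.
    value_perfect = fmean([max(row) for row in nmb_samples])

    return value_perfect - value_current


# Toy PSA output for two strategies (NMB in $ per patient)
toy_nmb = [
    [10_000, 0],
    [0, 4_000],
    [6_000, 2_000],
]
per_person = evpi_per_person(toy_nmb)

# Population EVPI scales by the number of patients affected (hypothetical figure)
population_evpi = per_person * 100_000
```

EVPI is an upper bound on what any further research could be worth. EVPPI and EVSI apply the same logic to a subset of parameters or to a specific study design.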

CHEERS 2022 = reporting standard; ISPOR = methods guidance; Second Panel (2016) = US reference case; ICER (Institute for Clinical and Economic Review) = independent US value assessor.

CMS legal constraint: ACA Section 1182 prohibits Medicare use of QALYs for coverage denial.

Industry-sponsored CEAs: 2–4× more likely to report favorable ICERs.

Biosimilar / generic entry: the single most reliable trigger for a previously non-cost-effective therapy to flip cost-effective.

Hepatitis C DAA precedent (2014): cost-effective ($/QALY) but unaffordable (BIA) → drove modern attention to budget impact alongside CEA.

Board Question Stem Patterns

Board pearl: Always identify (1) which type of SA is described, (2) what uncertainty type it addresses, and (3) whether the conclusion is robust at the relevant US WTP threshold — three checks that solve most Step 3 SA stems.

Pattern 1 — Identify the SA type: "Investigators varied drug cost from $20k to $100k and re-plotted the ICER on a tornado diagram." → One-way deterministic SA. Best answer addresses parameter influence.
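The analysis in Pattern 1 can be sketched as follows. The toy ICER model, its parameter names, and every value except the $20k to $100k drug-cost range are hypothetical; the point is only to show how tornado-diagram bars are generated and ordered.

```python
def icer(p):
    """Toy model: incremental cost is drug cost minus downstream cost offsets."""
    return (p["drug_cost"] - p["cost_offsets"]) / p["qaly_gain"]


def one_way_sa(model, base, ranges):
    """Vary one parameter at a time, holding the others at base case.

    Returns rows (name, result_at_low, result_at_high, swing) sorted by swing,
    which is the top-to-bottom bar order of a tornado diagram.
    """
    rows = []
    for name, (low, high) in ranges.items():
        at_low = model({**base, name: low})
        at_high = model({**base, name: high})
        rows.append((name, at_low, at_high, abs(at_high - at_low)))
    return sorted(rows, key=lambda r: r[3], reverse=True)


base_case = {"drug_cost": 60_000, "cost_offsets": 10_000, "qaly_gain": 0.5}
ranges = {
    "drug_cost": (20_000, 100_000),   # range from the stem
    "cost_offsets": (5_000, 15_000),  # hypothetical
    "qaly_gain": (0.4, 0.6),          # hypothetical
}
tornado = one_way_sa(icer, base_case, ranges)
```

Here drug cost produces the widest bar, so it is the parameter to which the ICER is most sensitive. One-way SA says nothing about joint uncertainty, which is what PSA addresses.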

Pattern 2 — Identify PSA: "10,000 Monte Carlo simulations sampled drug efficacy from a beta distribution and costs from a gamma distribution, generating a CEAC showing 82% probability of cost-effectiveness at $100k/QALY." → PSA; conclusion is robust at the stated threshold.
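A minimal sketch of the PSA described in Pattern 2, using only the Python standard library. All distribution parameters (60 responders out of 100, one QALY gained per responder, incremental cost with mean $40,000 and SD $8,000) are assumptions made up for illustration, so the resulting probabilities will not match the 82% in the stem.

```python
import random


def run_psa(n_iter=10_000, seed=1):
    """Draw (incremental cost, incremental QALYs) pairs by Monte Carlo sampling."""
    rng = random.Random(seed)
    mean_cost, sd_cost = 40_000, 8_000
    shape = (mean_cost / sd_cost) ** 2  # gamma shape by method of moments
    scale = sd_cost ** 2 / mean_cost    # gamma scale by method of moments

    draws = []
    for _ in range(n_iter):
        p_response = rng.betavariate(60, 40)  # efficacy ~ beta
        d_qaly = p_response * 1.0             # assumed 1 QALY per responder
        d_cost = rng.gammavariate(shape, scale)  # cost ~ gamma
        draws.append((d_cost, d_qaly))
    return draws


def ceac(draws, wtp_values):
    """Share of iterations with positive net monetary benefit at each WTP."""
    return {
        wtp: sum(wtp * d_qaly - d_cost > 0 for d_cost, d_qaly in draws) / len(draws)
        for wtp in wtp_values
    }


draws = run_psa()
curve = ceac(draws, [50_000, 100_000, 150_000])
```

Plotting `curve` against WTP gives the cost-effectiveness acceptability curve. Rerunning with a different seed or more iterations is a simple convergence check.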

Pattern 3 — Interpret a CEAC: "At WTP of $50k/QALY, the CEAC shows a 35% probability of cost-effectiveness; at $150k/QALY, 88%." → Likely cost-effective at the upper end of the US threshold range but not at the lower end; the decision depends on threshold choice.

Pattern 4 — Distinguish from BIA: "The intervention has an ICER of $40k/QALY but would add $2.3 billion to annual Medicare spending." → Cost-effective but high budget impact; appropriate response involves phased implementation or price negotiation, not rejection on CEA grounds.
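The two questions in Pattern 4 use different arithmetic, which a short sketch makes explicit. The eligible population, uptake, and net cost per patient below are hypothetical inputs chosen only so that the product reproduces the $2.3 billion in the stem.

```python
def annual_budget_impact(eligible_patients, uptake, net_cost_per_patient):
    """Payer-level change in annual spending: treated patients x net cost each."""
    return eligible_patients * uptake * net_cost_per_patient


def is_cost_effective(icer, wtp):
    """Value-for-money check for a more costly, more effective intervention."""
    return icer <= wtp


# Hypothetical inputs: 500,000 eligible, 20% uptake, $23,000 net cost per patient
impact = annual_budget_impact(500_000, 0.20, 23_000)

# ICER from the stem, judged against a $100k/QALY threshold
good_value = is_cost_effective(40_000, 100_000)
```

The ICER is a per-patient ratio and is unaffected by how many patients are treated, whereas budget impact scales directly with population size and uptake. That is why an intervention can pass one test and strain the other.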

Pattern 5 — Equity/legal: "CMS proposes using a $100k/QALY threshold to deny coverage of a novel therapy primarily benefiting disabled adults." → Recognize ACA Section 1182 prohibition on QALY-based denials.

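Pattern 6 — VOI escalation: "PSA shows 55% probability of cost-effectiveness at WTP $100k/QALY; EVPI is $480M nationally; a definitive trial would cost $40M." → EVPI is an upper bound on the value of research and far exceeds the trial cost, so the trial is potentially value-generating; recommend it (or coverage with evidence development), ideally confirmed by EVSI for the specific design.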

Pattern 7 — Distribution choice: "Which distribution best models a transition probability in a Markov model?" → Beta (single transition) or Dirichlet (multi-state).
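The standard distribution choices can be sketched with the Python standard library alone. The helper names and all input values are hypothetical. Parameterizing the beta and Dirichlet directly from observed counts is a common convention, assumed here rather than taken from the text. The Dirichlet draw uses the usual construction of normalizing independent gamma variates.

```python
import math
import random


def sample_transition_prob(rng, events, n):
    """Single transition probability ~ Beta(events, non-events)."""
    return rng.betavariate(events, n - events)


def sample_transition_row(rng, counts):
    """Multi-state transition row ~ Dirichlet(counts); entries sum to 1."""
    gammas = [rng.gammavariate(c, 1.0) for c in counts]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_cost(rng, mean, sd):
    """Cost ~ gamma, parameterized by method of moments."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return rng.gammavariate(shape, scale)


def sample_relative_risk(rng, rr, ci_low, ci_high):
    """Relative risk ~ log-normal, SE recovered from a 95% CI on the log scale."""
    mu = math.log(rr)
    sigma = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return rng.lognormvariate(mu, sigma)


rng = random.Random(7)
p_progress = sample_transition_prob(rng, events=12, n=100)
row = sample_transition_row(rng, counts=[70, 20, 10])  # stay, progress, die
cost = sample_cost(rng, mean=5_000, sd=1_000)
rr = sample_relative_risk(rng, rr=0.75, ci_low=0.60, ci_high=0.94)
```

The beta keeps a single probability inside (0, 1), while the Dirichlet additionally guarantees that a whole row of transition probabilities sums to 1, which independent beta draws would not.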

Pattern 8 — Industry bias: "A manufacturer-sponsored CEA reports an ICER of $48k/QALY; an independent analysis reports $112k/QALY." → Recognize systematic favorable bias; weight the independent analysis more heavily.

One-Line Recap

Sensitivity analysis is the formal toolkit — one-way deterministic SA (tornado diagrams), scenario analysis, and probabilistic SA (Monte Carlo with CEACs) — used to test whether a cost-effectiveness analysis's ICER conclusion survives plausible variation in its inputs, model structure, and methodological choices, with value-of-information analysis as the escalation pathway when decision uncertainty remains substantial.

High-yield bullet recaps:

— Deterministic one-way SA answers "Which input most influences my ICER?" via the tornado diagram; PSA answers "What is the probability the intervention is cost-effective?" via the CEAC and cost-effectiveness plane. Both are required by modern reporting standards (CHEERS 2022, ISPOR, Second Panel).

— Use beta distributions for probabilities and utilities, gamma or log-normal for costs, log-normal for hazard ratios and relative risks, and Dirichlet for multi-state transition probabilities; the US reference case uses a 3% discount rate with sensitivity at 0% and 5%, a lifetime horizon for chronic disease, and both healthcare-sector and societal perspectives.

— When PSA reveals substantial decision uncertainty at the relevant WTP threshold (commonly $50–150k/QALY in the US), the appropriate next step is value-of-information analysis (EVPI/EVPPI/EVSI) to determine whether further research is worth funding, often paired with coverage with evidence development as a policy bridge.

— Recognize the US legal context: ACA Section 1182 prohibits CMS from using QALY-based thresholds to deny Medicare coverage, making equity-informed analyses (DCEA, equal-value life-years) increasingly important. Recognize also that cost-effective ≠ affordable: budget impact analysis is a separate, complementary requirement, as the 2014 hepatitis C DAA experience demonstrated.

Board pearl: On Step 3, identify the SA type from visual/textual cues (tornado → one-way; Monte Carlo + CEAC → PSA), check robustness at the US WTP threshold, and escalate to VOI or conditional coverage when uncertainty remains. These three moves resolve nearly every sensitivity-analysis vignette you will encounter.