Biostatistics & Population Health

Pragmatic vs explanatory trial design

Clinical Overview and When to Suspect Pragmatic vs Explanatory Design Questions

Core concept: Clinical trials exist on a spectrum from explanatory (efficacy under ideal conditions: "can it work?") to pragmatic (effectiveness under real-world conditions: "does it work in practice?"). Recognizing where a trial sits on this spectrum determines how you interpret and apply its results at the bedside.

When the Step 3 question is really a design question:

— A vignette describes a "real-world" or "comparative effectiveness" study enrolling broad populations across community clinics → think pragmatic.

— A vignette describes tightly screened patients, placebo control, rigid protocol adherence, surrogate endpoints → think explanatory.

— A question asks why a drug that "worked in trials" is underperforming in your clinic population → the original trial was likely explanatory and lacked external validity.

Why this matters in Step 3 specifically:

— Step 3 emphasizes applying evidence to your actual patient in ambulatory and systems contexts. Pragmatic trial literacy is the bridge between guideline and panel.

— Quality improvement, value-based care, and population health items frequently reference pragmatic designs (e.g., cluster-randomized rollout of a care pathway).

PRECIS-2 tool: Nine domains (eligibility, recruitment, setting, organization, flexibility-delivery, flexibility-adherence, follow-up, primary outcome, primary analysis) scored 1 (very explanatory) to 5 (very pragmatic). You don't memorize scoring; you recognize the axes.

Classic examples:

— Explanatory: early phase III oncology RCTs, FDA registration trials.

— Pragmatic: ASCEND, SPRINT (partially), Salford Lung Study, ADAPTABLE (aspirin dosing).

Board pearl: If the trial uses electronic health record–based recruitment, minimal exclusion criteria, a usual-care comparator, and patient-centered outcomes (mortality, hospitalization, QoL), it is pragmatic, and its results generalize more readily to your panel than those of a tightly controlled explanatory RCT.

Presentation Patterns and Key History — Recognizing Design on the Exam

Stem cues that scream "explanatory":

— "Patients aged 40–65 with isolated condition X, no comorbidities, eGFR >60, no concurrent medications..."

— Single academic medical center, placebo-controlled, double-blind.

— Outcome is a surrogate marker (HbA1c, LDL, blood pressure, tumor response) rather than a patient-important endpoint.

— Run-in period to exclude non-adherent patients before randomization.

— Per-protocol analysis emphasized.

Stem cues that scream "pragmatic":

— Multiple community practices, broad inclusion ("adults with type 2 diabetes regardless of comorbidity").

— Comparator is usual care or an active standard agent, not placebo.

— Open-label or PROBE design (Prospective Randomized Open Blinded Endpoint).

— Outcomes ascertained via registries, claims, or EHR rather than dedicated study visits.

— Intention-to-treat analysis emphasized; cluster randomization at clinic or hospital level.

— Minimal protocol-mandated visits beyond usual care.

Hybrid trials: Many modern trials sit mid-spectrum. SPRINT had a strict BP measurement protocol (explanatory feature) but a broad cardiovascular population (pragmatic feature). Don't force binary classification.

History-taking analogy: Just as you elicit chief complaint plus context, you read a trial's "chief complaint" (PICO) plus its context of conduct.

Board pearl: Run-in periods are an explanatory hallmark: they boost internal validity but inflate apparent benefit by enriching for adherent responders, harming generalizability.

Key distinction: Efficacy ≠ effectiveness. Efficacy = performance in ideal/controlled conditions (explanatory). Effectiveness = performance in routine conditions (pragmatic). A drug can be efficacious but not effective if real-world adherence collapses (e.g., complex dosing, side effects driving discontinuation). Step 3 loves to test the gap: a question may show a drug with strong explanatory RCT data underperforming in your clinic; the answer is lack of external validity / effectiveness gap, not "the drug doesn't work."
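
The efficacy versus effectiveness gap described in this section can be put in rough numbers. The sketch below is illustrative only: it assumes the drug lowers risk by a fixed absolute amount in patients who actually take it, has no effect in those who do not, and that uptake is the only thing that differs between settings. The function name and every number are hypothetical and are not drawn from any trial.

```python
def itt_risk_difference(effect_in_takers, uptake_treatment_arm, uptake_control_arm):
    """Expected intention-to-treat (ITT) absolute risk reduction.

    effect_in_takers: absolute risk reduction among patients who take the drug
    uptake_*: fraction of each randomized arm that actually takes the drug
    Assumes no effect in non-takers (a simplifying assumption of this sketch).
    """
    for uptake in (uptake_treatment_arm, uptake_control_arm):
        if not 0.0 <= uptake <= 1.0:
            raise ValueError("uptake must be between 0 and 1")
    # ITT effect = effect in takers x difference in uptake between arms
    return effect_in_takers * (uptake_treatment_arm - uptake_control_arm)


# Explanatory setting: run-in period and pill counts keep adherence high,
# and nobody in the placebo arm obtains the drug.
explanatory_effect = itt_risk_difference(0.05, 0.95, 0.00)

# Pragmatic setting: real-world discontinuation plus some crossover in usual care.
pragmatic_effect = itt_risk_difference(0.05, 0.60, 0.10)

print(f"explanatory ITT risk reduction: {explanatory_effect:.4f}")
print(f"pragmatic ITT risk reduction:   {pragmatic_effect:.4f}")
```

Under these assumptions the same drug shows about half the intention-to-treat benefit in the pragmatic setting, which is the dilution the "underperforming in your clinic" vignette is testing.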

Physical Exam Findings — Structural Features That Define Each Design

Eligibility criteria (the "vitals" of trial design):

— Explanatory: narrow, many exclusions (age caps, comorbidity exclusions, polypharmacy exclusions, pregnancy, renal/hepatic disease).

— Pragmatic: broad, mirrors the population to whom the intervention will ultimately be applied. Often only safety-based exclusions.

Setting:

— Explanatory: specialized research centers, dedicated study coordinators, protocolized monitoring.

— Pragmatic: routine clinical settings (community clinics, general hospitals), care delivered by usual providers.

Intervention delivery (flexibility):

— Explanatory: rigid dose, schedule, monitoring; deviations are protocol violations.

— Pragmatic: clinicians titrate per usual practice; the strategy is randomized, not a fixed dose.

Adherence monitoring:

— Explanatory: pill counts, drug levels, directly observed therapy.

— Pragmatic: no active enforcement; adherence reflects real-world behavior.

Follow-up intensity:

— Explanatory: frequent protocol-driven visits, extensive data capture.

— Pragmatic: follow-up matches usual care; outcomes captured passively (claims, EHR, vital statistics).

Comparator:

— Explanatory: placebo to isolate the biologic effect.

— Pragmatic: active usual care to inform the actual clinical choice.

Primary outcome:

— Explanatory: surrogate or mechanistic (biomarker change).

— Pragmatic: patient-important, hard endpoints (all-cause mortality, hospitalization, function, quality of life).

Analysis:

— Explanatory: per-protocol acceptable; CACE / instrumental variable methods.

— Pragmatic: intention-to-treat is non-negotiable because it preserves the real-world question.

Step 3 management: When a guideline cites a pragmatic trial for a recommendation, you can apply it confidently to your general panel. When it cites an explanatory trial, ask whether your patient resembles the enrolled population before generalizing.

Diagnostic Workup — Initial Appraisal Tools

PRECIS-2 wheel (Loudon 2015): The dominant framework. Nine spokes, each scored 1–5:

— Eligibility, Recruitment, Setting, Organization, Flexibility (delivery), Flexibility (adherence), Follow-up, Primary outcome, Primary analysis.

— Wider wheel = more pragmatic. Narrower wheel = more explanatory.

— You won't be asked to score it on Step 3, but you should recognize the domains as the axes of design.

Quick bedside appraisal, five questions to ask:

— 1. Who was enrolled? Broad (pragmatic) vs narrow (explanatory).

— 2. What was the comparator? Usual care (pragmatic) vs placebo (explanatory).

— 3. How was the intervention delivered? Flexible (pragmatic) vs rigid (explanatory).

— 4. What was measured? Patient-important (pragmatic) vs surrogate (explanatory).

— 5. How was it analyzed? ITT (pragmatic) vs per-protocol (explanatory).

Validity check:

— Internal validity = does the trial accurately measure what it claims for the enrolled population? (Favors explanatory design.)

— External validity / generalizability = do results apply to patients outside the trial? (Favors pragmatic design.)

— These exist in tension. Tightening one often loosens the other.

CONSORT extension for pragmatic trials: A reporting standard requiring authors to describe how the intervention is implemented in real-world settings, the comparator, and applicability.

Board pearl: A trial with high internal validity but low external validity is the textbook explanatory trial: its result is "true" but may not apply to your 78-year-old with CKD and three other meds, who would have been excluded at screening.

Key distinction: Internal validity asks "is the answer correct?" External validity asks "is the answer relevant to my patient?" Step 3 questions about applying RCT results turn on the second.
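
To make the nine PRECIS-2 axes concrete, here is a small sketch that checks a set of 1 to 5 domain scores and flags which domains lean explanatory or pragmatic. The domain keys, the cutoffs (2 or below, 4 or above), the averaging, and the example scores are all assumptions made for illustration; PRECIS-2 itself is a visual wheel, and a single averaged number is only a convenience of this sketch.

```python
PRECIS2_DOMAINS = (
    "eligibility", "recruitment", "setting", "organization",
    "flexibility_delivery", "flexibility_adherence",
    "follow_up", "primary_outcome", "primary_analysis",
)


def summarize_precis2(scores):
    """Summarize nine PRECIS-2 domain scores (1 = very explanatory, 5 = very pragmatic)."""
    missing = [d for d in PRECIS2_DOMAINS if d not in scores]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    for domain in PRECIS2_DOMAINS:
        if scores[domain] not in (1, 2, 3, 4, 5):
            raise ValueError(f"{domain} must be an integer from 1 to 5")
    values = [scores[d] for d in PRECIS2_DOMAINS]
    return {
        # Illustrative summary only; not an official PRECIS-2 output.
        "mean_score": sum(values) / len(values),
        "explanatory_domains": [d for d in PRECIS2_DOMAINS if scores[d] <= 2],
        "pragmatic_domains": [d for d in PRECIS2_DOMAINS if scores[d] >= 4],
    }


# Hypothetical trial: broad and embedded in usual care, but with
# protocol-mandated study visits (an explanatory follow-up feature).
hypothetical_trial = {
    "eligibility": 5, "recruitment": 4, "setting": 5, "organization": 4,
    "flexibility_delivery": 4, "flexibility_adherence": 5,
    "follow_up": 2, "primary_outcome": 5, "primary_analysis": 5,
}
summary = summarize_precis2(hypothetical_trial)
print(summary)
```

The point of the per-domain lists is the same as the hybrid-trial caveat elsewhere in these notes: a trial can be pragmatic on most axes and explanatory on one or two.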

Diagnostic Workup — Advanced Considerations and Hybrid Designs

Cluster-randomized trials (often pragmatic):

— Randomization unit is a clinic, hospital, or geographic region, not the individual patient.

— Useful when the intervention is systems-level (a care pathway, EHR alert, screening program).

— Risk: contamination between arms if clusters share providers; ICC (intraclass correlation) must be accounted for in sample size, because patients within a cluster are correlated, reducing effective N.

Stepped-wedge cluster trials:

— All clusters eventually receive the intervention, but rollout is staggered in random order.

— Pragmatic and ethically attractive when the intervention is believed beneficial and withholding it long-term is problematic.

— Common in QI, public health implementation, infection control bundles.

Adaptive trial designs:

— Pre-specified rules allow modification (dose, arm dropping, sample size) based on accumulating data.

— Can be pragmatic or explanatory; platform trials like RECOVERY (COVID-19) are large, pragmatic, and adaptive: broad eligibility, usual-care comparator, hard endpoints (28-day mortality), minimal data collection.

Registry-based RCTs (R-RCT):

— Use existing clinical registries for enrollment, randomization, and outcome ascertainment.

— Highly pragmatic; low cost; large N. Example: TASTE trial (thrombus aspiration in STEMI) used the SCAAR registry.

Hybrid effectiveness-implementation trials (Curran framework):

— Type 1: primarily effectiveness, secondary implementation outcomes.

— Type 2: dual focus.

— Type 3: primarily implementation, secondary effectiveness. Increasingly common in health services research.

CCS pearl: When a Step 3 systems/QI question describes a hospital rolling out a sepsis bundle across units in random order over 6 months, recognize this as a stepped-wedge cluster pragmatic design, and recognize that ITT analysis at the cluster level is the rigorous approach.
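
The ICC point under cluster-randomized trials can be illustrated with the standard design-effect approximation, DEFF = 1 + (m − 1) × ICC, where m is the average cluster size. This is a minimal sketch that assumes equal cluster sizes; the clinic count, cluster size, and ICC value are hypothetical.

```python
def design_effect(mean_cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    if mean_cluster_size < 1:
        raise ValueError("cluster size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must be between 0 and 1")
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(total_n, mean_cluster_size, icc):
    """Number of independent patients the clustered sample is 'worth'."""
    return total_n / design_effect(mean_cluster_size, icc)


# Hypothetical trial: 20 clinics x 50 patients each, ICC = 0.05.
deff = design_effect(50, 0.05)
n_effective = effective_sample_size(20 * 50, 50, 0.05)

print(f"design effect: {deff:.2f}")
print(f"effective N:   {n_effective:.0f} of 1000 enrolled")
```

Even a small ICC matters when clusters are large, which is why a cluster trial needs more patients than an individually randomized trial asking the same question.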

Risk Stratification — Choosing the Right Design for the Right Question

When to favor an explanatory design:

— Early-phase development; you need to know whether the intervention has any biological effect.

— Mechanism-of-action studies; dose-finding; FDA registration trials.

— Rare diseases where homogeneity helps detect signal.

— When safety profile is uncertain and tight monitoring is ethically required.

When to favor a pragmatic design:

— Intervention already shown efficacious; the question is real-world value.

— Comparative effectiveness research: choosing between two approved options.

— Health policy, implementation, and value-based care decisions.

— When the population for whom the decision matters is heterogeneous (older adults, multimorbidity).

Translational pipeline:

— Phase II (explanatory) → Phase III (mixed) → Phase IV / post-marketing (often pragmatic) → comparative effectiveness (pragmatic).

— Recognize that guidelines may evolve as a treatment moves from an explanatory to a pragmatic evidence base.

Funding and sponsorship cues:

— Explanatory trials are often industry-sponsored (regulatory approval goal).

— Pragmatic trials are often publicly funded (NIH, PCORI, NIHR) because they answer questions the market won't.

PCORI (Patient-Centered Outcomes Research Institute): US funder explicitly tasked with pragmatic, patient-centered comparative effectiveness research, created by the ACA.

Board pearl: PCORI-funded trial in a question stem ≈ pragmatic, patient-important outcomes, real-world populations. This is a high-yield signal.

Key distinction: Drug approval requires efficacy (explanatory). Guideline placement and value-based reimbursement increasingly require effectiveness (pragmatic). A drug can clear the first hurdle and still fail the second, which is why pragmatic trials matter for the practicing internist.

Pharmacotherapy Analogy — Translating Trial Type to Prescribing Decisions

Applying explanatory trial data:

— Best for answering: "Does drug X lower LDL more than placebo in adults with hyperlipidemia?"

— Limitation: tells you little about whether your 82-year-old with polypharmacy, mild dementia, and CKD will benefit or tolerate it.

— Number needed to treat (NNT) from an explanatory trial often overstates real-world benefit because adherence and event rates differ.

Applying pragmatic trial data:

— Best for answering: "If I prescribe drug X versus drug Y in my usual panel, which leads to fewer hospitalizations?"

— Includes the noise of real life: missed doses, switched therapies, side-effect dropouts. The estimate reflects what you will actually see.

— Effect sizes are typically smaller than in explanatory trials, not because the drug is worse, but because non-adherence and crossover dilute the signal under ITT.

Classic pragmatic pharmacology trials:

— ADAPTABLE (2021): 81 mg vs 325 mg aspirin for secondary CV prevention; conducted through PCORnet, broad eligibility, patient-reported outcomes. Result: no significant difference between doses, but heavy dose switching (roughly 40% of patients assigned to 325 mg moved to 81 mg), a pragmatic interpretation lesson.

— Salford Lung Study: fluticasone furoate/vilanterol vs usual care in real-world COPD/asthma populations. Enrolled patients normally excluded from RCTs (smokers, comorbidities).

— ASCEND, ASPREE: aspirin in primary prevention; broad populations, hard endpoints.

CATIE (schizophrenia), ALLHAT (hypertension), STAR*D (depression): Landmark practical/pragmatic trials that reshaped guidelines because they tested real-world strategies, not just drugs.

Step 3 management: When choosing between two equally efficacious drugs, pragmatic comparative effectiveness data (real-world adherence, tolerability, hospitalization) should drive selection, not relative risk reductions from highly selected explanatory trials. ALLHAT's elevation of thiazides as first-line antihypertensives is the canonical example.

Procedural and Strategy Trials — Expanded Design Pharmacology

Strategy trials are inherently pragmatic:

— Randomize strategies (e.g., "early invasive vs ischemia-guided" in NSTEMI; "rate vs rhythm control" in AF) rather than fixed doses or devices.

— Allow clinician judgment within each arm, which mirrors practice.

— Examples: AFFIRM (rate vs rhythm in AF), COURAGE (PCI + OMT vs OMT alone in stable CAD), ISCHEMIA, MASTER DAPT.

Non-inferiority and pragmatic design:

— Many pragmatic trials are non-inferiority designs comparing a new strategy to an established one.

— Watch the non-inferiority margin: it must be clinically justified and pre-specified.

— ITT can bias non-inferiority toward "no difference" because of crossover and dilution; therefore non-inferiority trials often report both ITT and per-protocol, and agreement strengthens the conclusion.

Device and procedural trials:

— Often partially blinded (operator cannot be blinded; outcome assessors should be).

— Sham-controlled procedural trials (e.g., renal denervation SYMPLICITY HTN-3, vertebroplasty) are explanatory in flavor: they isolate the procedural effect from placebo response.

— Pragmatic procedural trials (e.g., TASTE, registry-based) test the procedure in unselected real-world patients.

Cluster-randomized implementation trials of procedures: randomize hospitals to a pathway (e.g., regional STEMI network) rather than patients to a device.

Board pearl: A trial randomizing patients to "early invasive vs conservative" with operators making individualized device/drug choices within each arm is a pragmatic strategy trial. Its result tells you which approach to choose at the system level, not which stent or drug to pick.

Key distinction: Sham-controlled procedural trial = explanatory (isolates biologic effect). Registry-based procedural trial = pragmatic (estimates real-world benefit). Both are valid; they answer different questions.

Special Populations — Elderly, Renal/Hepatic Impairment, and the Generalizability Gap

The exclusion problem:

— Explanatory trials routinely exclude patients >75, eGFR <60, significant hepatic disease, dementia, life expectancy <1 year, and polypharmacy.

— Yet these are exactly the patients you treat in ambulatory internal medicine and geriatrics.

— Result: a generalizability gap, in which guidelines built on explanatory evidence may not apply to your panel.

Examples:

— Statins for primary prevention >75: most large explanatory trials underenrolled this group; guidance is extrapolated. Pragmatic trials (e.g., STAREE, PREVENTABLE) are addressing this.

— Anticoagulation in CKD stage 4–5 AF: largely excluded from DOAC trials; real-world registry data fill the gap.

— Cancer therapeutics: phase III trial median age often a decade younger than real-world patients.

Pragmatic trials and elderly inclusion:

— Broader eligibility means more representative effect estimates for older, multimorbid patients.

— Subgroup analyses in pragmatic trials are more interpretable because the subgroups actually exist in adequate numbers.

Renal/hepatic impairment:

— Explanatory pharmacokinetic trials in dedicated impairment cohorts answer dosing questions.

— Pragmatic trials answer the harder question: does dose-adjusted therapy actually improve outcomes in these patients?

Step 3 management: When applying an explanatory RCT result to an elderly, multimorbid, or renally impaired patient who would have been excluded from enrollment, document the generalizability limitation and individualize using shared decision-making. Look for pragmatic or registry-based data in this subgroup before assuming benefit transfers.

Board pearl: The phrase "would not have qualified for the landmark trial" is a Step 3 cue to temper enthusiasm for the intervention and discuss uncertainty with the patient, not to withhold treatment reflexively.

Special Populations — Pregnancy, Pediatrics, and Vulnerable Groups

Pregnancy:

— Routinely excluded from both explanatory and pragmatic trials due to ethical and liability concerns.

— Evidence base is therefore largely observational, registry-based, or post-marketing, closer to pragmatic in spirit but without randomization.

— Pregnancy exposure registries (e.g., antiepileptic, antidepressant) are key effectiveness/safety data sources.

— Movement toward pragmatic inclusion of pregnant patients (NIH Task Force on Research Specific to Pregnant Women and Lactating Women, PRGLAC) is reshaping the field.

Pediatrics:

— Explanatory pediatric trials are rare and small; extrapolation from adult data is common but risky.

— Pragmatic pediatric networks (e.g., PEDSnet, PCORnet) enable registry-based comparative effectiveness.

— The Pediatric Research Equity Act (PREA) requires pediatric studies for many new drugs and biologics; the Best Pharmaceuticals for Children Act (BPCA) encourages them by offering additional marketing exclusivity rather than mandating them.

Racial, ethnic, and socioeconomic representation:

— Explanatory trials historically underenroll racial and ethnic minorities and low-SES populations.

— Pragmatic trials conducted in community and safety-net settings improve representation, but only if active recruitment strategies are used.

— Underrepresentation drives health equity concerns when guidelines built on non-representative data are applied universally.

Rural and low-resource settings:

— Pragmatic trials embedded in community health centers or telehealth networks are increasingly used to test interventions in resource-limited contexts.

Key distinction: Underenrollment ≠ ineffectiveness in an unstudied subgroup, but it does mean uncertainty. Step 3 may ask you to identify a limitation in applying a trial result to a Hispanic or Black patient when the trial was 92% non-Hispanic White.

Board pearl: Diversity in trial enrollment is a patient safety and equity issue. Recognizing it as a limitation, and citing pragmatic/registry evidence to fill the gap, is the Step 3-correct response.

Complications and Adverse Outcomes — Limitations of Each Design

Limitations of explanatory trials:

— Low external validity: results may not generalize beyond the selected population.

— Surrogate endpoint overreliance: biomarker improvement may not translate to patient benefit (classic example: the CAST trial, in which flecainide suppressed PVCs post-MI but increased mortality).

— Run-in periods inflate apparent benefit by excluding non-responders and non-adherent patients before randomization.

— Hawthorne effect amplified by intense protocol contact, so adherence and outcomes are better than in the real world.

— Smaller sample sizes may miss rare adverse events.

Limitations of pragmatic trials:

— Lower internal validity: crossover, non-adherence, contamination dilute estimates.

— ITT bias toward null: true benefit may be underestimated; problematic in non-inferiority interpretation.

— Outcome ascertainment heterogeneity: EHR/claims data are subject to coding errors and missingness.

— Blinding often impossible: open-label designs introduce performance and detection bias unless PROBE (blinded endpoint assessment) is used.

— Confounding by indication if clinicians within arms tailor therapy based on patient characteristics.

Common to both:

— Type I error (false positive) and Type II error (false negative) risks.

— Loss to follow-up: pragmatic trials mitigate via passive registry follow-up; explanatory via intensive contact.

— Publication bias: both can be affected.

Board pearl: The CAST trial is the canonical Step 3 lesson that surrogate endpoints can mislead. An explanatory trial showing improved arrhythmia suppression with flecainide led to widespread use; a subsequent trial measuring mortality (a patient-important pragmatic endpoint) showed harm. Always ask what was measured.

Key distinction: Explanatory trials risk being right about the wrong question. Pragmatic trials risk being imprecise about the right question. Triangulating both is the goal.

When to Escalate — Choosing Between, Combining, or Triangulating Designs

Sequential strategy (the translational pipeline):

— Mechanism (preclinical) → efficacy (explanatory phase II/III) → effectiveness (pragmatic phase IV / comparative effectiveness) → implementation (hybrid type 3 / QI).

— A mature evidence base spans the spectrum.

When pragmatic trial evidence should override explanatory:

— When the pragmatic trial is larger, longer, and uses patient-important endpoints, and the explanatory trial used surrogates.

— When real-world adherence patterns make the explanatory effect size unachievable in practice.

— Example: ADAPTABLE taught that strict aspirin dose distinctions matter less than ensuring patients take aspirin.

When explanatory evidence remains primary:

— Early in development, when biologic effect must be confirmed.

— When the pragmatic trial is underpowered or has high crossover obscuring signal.

— For mechanistic claims (e.g., does drug X actually inhibit pathway Y?).

Triangulation:

— Use both designs to build a complete picture. Guidelines increasingly cite networks of evidence (RCT meta-analyses + real-world evidence + registries).

— Real-world evidence (RWE) is now FDA-recognized for some label expansions (21st Century Cures Act).

Escalating to systems-level evaluation:

— When the question is "should our health system adopt this pathway?" → cluster-randomized or stepped-wedge pragmatic trial is the answer.

— When the question is "should this drug be approved?" → explanatory phase III.

Step 3 management: If a guideline recommendation is based solely on surrogate-endpoint explanatory trials and your patient has competing risks or limited life expectancy, shared decision-making with explicit uncertainty disclosure is the Step 3-correct approach. Don't reflexively apply; don't reflexively withhold.

CCS pearl: Hospital-level QI interventions (sepsis bundles, readmission programs) are best evaluated with pragmatic, cluster-randomized or stepped-wedge designs; recognize this when CCS-style systems questions appear.
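
For the systems-level designs named in this section, a stepped-wedge rollout is easiest to picture as a schedule: every unit starts in the control condition and crosses over to the intervention at a randomly assigned step. The sketch below generates such a schedule. The unit names, the six-step layout, and the seed are hypothetical, and a real trial would add stratification and an analysis plan that accounts for time trends.

```python
import random


def stepped_wedge_schedule(clusters, n_steps, seed=None):
    """Randomly order clusters and assign each a crossover step (1..n_steps)."""
    if n_steps < 1 or not clusters:
        raise ValueError("need at least one cluster and one step")
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)  # the random rollout order is the randomization
    # Spread clusters as evenly as possible across the steps.
    return {c: 1 + (i * n_steps) // len(order) for i, c in enumerate(order)}


def exposure_matrix(schedule, n_steps):
    """Rows = clusters, columns = periods 0..n_steps (0 = all-control baseline).

    1 means the cluster is delivering the intervention in that period.
    """
    return {
        cluster: [1 if period >= start else 0 for period in range(n_steps + 1)]
        for cluster, start in schedule.items()
    }


units = ["ICU", "ED", "Med A", "Med B", "Surg A", "Surg B"]  # hypothetical units
schedule = stepped_wedge_schedule(units, n_steps=6, seed=2024)
matrix = exposure_matrix(schedule, n_steps=6)

for unit in sorted(matrix, key=schedule.get):
    print(f"{unit:7s} starts month {schedule[unit]}: {matrix[unit]}")
```

Printed in crossover order, the rows form the staircase pattern that gives the design its name; every unit contributes both control and intervention periods.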

Key Differentials — Other Trial Designs Within the Experimental Category

Superiority vs non-inferiority vs equivalence:

— Superiority: is A better than B? Default design.

— Non-inferiority: is A not unacceptably worse than B? Common in pragmatic comparative effectiveness when B is established and A has other advantages (cost, tolerability).

— Equivalence: is A within a margin of B in both directions? Rare clinically; more common for bioequivalence.

Parallel-group vs crossover:

— Parallel is standard.

— Crossover (each patient receives both interventions sequentially) increases power but requires stable chronic conditions and adequate washout periods. More explanatory in feel.

Factorial design:

— Randomize to two interventions simultaneously (e.g., 2x2): tests both and their interaction. Efficient; can be pragmatic (e.g., ISIS-2 aspirin + streptokinase) or explanatory.

Adaptive and platform trials:

— Pre-specified modifications based on interim data. RECOVERY (COVID-19) is a pragmatic platform that rapidly identified dexamethasone benefit and showed no benefit from hydroxychloroquine.

N-of-1 trials:

— Single-patient crossover with multiple treatment periods. Explanatory in design but pragmatic in goal (personalize for this patient). Useful in chronic stable conditions (chronic pain, ADHD).

Cluster RCTs: See "Advanced Considerations and Hybrid Designs" above; usually pragmatic.

Board pearl: Platform trials (RECOVERY, REMAP-CAP, I-SPY 2) are adaptive, shared-control designs that accelerate evidence generation in pandemics and oncology. RECOVERY and REMAP-CAP are also strongly pragmatic; I-SPY 2 is a phase II oncology platform and sits nearer the explanatory end. Recognize the name pattern.

Key distinction: Non-inferiority designs paired with ITT analysis are a tricky combination: ITT biases toward no difference, which falsely favors a conclusion of non-inferiority. Step 3 may ask you to identify this as a methodological concern. The fix: report both ITT and per-protocol; require agreement.
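
The non-inferiority caution in this section (report both ITT and per-protocol, and require agreement) can be shown with a small calculation. The sketch below uses a simple Wald 95% confidence interval for a risk difference and declares non-inferiority only when the upper bound sits below the margin. The event counts and the 4-percentage-point margin are invented for illustration, and real trials often use other interval methods.

```python
import math


def risk_difference_ci(events_new, n_new, events_std, n_std, z=1.96):
    """Risk difference (new minus standard) with a Wald confidence interval."""
    p_new = events_new / n_new
    p_std = events_std / n_std
    diff = p_new - p_std
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    return diff, diff - z * se, diff + z * se


def is_non_inferior(upper_bound, margin):
    """For a harmful outcome, the new arm may be worse by less than `margin`."""
    return upper_bound < margin


MARGIN = 0.04  # hypothetical pre-specified margin (absolute risk)

# Hypothetical ITT population: crossover pulls the two arms together.
_, _, itt_upper = risk_difference_ci(100, 1000, 95, 1000)
# Hypothetical per-protocol population: adherent patients only.
_, _, pp_upper = risk_difference_ci(90, 800, 70, 800)

itt_ok = is_non_inferior(itt_upper, MARGIN)
pp_ok = is_non_inferior(pp_upper, MARGIN)

print(f"ITT upper bound {itt_upper:.3f} -> non-inferior: {itt_ok}")
print(f"PP  upper bound {pp_upper:.3f} -> non-inferior: {pp_ok}")
print("conclusion robust" if itt_ok and pp_ok else "analyses disagree: be cautious")
```

In this made-up example the ITT analysis clears the margin and the per-protocol analysis does not, which is exactly the disagreement that should make a reader hesitate before accepting non-inferiority.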

Key Differentials — Observational Designs and Real-World Evidence

Why this matters: Pragmatic trials and high-quality observational studies often answer similar real-world questions. Knowing the difference is high-yield.

Cohort studies:

— Prospective or retrospective; measure exposure → outcome.

— Subject to confounding (treated patients differ from untreated).

— Methods to mitigate: propensity score matching, instrumental variables, target trial emulation.

Target trial emulation:

— Use observational data to emulate the design of a hypothetical pragmatic trial. Pre-specify eligibility, treatment strategies, follow-up, outcomes, and analysis as if running an RCT. Reduces bias substantially when an RCT is infeasible.

Registry-based studies:

— Differ from registry-based RCTs by lacking randomization. Strong for safety surveillance, weaker for causal effectiveness claims.

Claims-based and EHR-based studies:

— Large N, real-world, but subject to coding misclassification, missing data, immortal time bias, and healthy user bias.

When observational evidence is sufficient:

— When an RCT is unethical (smoking → lung cancer), infeasible (rare events), or already done and consistent.

— The Bradford Hill criteria (including strength, consistency, temporality, biological gradient, plausibility, and coherence) help establish causation from observational data.

Hierarchy of evidence (modernized):

— Systematic review of RCTs > single RCT (pragmatic or explanatory) > target-trial-emulated cohort > standard cohort > case-control > case series > expert opinion.

Step 3 management: When asked whether to act on observational data alone, consider: effect size, consistency across studies, dose-response, biological plausibility, and absence of better evidence. Strong, consistent, large-effect observational data (e.g., smoking, asbestos) can guide practice; weak signals require RCT confirmation.

Key distinction: A pragmatic RCT preserves randomization (eliminates confounding) while approximating real-world conditions; it is the gold standard for comparative effectiveness. A registry cohort mimics conditions but cannot eliminate confounding by indication.

Secondary Prevention — Applying Trial Type to Long-Term Guideline Decisions

Translating trial type into your long-term plan:

— Use pragmatic trial data to anchor decisions about chronic disease management strategies (BP targets, glycemic targets, anticoagulation duration, secondary prevention regimens).

— Use explanatory trial data to confirm a drug can work and to set dose and mechanism expectations.

Guideline interpretation:

— Guidelines grade evidence (e.g., ACC/AHA Class I-III, Level A-C; USPSTF A-D, I). The grade reflects strength and consistency, not pragmatism.

— A Class I, Level A recommendation may rest on explanatory trials; check whether effectiveness data exist in your patient's subgroup.

Comparative effectiveness in chronic disease:

— ALLHAT → thiazide-based therapy preferred first-line in uncomplicated HTN (pragmatic outcomes: CV events, all-cause mortality).

— CATIE → atypical antipsychotics not uniformly superior to perphenazine for chronic schizophrenia; tolerability/discontinuation as patient-important endpoint.

— STAR*D → sequential antidepressant strategies; real-world remission rates lower than explanatory trials suggested.

Long-term monitoring:

— Pragmatic trial endpoints (hospitalization, mortality, function, QoL) align with what you actually measure at follow-up visits.

— Guideline-driven monitoring intervals (e.g., HbA1c every 3–6 months, lipid panel annually) reflect pragmatic considerations.

Deprescribing:

— Increasingly guided by outcome trials showing that intensive control of surrogates may not improve outcomes and may cause harm, a concern that is greatest in older, multimorbid adults (e.g., ACCORD in T2DM: intensive glycemic control increased mortality).

Board pearl: ACCORD is the cautionary tale that tighter surrogate control ≠ better outcomes. A patient-important endpoint (mortality) reversed a guideline trajectory built on surrogate (HbA1c) evidence. Step 3 loves this archetype.

Step 3 management: For long-term ambulatory management, prefer effectiveness data when available; document shared decision-making when applying efficacy data to patients outside the trial population.

Follow-Up, Monitoring, and Counseling — Communicating Evidence to Patients

Counseling patients on trial evidence:

— When recommending a therapy based on explanatory trial data, explicitly note whether "this was studied in patients like you" or "not exactly like you," and calibrate confidence accordingly.

— Discuss absolute risk reduction (ARR) and NNT from the most applicable trial; relative risk reduction alone is misleading.

— Acknowledge uncertainty, a Step 3 communication-quality marker.

Shared decision-making (SDM):

— Required for preference-sensitive decisions (PSA screening, lung cancer screening, anticoagulation in borderline AF risk).

— Use decision aids that present pragmatic outcome data (events per 100 patients over 10 years) rather than relative measures.

Monitoring parameters:

— Choose monitoring intervals consistent with what pragmatic effectiveness data support, not arbitrary protocols. Over-monitoring increases cost and harm without improving outcomes.

— Example: routine post-MI stress testing without symptoms is not supported by pragmatic outcome data and is a Choosing Wisely target.

Patient-reported outcomes (PROs):

— Increasingly captured in pragmatic trials and in routine care via portals.

— PROMIS, EQ-5D, condition-specific measures; high-yield in cancer, HF, mental health.

Health literacy:

— Translate trial findings into plain-language risk communication ("for every 100 people like you treated for 5 years, about 3 will avoid a heart attack").

Step 3 management: When a patient asks "does this drug really work?", the Step 3-correct answer integrates efficacy (does it work biologically) + effectiveness (does it work for people like me in real life) plus shared decision-making about preferences. This is the modern ambulatory counseling standard.

Board pearl: Decision aids are evidence-based tools that improve knowledge, reduce decisional conflict, and align decisions with values, a Step 3-favored intervention for preference-sensitive care.
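
The counseling advice in this section (lead with ARR and NNT, and phrase risk "per 100 people") reduces to a few lines of arithmetic. The sketch below is illustrative: the event risks are invented, and the function names are not from any library. It shows why the same relative risk reduction can mean very different things for high-risk and low-risk patients.

```python
import math


def absolute_risk_reduction(control_risk, treated_risk):
    return control_risk - treated_risk


def relative_risk_reduction(control_risk, treated_risk):
    return (control_risk - treated_risk) / control_risk


def number_needed_to_treat(control_risk, treated_risk):
    arr = absolute_risk_reduction(control_risk, treated_risk)
    if arr <= 0:
        raise ValueError("no absolute benefit; NNT is undefined")
    return math.ceil(1 / arr)  # round up to whole patients


def plain_language(control_risk, treated_risk, years, outcome):
    avoided = round(100 * absolute_risk_reduction(control_risk, treated_risk))
    return (f"For every 100 people like you treated for {years} years, "
            f"about {avoided} will avoid {outcome}.")


# Hypothetical 5-year risks with the same 30% relative risk reduction.
high_risk_nnt = number_needed_to_treat(0.10, 0.07)   # baseline risk 10%
low_risk_nnt = number_needed_to_treat(0.01, 0.007)   # baseline risk 1%
message = plain_language(0.10, 0.07, 5, "a heart attack")

print(f"RRR in both patients: {relative_risk_reduction(0.10, 0.07):.0%}")
print(f"NNT, high-risk patient: {high_risk_nnt}")
print(f"NNT, low-risk patient:  {low_risk_nnt}")
print(message)
```

The relative risk reduction is identical in both patients, yet roughly ten times as many low-risk patients must be treated to prevent one event, which is why relative measures alone mislead.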

Ethical, Legal, and Patient Safety Considerations

Informed consent in pragmatic trials:

— Traditional written consent can introduce selection bias (only enthusiastic patients enroll), undermining pragmatism.

— Modified consent models (broadcast notification, opt-out, integrated consent, waiver of consent for minimal-risk comparative effectiveness research [CER]) are increasingly used under Common Rule flexibility.

— IRBs must balance respect for persons with research feasibility and clinical equipoise.

Equipoise:

— Genuine uncertainty about which intervention is superior; required for ethical randomization.

— Pragmatic comparative effectiveness trials often have strong equipoise because both arms are accepted standards.

The OHRP/SUPPORT controversy:

— The SUPPORT trial of oxygen saturation targets in preterm infants raised questions about consent disclosure when both arms were within standard of care. The controversy prompted guidance on how to disclose risk in pragmatic CER.

Cluster randomization ethics:

— Patients in a cluster may not individually consent to the cluster's assignment. Gatekeeper consent (institutional decision) plus individual consent for data use is common.

Transition-of-care risk:

— Pragmatic trial protocols that mirror usual care preserve continuity, reducing handoff errors. Explanatory protocols with study-specific medications and visits increase transition risk when patients return to community care post-trial.

Mandatory reporting and safety surveillance:

— Both trial types require adverse event reporting to IRBs and DSMBs, and to the FDA (IND safety reports) when the trial is conducted under an IND.

— Pragmatic trials embedded in EHRs enable continuous safety monitoring via clinical data.

Equity and justice:

— Underenrollment of vulnerable groups in explanatory trials raises distributive justice concerns when those groups bear disease burden but don't inform evidence.

Step 3 patient safety pearl: When a clinician enrolls a patient in a comparative effectiveness study where both arms are within standard of care, the minimum ethical requirement is transparent disclosure that the patient's treatment is being randomized, even when a waiver of formal written consent has been granted. Deception about randomization is never acceptable.

High-Yield Associations and Rapid-Fire Clinical Facts

Board pearl: When a question stem names ALLHAT, CATIE, STAR*D, ACCORD, ADAPTABLE, RECOVERY, ISCHEMIA, or AFFIRM, you are in pragmatic / strategy trial territory — interpret results as comparative effectiveness for real-world decisions, not pure efficacy.

PRECIS-2 = pragmatic-explanatory continuum tool, 9 domains, scored 1–5.

Efficacy = explanatory, ideal conditions, "can it work?"

Effectiveness = pragmatic, real-world, "does it work?"

Run-in period → explanatory hallmark; screens out nonadherent or intolerant patients before randomization, which inflates apparent benefit.

Usual-care comparator → pragmatic hallmark.

PROBE design = Prospective Randomized Open Blinded Endpoint — pragmatic compromise when blinding is infeasible.

Cluster RCT → systems-level pragmatic; account for ICC in sample size.

Stepped-wedge → all clusters eventually treated; ethical for likely beneficial interventions.

Registry-based RCT → TASTE, SWEDEHEART platform — pragmatic, low-cost.

Platform trial → RECOVERY, REMAP-CAP, I-SPY — adaptive, shared control, pragmatic.

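ITT → preserves randomization; dilutes differences toward the null, which is conservative for superiority but can falsely favor non-inferiority.

Per-protocol → analyzes adherent patients only; breaks randomization, so the estimate can be biased (often exaggerating the apparent effect).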

CAST → surrogate (PVC suppression) improved, but mortality increased: the classic explanatory surrogate trap.

ACCORD → intensive glycemic control increased mortality — pragmatic correction.

ALLHAT → thiazides first-line — pragmatic hypertension landmark.

CATIE / STAR*D → real-world psychiatry effectiveness.

ADAPTABLE → pragmatic aspirin dose comparison; high crossover.

Salford Lung Study → real-world COPD/asthma trial including normally excluded patients.

PCORI → US funder of patient-centered pragmatic comparative effectiveness research.

21st Century Cures Act → enables FDA use of real-world evidence.

Target trial emulation → observational data analyzed as if it were a pragmatic RCT.

External validity = generalizability = pragmatic strength.

Internal validity = freedom from bias within sample = explanatory strength.

Hawthorne effect more pronounced in explanatory protocols.

Choosing Wisely = pragmatic-evidence-driven deimplementation.
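The "account for ICC in sample size" point in the cluster RCT entry above can be made concrete with the standard design effect, 1 + (m − 1) × ICC, where m is the average cluster size. The sketch below is illustrative only: the function names are hypothetical, and the inputs (800 patients under individual randomization, 50 patients per clinic, ICC of 0.02) are assumed values, not figures from any named trial.

```python
import math


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation from randomizing clusters instead of individuals."""
    return 1 + (mean_cluster_size - 1) * icc


def cluster_sample_size(n_individual: int, mean_cluster_size: float, icc: float) -> int:
    """Total patients needed after inflating an individually randomized sample size."""
    inflated = n_individual * design_effect(mean_cluster_size, icc)
    # Round before taking the ceiling so floating-point noise does not add a patient.
    return math.ceil(round(inflated, 6))


def clusters_needed(total_patients: int, mean_cluster_size: float) -> int:
    """Number of clusters required to reach the inflated sample size."""
    return math.ceil(total_patients / mean_cluster_size)


if __name__ == "__main__":
    # Assumed inputs for illustration only.
    total = cluster_sample_size(n_individual=800, mean_cluster_size=50, icc=0.02)
    print(f"Design effect: {design_effect(50, 0.02):.2f}")
    print(f"Patients needed: {total}")
    print(f"Clinics needed: {clusters_needed(total, 50)}")
```

Even a small ICC nearly doubles the required enrollment here, which is the practical reason cluster trials need many clusters rather than a few large ones.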

Board Question Stem Patterns

Pattern 1 — "Why didn't it work in my clinic?"

— Stem: A drug with strong RCT evidence is underperforming in a community practice.

— Answer: Lack of external validity / effectiveness gap — original trial was explanatory; real-world adherence, comorbidities, and population differ. Not "the drug doesn't work."

Pattern 2 — "Identify the design."

— Stem describes broad eligibility, usual-care comparator, EHR-based outcomes, ITT analysis.

— Answer: Pragmatic trial. High generalizability, lower internal validity.

Pattern 3 — "What's the limitation?"

— Stem describes a tightly controlled trial with surrogate endpoint and narrow eligibility.

— Answer: Limited generalizability / surrogate endpoint may not reflect patient-important outcome (CAST archetype).

Pattern 4 — "Which design best answers this question?"

— Question: "Should our hospital adopt sepsis bundle X across all units?"

— Answer: Cluster-randomized or stepped-wedge pragmatic trial.

— Question: "Does drug X biologically lower LDL?"

— Answer: Explanatory placebo-controlled RCT.

Pattern 5 — "Interpret the non-inferiority result."

— Stem: ITT analysis shows non-inferiority; per-protocol does not.

— Answer: ITT may falsely favor non-inferiority due to dilution from crossover; concordance with per-protocol strengthens conclusion — disagreement weakens it.

Pattern 6 — "Ethics of consent."

— Stem: Pragmatic CER trial uses opt-out / modified consent.

— Answer: Acceptable when both arms are standard of care, risk is minimal, and waiver criteria are met under Common Rule, with IRB oversight and patient notification.

Pattern 7 — "Applying to elderly / multimorbid patient."

— Stem: Landmark trial excluded patients >75; your patient is 82 with CKD.

— Answer: Discuss uncertainty, individualize via shared decision-making, seek pragmatic or registry data for the subgroup.

Step 3 management: Most Step 3 design questions reward the measured, real-world clinician answer — apply evidence with awareness of its limits, integrate patient preferences, and avoid both nihilism and overreach.
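The Pattern 5 logic (check whether ITT and per-protocol agree before accepting a non-inferiority claim) can be sketched as a small decision rule. This is a simplified illustration under stated assumptions: the effect is a risk difference (new minus standard) where higher is worse, non-inferiority is declared when the upper confidence limit falls below the margin, and the `Estimate` class, the 2-percentage-point margin, and both confidence intervals are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Risk difference (new minus standard) with its confidence interval."""
    label: str
    ci_lower: float
    ci_upper: float


def is_non_inferior(est: Estimate, margin: float) -> bool:
    """Non-inferior if the worst plausible excess risk stays below the margin."""
    return est.ci_upper < margin


def interpret(itt: Estimate, per_protocol: Estimate, margin: float) -> str:
    """Combine both analyses, as the Pattern 5 answer recommends."""
    itt_ok = is_non_inferior(itt, margin)
    pp_ok = is_non_inferior(per_protocol, margin)
    if itt_ok and pp_ok:
        return "supported: ITT and per-protocol agree on non-inferiority"
    if itt_ok and not pp_ok:
        return "fragile: ITT result may reflect dilution from crossover or nonadherence"
    if pp_ok and not itt_ok:
        return "fragile: per-protocol result may reflect selection of adherent patients"
    return "not shown: neither analysis meets the margin"


if __name__ == "__main__":
    margin = 0.02  # hypothetical margin: 2 percentage points of excess risk
    itt = Estimate("ITT", ci_lower=-0.010, ci_upper=0.015)
    pp = Estimate("per-protocol", ci_lower=-0.005, ci_upper=0.028)
    print(interpret(itt, pp, margin))
```

In the assumed numbers the ITT interval clears the margin while the per-protocol interval crosses it, which is the discordant scenario the stem describes.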

One-Line Recap

Pragmatic trials answer "does it work in my patient under real conditions?" while explanatory trials answer "can it work under ideal conditions?" — and Step 3 expects you to choose, interpret, and apply each appropriately.

Board pearl: The single most testable concept is the efficacy–effectiveness gap — and recognizing that pragmatic trials, comparative effectiveness research, and real-world evidence are the tools that close it for the practicing physician.

Recap 1 — Spectrum, not binary: Use PRECIS-2 domains (eligibility, setting, flexibility, follow-up, outcome, analysis) to place a trial on the continuum. Most modern trials are hybrid; identify which features lean pragmatic vs explanatory rather than forcing a label.

Recap 2 — Validity trade-off: Explanatory = high internal validity, narrow external validity (great for proving biology, weaker for generalizing to your panel). Pragmatic = high external validity, lower internal validity (great for real-world decisions, more vulnerable to crossover, nonadherence, and bias from unblinded care). Triangulate both for full understanding.

Recap 3 — Endpoint and analysis matter most: Pragmatic = patient-important hard outcomes (mortality, hospitalization, QoL) + ITT analysis. Explanatory = surrogates (biomarkers) + per-protocol acceptable. CAST and ACCORD are permanent reminders that surrogate gains can mask patient-important harm.

Recap 4 — Apply with humility: When the patient in front of you would have been excluded from or underrepresented in the landmark trial (older, multimorbid, pregnant, from an underrepresented racial or ethnic group, low eGFR), name the generalizability gap, seek pragmatic or registry evidence, and engage shared decision-making. This is the Step 3 ambulatory-internist voice the exam rewards.