Biostatistics & Population Health
Statistical significance vs clinical significance
— A study can be statistically significant but clinically trivial (e.g., a drug that lowers SBP by 1.2 mmHg, p<0.001 in 40,000 patients).
— Conversely, a study can show a clinically meaningful effect that fails statistical significance because of small sample size or wide confidence intervals.
— Interpreting RCT abstracts in journal-club-style vignettes
— Deciding whether to adopt a new therapy in your clinic
— Counseling patients on absolute vs relative risk reductions
— Evaluating screening tests, quality improvement data, and pharmaceutical marketing claims
— Sample size: very large N can render trivial differences "significant"
— Effect size: the absolute magnitude (mean difference, ARR, NNT) is what matters clinically
— Patient-oriented vs disease-oriented outcomes: A1c reduction ≠ reduced amputations; LDL reduction ≠ reduced MI in every population
— Minimal clinically important difference (MCID): the threshold below which patients can't detect or don't care about the change
— Huge sample, tiny effect, p<0.001
— Surrogate endpoint only (bone density, not fractures)
— Composite endpoints driven by soft components (hospitalization rather than mortality)
— NNT in the hundreds or thousands for a non-fatal outcome
Board pearl: On Step 3, when a question says "statistically significant" but the absolute effect is tiny or the NNT is huge, the correct answer is usually "do not change practice" or "counsel the patient that the benefit is small." Statistical significance alone never justifies a clinical decision.

— A drug rep, colleague, or patient brings a new study claiming benefit
— You are asked to interpret a forest plot, abstract, or summary table
— A quality improvement initiative shows a "significant" change in a process measure
— A patient asks: "Should I take this new medication I saw on TV?"
— Sets up a trial: N, intervention, comparator, primary outcome
— Reports relative risk reduction (RRR) prominently, absolute risk reduction (ARR) buried or omitted
— Provides p-value and/or 95% CI
— Asks: "What is the most appropriate response/recommendation?"
— Outcome is a surrogate marker (LDL, A1c, blood pressure, tumor size, ejection fraction) rather than a patient-oriented outcome (POEM: mortality, morbidity, quality of life)
— Population studied differs from your patient (younger, healthier, different comorbidities — external validity issue)
— Follow-up duration too short to capture the outcome that matters
— Industry funding with selective reporting
— Large effect on hard endpoint (mortality, stroke, MI)
— Consistent direction across subgroups and prior trials
— Effect exceeds the MCID for the disease (e.g., ≥0.5 point change on a validated dyspnea scale)
— Plausible mechanism and dose-response relationship
Key distinction: Statistical significance is a property of the data and sample size; clinical significance is a property of the patient and the decision. The same p-value can mean "practice-changing" in one trial and "ignore" in another, depending on effect size, outcome type, and population.
Step 3 management: When you see a study with p<0.05, immediately ask three things: (1) What's the ARR and NNT? (2) Is the outcome patient-oriented or surrogate? (3) Does this apply to my patient sitting in front of me? Only then decide.

— Is the primary outcome clearly defined and prespecified?
— Are results given in relative terms only (RRR, HR) without absolute numbers? That is a red flag for inflated perceived benefit.
— Is the p-value the only effect measure shown? Demand the 95% CI and point estimate.
— Compute or look for ARR = control event rate − treatment event rate
— NNT = 1/ARR; NNT <50 for a meaningful outcome is generally clinically attractive; NNT >100 for soft outcomes is often not
— NNH (number needed to harm) for adverse effects — compare against NNT
— A 95% CI whose bound hugs the null signals an imprecise, fragile result even when p<0.05; a CI that crosses the null means the result is not significant at the two-sided 0.05 level
— A CI like 0.95–0.99 for HR is "significant" but barely
— Multiple comparisons without adjustment inflate false-positive findings
— Baseline risk drives ARR. High-risk patients gain more absolute benefit from any given RRR than low-risk patients.
— Example: a 25% RRR in mortality yields an ARR of 5% at a 20% baseline risk (NNT 20) but only 0.25% at a 1% baseline risk (NNT 400).
— N, follow-up duration, loss to follow-up (>20% threatens validity), adherence rates, intention-to-treat vs per-protocol analysis
Board pearl: Relative risk reduction is roughly constant across risk strata; absolute risk reduction is not. The same drug with a "30% RRR" can be lifesaving in secondary prevention and nearly useless in low-risk primary prevention. Always anchor the decision to ARR = baseline risk × RRR.
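The baseline-risk arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a validated calculator: the function name `arr_and_nnt` is hypothetical, and the 20% and 1% baseline risks are simply the values implied by the 25% RRR example (ARR 5% and 0.25%).

```python
import math

def arr_and_nnt(baseline_risk: float, rrr: float) -> tuple[float, int]:
    """Return (ARR, NNT) given a control-arm event rate and a relative risk reduction."""
    arr = baseline_risk * rrr            # ARR = baseline risk x RRR
    nnt = math.ceil(round(1 / arr, 6))   # strip float noise, then round NNT up
    return arr, nnt

# Same 25% RRR applied to two assumed baseline risks
for label, baseline in [("high-risk", 0.20), ("low-risk", 0.01)]:
    arr, nnt = arr_and_nnt(baseline, rrr=0.25)
    print(f"{label}: ARR = {arr:.2%}, NNT = {nnt}")
```

The relative effect is identical in both rows; only the baseline risk changes, and the NNT moves from 20 to 400.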

— Absolute Risk Reduction (ARR): difference in event rates; the most clinically actionable single number
— Relative Risk (RR) and Relative Risk Reduction (RRR = 1−RR): proportional change; can mislead if baseline is low
— Odds Ratio (OR): approximates RR when outcome is rare (<10%); diverges when outcome is common
— Hazard Ratio (HR): time-to-event; assumes proportional hazards
— Number Needed to Treat (NNT) = 1/ARR
— Mean difference / standardized mean difference (Cohen's d): for continuous outcomes; d=0.2 small, 0.5 medium, 0.8 large
— 95% Confidence Interval: the range of plausible true effects; if it excludes the null (1 for ratios, 0 for differences), result is statistically significant
— Narrow CI → precise estimate; wide CI → imprecise even if "significant"
— p<0.05 is a convention, not a biological truth
— p does not measure effect size or clinical importance
— p = 0.04 and p = 0.06 are not meaningfully different; both reflect comparable uncertainty
— POEM (Patient-Oriented Evidence that Matters): mortality, symptoms, quality of life, function
— DOE (Disease-Oriented Evidence): biomarkers, imaging findings, lab values
— POEM > DOE for clinical decisions
Step 3 management: Before recommending any therapy from a new trial, document mentally: ARR, NNT, 95% CI, outcome type (POEM vs DOE), and applicability to your patient's baseline risk. If you cannot fill in those five, do not change management. This is the EBM equivalent of "do not act on a single troponin without trending it."
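To make the effect measures listed above concrete, here is a small sketch that derives them from a 2×2 table. The function name `effect_measures` and both sets of trial counts are hypothetical. The confidence interval uses a simple Wald (normal-approximation) formula, which is one of several reasonable choices.

```python
import math

def effect_measures(events_tx: int, n_tx: int, events_ctl: int, n_ctl: int) -> dict:
    """Compute ARR, RR, RRR, OR, NNT and a Wald 95% CI for the ARR from raw counts."""
    risk_tx, risk_ctl = events_tx / n_tx, events_ctl / n_ctl
    arr = risk_ctl - risk_tx                      # control rate minus treatment rate
    rr = risk_tx / risk_ctl
    odds_ratio = (events_tx / (n_tx - events_tx)) / (events_ctl / (n_ctl - events_ctl))
    se = math.sqrt(risk_tx * (1 - risk_tx) / n_tx + risk_ctl * (1 - risk_ctl) / n_ctl)
    return {
        "ARR": arr, "RR": rr, "RRR": 1 - rr, "OR": odds_ratio,
        "NNT": 1 / arr if arr > 0 else math.inf,
        "ARR_95CI": (arr - 1.96 * se, arr + 1.96 * se),
    }

# Hypothetical trials with the same RR (0.75) but different outcome frequency
rare = effect_measures(30, 1000, 40, 1000)      # 3% vs 4% event rates
common = effect_measures(300, 1000, 400, 1000)  # 30% vs 40% event rates
for name, m in [("rare", rare), ("common", common)]:
    lo, hi = m["ARR_95CI"]
    print(f"{name}: RR={m['RR']:.2f} OR={m['OR']:.2f} ARR={m['ARR']:.3f} "
          f"CI=({lo:.3f}, {hi:.3f}) NNT={m['NNT']:.0f}")
```

With the rare outcome the OR stays close to the RR and the ARR interval includes 0. With the common outcome the OR drifts away from the RR, and the same relative effect yields a much larger ARR.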

— The smallest change in an outcome that patients perceive as beneficial
— Examples: ~1 point on Borg dyspnea scale; ~10–12 points on SF-36; ~2.5 points on UPDRS motor; ~1.0–1.5 cm on a 10 cm visual analog pain scale
— If a trial's mean improvement is below the MCID, the result may be statistically significant but clinically meaningless
— Fragility index: the minimum number of patients whose outcome would have to flip between "event" and "no event" for the result to lose statistical significance
— Many landmark trials have fragility indices of 1–10; a small index means a fragile result
— Subgroup analyses: generally hypothesis-generating only unless prespecified and adjusted for multiple comparisons
— A "significant" subgroup finding from a negative overall trial is usually noise
— Composite endpoints: inspect each component separately
— If the composite is "MACE" but the only positive component is unplanned revascularization, mortality benefit is not demonstrated
— Non-inferiority trials: require a prespecified margin; if the CI for the difference lies entirely within the margin, non-inferiority is met
— Statistical non-inferiority can still hide clinically meaningful inferiority if the margin was set too generously
— Even statistically robust associations in observational studies (HR 1.5, p<0.001) may reflect confounding, not causation
Key distinction: Statistical significance answers "is this unlikely to be chance?"; MCID answers "is this big enough?"; fragility index answers "is this robust?". All three must be favorable before a single trial changes your practice. A trial with p=0.04, a fragility index of 2, and an effect size below the MCID is not practice-changing; it is hypothesis-generating.
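The fragility index can be illustrated with a short, standard-library-only sketch. The approach assumed here is the common one: flip non-events to events in the arm with fewer events, one patient at a time, until a two-sided Fisher exact test is no longer below α. The function names and the example counts (10/100 vs 25/100) are hypothetical.

```python
from math import comb

def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in row 1 with margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (as improbable) as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def fragility_index(events_a: int, n_a: int, events_b: int, n_b: int,
                    alpha: float = 0.05) -> int:
    """Number of non-event -> event flips in the lower-event arm needed to reach p >= alpha."""
    if events_a > events_b:  # make arm "a" the one with fewer events
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a
    flips = 0
    while (events_a < n_a and
           fisher_two_sided(events_a, n_a - events_a, events_b, n_b - events_b) < alpha):
        events_a += 1
        flips += 1
    return flips  # 0 means the result was not significant to begin with

# Hypothetical trial: 10/100 events on treatment vs 25/100 on control
print("fragility index:", fragility_index(10, 100, 25, 100))
```

A single-digit result from a trial of this size would mean that a handful of differently adjudicated outcomes could erase the "significant" finding.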

— Use validated calculators: ASCVD pooled cohort, CHA₂DS₂-VASc, HAS-BLED, FRAX, Wells, CURB-65
— Higher baseline risk → larger ARR from the same RRR → lower NNT → more compelling to treat
— Example: ASCVD 10-yr risk = 15%; statin RRR ≈ 25% → ARR ≈ 3.75% → NNT ≈ 27 over 10 years
— Same RRR in patient with 5% baseline risk → ARR 1.25%, NNT 80
— Statin NNH for new diabetes ≈ 100–200; NNT for MI ≈ 20–80 in secondary prevention → favorable
— In low-risk primary prevention, NNT and NNH may be similar — shared decision-making essential
— Time horizon: an 85-year-old may not live long enough to benefit from a therapy with NNT over 10 years
— Burden: polypharmacy, cost, side effects, dosing frequency
— Patient preferences for risk avoidance vs treatment burden
— Strong recommendation: large ARR, low NNH, patient-oriented outcome, applicable population
— Weak/conditional: small ARR, surrogate outcome, or borderline applicability
— Do not treat: trivial ARR, high NNH, or population mismatch
Step 3 management: For the classic vignette of an elderly patient with limited life expectancy and a new "significant" trial, calculate whether the time to benefit exceeds remaining life expectancy. Time-to-benefit is a few years for statins and often closer to 10 years for intensive glycemic control and cancer screening. Do not start therapies whose benefit accrues beyond the patient's expected survival.
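One way to compare NNT against NNH is to convert both into expected events per 1,000 patients treated. The sketch below reuses the statin NNTs from the examples above (27 at 15% baseline risk, 80 at 5%). The NNH of 150 is an assumed midpoint of the 100–200 range quoted for new diabetes, and treating both numbers as if they share one time horizon is a simplification.

```python
def per_1000(number_needed: float) -> float:
    """Convert an NNT or NNH into expected events per 1,000 patients treated."""
    return 1000 / number_needed

NNH_DIABETES = 150  # assumption: midpoint of the quoted 100-200 range

for label, nnt in [("15% baseline risk", 27), ("5% baseline risk", 80)]:
    prevented = per_1000(nnt)        # cardiovascular events avoided
    harmed = per_1000(NNH_DIABETES)  # new diabetes diagnoses caused
    print(f"{label}: {prevented:.1f} events prevented vs "
          f"{harmed:.1f} new diabetes cases per 1,000 treated")
```

As baseline risk falls, the prevented-event count shrinks toward the harm count, which is where shared decision-making takes over.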

— Intensive glycemic control: tight A1c targets (<6.5%) statistically reduced microvascular surrogate outcomes
— ACCORD showed increased mortality with intensive control
— Clinical lesson: A1c target of ~7% for most adults; individualize 7.5–8% for elderly, frail, or limited life expectancy
— Niacin add-on therapy: statistically raised HDL and lowered triglycerides
— No reduction in MACE; increased adverse events
— Clinical lesson: surrogate (lipid panel) ≠ patient outcome; niacin abandoned for ASCVD prevention
— Lowered triglycerides "significantly"
— No overall cardiovascular benefit
— Aspirin for primary prevention: small statistical reductions in cardiovascular events
— Offset by bleeding; NNT ≈ NNH
— Clinical lesson: USPSTF now recommends against initiating ASA for primary prevention in adults ≥60; selective use age 40–59 with shared decision-making
— SGLT2 inhibitors and GLP-1 receptor agonists: statistically AND clinically significant, with ARR for MACE, HF hospitalization, and renal endpoints (NNT 30–60 over 3–5 years)
— Patient-oriented outcomes, not just A1c
— Now first-line add-ons in T2DM with ASCVD, HF, or CKD per ADA/KDIGO
Board pearl: When a trial's only positive finding is a surrogate (LDL, A1c, BP, bone density, ejection fraction) without hard-outcome confirmation, do not extrapolate to clinical benefit. History is littered with surrogate-positive drugs that harmed patients (flecainide for PVCs post-MI, hormone therapy for "cardioprotection," torcetrapib for HDL).

— PCI in stable CAD: earlier observational data suggested mortality benefit
— RCTs showed no mortality or MI reduction vs optimal medical therapy in stable CAD
— Symptom benefit modest and partly placebo-driven (ORBITA used sham control)
— Clinical lesson: reserve revascularization in stable CAD for refractory symptoms or high-risk anatomy
— Renal artery stenting: statistically improved patency
— No clinical benefit on BP control or renal function vs medical therapy
— PSA screening: small statistical mortality reduction in some analyses (more than 700 men screened to prevent one prostate-cancer death over 13 years)
— Substantial overdiagnosis and treatment harms
— USPSTF: shared decision-making age 55–69; against routine screening ≥70
— Mammography: a statistical mortality reduction exists, but the ARR is small and harms (false positives, overdiagnosis) are substantial
— USPSTF now recommends biennial screening starting age 40 — but emphasizes individualized discussion of magnitude
— Intensive BP control (SPRINT): statistically and clinically significant ARR for CV events with an SBP target <120 vs <140 in high-risk non-diabetic patients (NNT ~61 over 3 years)
— Real practice-changing data — meets both bars
— Caveat: SPRINT BP measurement was unattended automated; standard office BP often runs higher
CCS pearl: On CCS cases involving screening or elective procedures, do not order interventions just because guidelines list them as "options." Document shared decision-making in counseling notes for PSA, lung cancer screening, AAA screening, and primary-prevention aspirin. Ordering without discussion can be marked down for inappropriate utilization.

— Most RCTs systematically exclude patients >75–80, those with significant CKD, hepatic impairment, dementia, or polypharmacy
— Mean trial age often 10–20 years younger than typical clinic patient
— External validity (generalizability) is weak
— Elderly patients have higher risk of dying from non-target causes
— A therapy reducing CV mortality by 25% RRR may yield negligible ARR in a 90-year-old with limited life expectancy
— Statins: ~2.5 years to benefit for major cardiovascular events in primary prevention
— Intensive glycemic control: ~8–10 years to microvascular benefit
— Mammography: ~10 years to mortality benefit
— Colonoscopy: ~10 years to mortality benefit
— If life expectancy < time-to-benefit, the patient absorbs harms without the upside
— Bleeding risk on anticoagulation rises sharply with age and CKD
— Hypoglycemia, falls, orthostasis, AKI risk all amplify in elderly
— NNH often falls below NNT in this population, flipping the risk-benefit calculus
— Trial doses may not be safe in CKD stage 4–5 or Child-Pugh B/C
— Drug-drug interactions multiply with polypharmacy
Step 3 management: For any new "significant" therapy in an elderly or multimorbid patient, explicitly estimate: (1) life expectancy (use ePrognosis), (2) time-to-benefit from trial follow-up, (3) NNT vs NNH adjusted for their risk profile. If life expectancy < time-to-benefit, or NNH < NNT, decline or deprescribe. This is the bread and butter of Step 3 geriatrics questions.
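The two-part rule above (life expectancy versus time-to-benefit, then NNT versus NNH) can be written as a small decision helper. This is a teaching sketch only: the function name `offer_therapy`, the patient's 4-year life expectancy, and the NNT/NNH inputs are all hypothetical, and real decisions also weigh patient preferences.

```python
def offer_therapy(life_expectancy_yr: float, time_to_benefit_yr: float,
                  nnt: float, nnh: float) -> tuple[bool, str]:
    """Apply the rule: decline if life expectancy < time-to-benefit or NNH < NNT."""
    if life_expectancy_yr < time_to_benefit_yr:
        return False, "life expectancy is shorter than time-to-benefit"
    if nnh < nnt:
        return False, "harm is expected more often than benefit (NNH < NNT)"
    return True, "benefit can accrue in time and is expected more often than harm"

# Hypothetical frail patient considered for intensive glycemic control
# (time-to-benefit taken as 9 years, the midpoint of the ~8-10 years listed above)
decision, reason = offer_therapy(life_expectancy_yr=4, time_to_benefit_yr=9,
                                 nnt=50, nnh=30)
print(decision, "-", reason)
```

Checking life expectancy first reflects the ordering in the rule: if the benefit cannot arrive in time, the NNT is moot.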

— Pregnant patients are categorically excluded from most RCTs
— Statistical findings in non-pregnant adults rarely transfer directly
— Drug categories rely on pharmacokinetic, registry, and observational data (e.g., warfarin teratogenicity; ACE inhibitor fetopathy)
— Clinical lesson: absence of statistically significant harm data is not evidence of safety; default to known-safe alternatives (labetalol, nifedipine, methyldopa for HTN)
— Many pediatric prescribing decisions extrapolate from adult RCTs
— Body-surface-area dosing and developmental pharmacology limit applicability
— Off-label use is common but should be evidence-graded, especially for psychiatric medications where suicidality signals (SSRIs in adolescents) appeared post-marketing
— Subgroup findings (e.g., BiDil in self-identified Black patients with HF) were statistically positive but reflect social rather than biological constructs
— Pharmacogenomic differences (CYP2C19 in clopidogrel response, HLA-B*57:01 in abacavir) are real but require individual testing, not racial proxy
— Women historically underrepresented in cardiovascular trials
— Statistical findings in mixed-sex trials may overrepresent male physiology
— Trial efficacy ≠ real-world effectiveness when adherence, access, and cost intervene
— A drug with NNT 25 in a trial may have an effective NNT of about 50 in a population with 50% adherence, and higher still once access and cost barriers are added
Key distinction: Efficacy (statistical performance in idealized trial conditions) versus effectiveness (real-world performance in heterogeneous populations with imperfect adherence and access). Step 3 questions about implementation, adherence, and disparities are testing whether you can distinguish these — and recognize that statistical efficacy alone does not guarantee population-level benefit.
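The efficacy-to-effectiveness gap can be approximated with one line of arithmetic. The sketch assumes the simplest possible model, in which non-adherent patients get no benefit and everything else matches the trial, so the effective NNT is the trial NNT divided by the adherence fraction. The function name is hypothetical. Real-world NNT can be higher still if baseline risk, persistence, or access also differ from trial conditions.

```python
def effective_nnt(trial_nnt: float, adherence: float) -> float:
    """Scale a trial NNT by the fraction of patients who actually take the drug."""
    if not 0 < adherence <= 1:
        raise ValueError("adherence must be a fraction in (0, 1]")
    # Non-adherent patients contribute no benefit, so ARR shrinks proportionally
    return trial_nnt / adherence

for adherence in (1.0, 0.8, 0.5):
    print(f"adherence {adherence:.0%}: effective NNT = {effective_nnt(25, adherence):.0f}")
```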

— Adopting therapies based on small statistical effects exposes patients to side effects, cost, and pill burden without meaningful benefit
— Example: niacin add-on caused myopathy, hyperglycemia, GI effects with no MACE benefit
— Statistically "significant" screening yields (more cancers detected) can drive treatment of indolent disease (low-grade prostate cancer, DCIS, papillary thyroid microcarcinoma)
— Patient harms: surgery, radiation, anxiety, financial toxicity
— Chasing surrogate targets (LDL <55 in low-risk patients, A1c <6.5 in frail elderly) increases adverse events without outcome benefit
— Each "significant" drug added increases interactions, falls, cognitive effects, nonadherence
— Beers Criteria and STOPP/START guide deprescribing
— High-cost therapies with marginal clinical benefit drive out-of-pocket burden, medication nonadherence, and downstream worse outcomes
— Reversing recommendations ("we used to say everyone should take aspirin / hormones / niacin") erodes credibility
— Statistically negative trials underpublished; positive surrogate findings overemphasized
— Industry-funded trials more likely to report favorable conclusions
— Testing many hypotheses inflates false-positive findings (multiple comparisons problem)
— A trial with 20 secondary endpoints tested at α=0.05 will produce about one "significant" finding on average by chance alone (roughly a 64% chance of at least one, if the tests are independent)
Board pearl: Reversal of medical practice (Prasad's "medical reversals") happens when initial statistical findings — usually on surrogates or in observational data — are overturned by definitive RCTs on hard outcomes. Classic examples: HRT for cardioprotection, antiarrhythmics post-MI (CAST), tight glycemic control in ICU (NICE-SUGAR), aggressive transfusion thresholds (TRICC).
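The multiple-comparisons point above is easy to quantify. Assuming independent tests of truly null hypotheses (a simplification, since trial endpoints are usually correlated), the chance of at least one false positive is 1 − (1 − α)^k. The function names below are illustrative.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among independent null tests."""
    return 1 - (1 - alpha) ** n_tests

def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of null tests that cross the significance threshold."""
    return n_tests * alpha

for k in (1, 5, 20):
    print(f"{k:>2} tests: P(at least one false positive) = {familywise_error_rate(k):.1%}, "
          f"expected false positives = {expected_false_positives(k):.2f}")
```

For 20 endpoints this gives about a 64% chance of at least one spurious "significant" result and one expected false positive on average.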

— Novel mechanism without confirmatory trial
— Surrogate-only outcome
— Industry-sponsored with no independent replication
— Subgroup-driven positive finding
— Effect size borderline statistical (p 0.03–0.05) with wide CI
— Multiple smaller trials with mixed results
— Heterogeneity in study populations or interventions
— Look for Cochrane reviews, GRADE-rated guidelines, or USPSTF/specialty society statements
— Off-label prescribing for complex disease (oncology, rheumatology, transplant)
— Pharmacogenomic decisions
— Pregnancy-specific medication choices
— High-risk procedures with marginal trial benefit (e.g., revascularization in stable CAD)
— Hospital P&T committees vet formulary additions
— Pharmacy and clinical pharmacology consultation for narrow-therapeutic-index drugs
— IRB oversight if applying experimental therapies
— Report adverse events to FDA MedWatch
— Post-marketing data may flip risk-benefit (rofecoxib/Vioxx, rosiglitazone)
— High: multiple consistent RCTs, hard outcomes
— Moderate: single RCT or consistent observational
— Low: observational only, indirect comparisons
— Very low: case series, expert opinion
— Strong recommendations require at least moderate-quality evidence on patient-oriented outcomes
Step 3 management: When a stem presents a brand-new trial result, the correct answer is rarely "immediately change practice." More often: await guideline incorporation, confirmatory trials, or systematic review, especially if the outcome is a surrogate, the population narrow, or replication absent.

— Tiny effects become detectable with N in tens of thousands
— Common in registry analyses, EHR-based studies, large pharma trials
— Statistically robust changes in biomarkers that fail to translate
— LDL, A1c, BMD, BP, viral load, tumor size, ejection fraction — all can mislead in isolation
— MACE may be "positive" because of unplanned revascularization, not death/MI/stroke
— Always disaggregate components
— With 20 subgroups, expect ~1 to be "significant" by chance at α=0.05
— Not credible unless prespecified, biologically plausible, and confirmed
— Per-protocol analyses (only adherent patients) can inflate apparent efficacy
— ITT preserves randomization and reflects real-world adoption
— Performance and detection bias inflate effects on subjective outcomes
— Particularly problematic for pain, dyspnea, quality of life
— May capture early surrogate change but miss late harms
— Primary outcome quietly switched mid-trial; published outcome is the most favorable secondary
— Compare published paper to ClinicalTrials.gov registration to detect this
— Associated with more favorable conclusions even when data are similar
Key distinction: A study can be methodologically rigorous and statistically significant yet clinically irrelevant because of the endpoint chosen, not because of fraud. Conversely, a methodologically weak study can produce a clinically intriguing but unreliable signal. Sort the differential by endpoint type first, then methodology, then magnitude.

— At α=0.05, 1 in 20 truly null comparisons will be "significant" by chance
— Multiple testing without correction (Bonferroni, Holm, FDR) compounds this
— Underpowered trial finds "no significant difference"
— "Absence of evidence is not evidence of absence" — a wide CI consistent with meaningful effect should not be reported as "negative"
— Extreme baseline values trend toward the mean on retest, mimicking treatment effect in single-arm or pre-post studies
— Requires a control arm to disentangle
— Being observed alters behavior, inflating apparent intervention effect in quality improvement
— Subjective outcomes (pain, dyspnea, depression) respond substantially to placebo
— Sham-controlled trials (ORBITA, vertebroplasty) often show smaller true effects than open-label trials
— Observational associations (HRT and CV health, vitamin D and outcomes) often dissolve in RCTs
— Group-level correlations do not imply individual-level causation
— Sick patients use more of intervention X → X appears harmful, or vice versa
— Patients who take preventive medication adhere to other healthy behaviors, inflating apparent drug benefit in observational data
— Screening "improves survival" by detecting disease earlier (lead-time) or preferentially detecting slow-growing cases (length-time) without altering mortality
— Mortality, not survival, is the proper screening endpoint
Board pearl: When a stem describes an observational study showing strong association — even with tight CI and tiny p — the correct answer is rarely "change practice." Suspect confounding and demand an RCT before acting on observational signals.
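For the multiple-testing corrections named earlier in this section, a minimal sketch of Bonferroni and Holm shows how many nominally "significant" p-values survive adjustment. The p-values are made up for illustration and the function names are hypothetical. In practice a statistics package would be used.

```python
def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject only p-values at or below alpha divided by the number of tests."""
    return [p <= alpha / len(p_values) for p in p_values]

def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm step-down: test sorted p-values against alpha/m, alpha/(m-1), ... and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value also fails
    return reject

p_values = [0.04, 0.001, 0.2, 0.012, 0.03]  # hypothetical secondary endpoints
print("unadjusted:", [p < 0.05 for p in p_values])
print("bonferroni:", bonferroni_reject(p_values))
print("holm:      ", holm_reject(p_values))
```

Four of the five endpoints look significant unadjusted. Holm retains two and Bonferroni one, which is the kind of shrinkage to expect when a paper reports many uncorrected comparisons.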

— Statins post-ACS: NNT ~30 over 5 years for mortality — strong indication
— Beta-blocker post-MI with reduced EF: NNT ~25 for mortality — strong
— DAPT duration post-PCI: balance NNT for stent thrombosis vs NNH for bleeding
— SGLT2i in HFrEF (DAPA-HF, EMPEROR-Reduced): NNT ~20 for CV death/HF hospitalization — strong
— Reassess statins annually in patients >75 with life expectancy <5 years
— Tighten or loosen A1c targets with changing functional status
— Stop PPIs when GERD indication resolved
— Review benzodiazepines, anticholinergics, NSAIDs in elderly (Beers Criteria)
— "This medication reduces your chance of heart attack from 10% to 7.5% over 10 years" beats "25% relative risk reduction"
— Pictographs and natural frequencies improve numeracy
— For preference-sensitive decisions (PSA, lung CA screening, primary-prevention ASA, anticoagulation in borderline AF) document the conversation
— Just because a medication was started in hospital does not mean it should be continued forever
— Medication reconciliation at every transition of care
— Often clinically significant with NNT comparable to medications (exercise in HF, DASH diet in HTN, Mediterranean diet in ASCVD prevention)
— Frequently underemphasized despite strong evidence
Step 3 management: At each visit, ask three deprescribing questions: (1) Is the original indication still valid? (2) Is NNT still lower than NNH (does benefit still outweigh harm) given current life expectancy and risks? (3) Does the patient still value this trade-off? If "no" to any, reduce or stop.
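The absolute-risk counseling language described in this section can be generated mechanically from a baseline risk and an RRR. The sketch below uses the 10% to 7.5% example from the counseling bullet. The function name, the wording template, and the choice of a denominator of 1,000 (to avoid fractional people) are assumptions.

```python
def natural_frequency(baseline_risk: float, rrr: float, years: int, per: int = 1000) -> str:
    """Phrase a treatment effect as counts of people rather than a relative risk."""
    without = round(per * baseline_risk)              # events with no treatment
    with_tx = round(per * baseline_risk * (1 - rrr))  # events on treatment
    return (f"Out of {per} people like you, about {without} will have the event over "
            f"{years} years without the medication and about {with_tx} with it, "
            f"so about {without - with_tx} are spared.")

# 10% baseline risk and 25% RRR, i.e. 10% -> 7.5% over 10 years
print(natural_frequency(baseline_risk=0.10, rrr=0.25, years=10))
```

The same sentence makes the converse visible: most people treated would not have had the event anyway, and some still have it despite treatment.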

— Function, symptoms, quality of life, exacerbations, hospitalizations
— Do not chase surrogates beyond evidence-based targets (LDL, A1c, BP)
— Lipid recheck 4–12 weeks after statin initiation, then annually
— A1c every 3 months until at goal, then every 6 months
— BP per ACC/AHA guidance based on control
— INR for warfarin per protocol; renal/hepatic monitoring for relevant drugs
— Statin: myalgias, transaminases if symptomatic, glucose
— Metformin: B12 every 1–2 years on chronic therapy
— Amiodarone: TFTs, LFTs, CXR/PFTs per protocol
— Anticoagulants: bleeding screen, hemoglobin, renal function
— Guidelines update every few years; stay current via ACC, AHA, ADA, USPSTF, specialty societies
— Subscribe to systematic-review summaries (Cochrane, NEJM Journal Watch, AFP POEMs)
— Use natural frequencies: "Out of 100 people like you, 5 will have a heart attack in 10 years"
— Avoid bare relative risks
— Acknowledge uncertainty: "The benefit is real but modest"
— Home BP cuffs, glucose meters, weight logs in HF, peak flow in asthma — empower adherence and early detection
— Cardiac rehab post-MI: NNT ~10 for mortality over 5 years — strongly indicated, undersubscribed
— Pulmonary rehab in COPD: improves dyspnea, exercise tolerance — both statistically and clinically significant
CCS pearl: On CCS, schedule appropriate follow-up intervals matched to the medication and indication. Routine "follow up in 6 months" without specifying labs or symptom assessment may be flagged as low-quality care.

— Patients have the right to know absolute benefits and harms, not just relative
— Presenting only RRR ("25% reduction!") without ARR is ethically problematic — it inflates perceived benefit
— Step 3 vignette: a patient asks about a new drug after seeing an ad; counsel using absolute numbers and NNT/NNH
— Disclose industry funding when teaching or counseling
— Open Payments database lists physician-industry financial relationships
— Do not let pharma representatives drive prescribing
— Report serious adverse drug events to FDA MedWatch
— Report vaccine adverse events to VAERS
— Both are critical for detecting harms missed in pre-approval trials (where statistical power for rare events is inadequate)
— Medication errors at hospital discharge are common; reconcile every drug
— Avoid continuing inpatient-initiated PPIs, benzodiazepines, antipsychotics without indication
— Provide written discharge instructions with absolute risk language
— Equipoise required to randomize patients; once a therapy shows clear benefit on hard outcomes, continuing the placebo arm is unethical (data-safety monitoring boards stop trials early)
— Early stopping for benefit, however, inflates effect estimates and limits long-term safety data
— Underrepresented groups in trials → uncertain efficacy/safety in those populations
— Advocate for diverse enrollment; counsel patients honestly about evidence limitations applicable to them
— QI projects measuring "significant" changes still require ethical oversight when patient-level data and interventions are involved
Board pearl: Withholding the absolute numbers from a patient because "they wouldn't understand" violates the principle of informed consent. Use plain-language framing, but always disclose the actual magnitude of expected benefit and harm. Documenting this conversation protects both the patient and the clinician.

— SF-36: ~10 points
— Borg dyspnea: ~1 point
— VAS pain: ~10–13 mm on 100 mm scale
— 6-minute walk: ~30 m in COPD
— UPDRS motor: ~2.5–5 points
Step 3 management: Memorize three reflex questions for any "significant" study: (1) What's the ARR/NNT? (2) Is the outcome patient-oriented? (3) Does my patient match the trial population and baseline risk? If you cannot answer all three favorably, do not change management.

— Stem reports "25% relative risk reduction, p<0.001" for a new drug; baseline event rate is 2%
— Correct answer: ARR is 0.5%, NNT is 200 — counsel that absolute benefit is modest, especially given side effects and cost
— Stem describes a drug that "significantly lowers LDL/A1c/BMD/BP" with no hard-outcome data
— Correct answer: await confirmatory cardiovascular/fracture/mortality outcome trial; do not adopt
— Overall trial negative, but a subgroup (e.g., Black patients, women >65, diabetics) "significant"
— Correct answer: hypothesis-generating; not sufficient to change practice
— MACE significant, driven entirely by unplanned revascularization
— Correct answer: no clear effect on death/MI/stroke; weak evidence to adopt
— Small RCT, p=0.18, wide CI (HR 0.6–1.4)
— Correct answer: insufficient evidence to conclude no effect; need larger trial
— Trial in 50–65-year-olds; patient is 88
— Correct answer: consider life expectancy vs time-to-benefit; often decline new therapy
— Cohort study "associates" a vitamin/supplement with reduced mortality
— Correct answer: confounding likely; RCT needed before recommending
— New screening test shows improved 5-year survival
— Correct answer: lead-time/length-time bias; demand mortality data and RCT evidence
— Promotional material highlights "p<0.001"
— Correct answer: ask for ARR/NNT, outcome type, applicability; do not change practice based on marketing
— Patient saw a TV ad
— Correct answer: shared decision-making with absolute risk framing
Key distinction: On Step 3, the most common correct answer in EBM stems is some form of "this finding does not justify changing management" — because the question is testing whether you can resist the seduction of a small p-value. Pick the option that demands ARR, hard outcomes, or applicability.

A statistically significant result is necessary but not sufficient for clinical action — what matters is the absolute magnitude of effect on patient-oriented outcomes in a population resembling your patient, weighed against absolute harms.
— ARR and NNT — is the absolute effect meaningful (not just relative)?
— Outcome type — patient-oriented (mortality, symptoms, function) vs surrogate (lab, imaging)?
— Applicability — does your patient match the trial population and baseline risk?
— Large p-value with wide CI ≠ "no effect" (absence of evidence ≠ evidence of absence)
— Small p-value with tiny ARR ≠ "important effect" (statistical ≠ clinical)
— Surrogate improvement ≠ patient benefit (LDL, A1c, BMD, BP, EF can mislead)
— Observational association ≠ causation (confounding lurks even with tight CIs)
— Compute or estimate ARR, NNT, NNH; compare to MCID and time-to-benefit vs life expectancy; counsel using absolute risk language; document shared decision-making for preference-sensitive choices; deprescribe when risk-benefit flips.
— Treat patients, not p-values. The trial answers a population-level question; you must translate it for one human being in front of you, integrating their risk, values, prognosis, and circumstances. That translation — not the p-value — is the practice of medicine.

