Biostatistics & Population Health
Bayes' theorem and post-test probability calculation
— Pre-test odds = pre-test probability / (1 − pre-test probability)
— Convert odds back to probability: odds / (1 + odds)
— A test is positive but disease prevalence is low (e.g., screening asymptomatic patient) → high false-positive risk
— A test is negative but pre-test suspicion is high (e.g., classic angina with negative stress ECG) → disease still likely; pursue further testing
— Choosing which test to order next based on whether you need to rule in (high specificity, high LR+) vs rule out (high sensitivity, low LR−)
— Interpreting screening program results (mammography, PSA, HIV, D-dimer)
— LR+ = sensitivity / (1 − specificity)
— LR− = (1 − sensitivity) / specificity
— LR 2 → +15% absolute probability
— LR 5 → +30%
— LR 10 → +45%
— LR 0.5 → −15%; LR 0.1 → −45%
Board pearl: A "positive test" never equals "has the disease." Step 3 wants you to integrate prevalence and test performance before acting, especially before invasive confirmation, treatment initiation, or disclosing a presumptive diagnosis to the patient.
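The two likelihood-ratio formulas above can be sketched in a few lines. This is a minimal illustration, not a validated calculator; the function name `likelihood_ratios` and the 90%/80% test characteristics are hypothetical values chosen only to show the arithmetic.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test; both inputs are fractions between 0 and 1."""
    if not (0 <= sensitivity <= 1 and 0 < specificity <= 1):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1]")
    # LR+ = sensitivity / (1 - specificity); infinite when specificity is perfect.
    lr_pos = float("inf") if specificity == 1 else sensitivity / (1 - specificity)
    # LR- = (1 - sensitivity) / specificity.
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical test (assumed values): 90% sensitive, 80% specific.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

For this assumed test the LR+ works out to 4.5 and the LR− to 0.125, which would fall in the "moderate shift" range for both a positive and a negative result.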

— "Asymptomatic 25-year-old with a positive ELISA for HIV" → low prevalence → low PPV (a large share of positives are false positives) → confirm with an HIV-1/2 antibody differentiation immunoassay (this has replaced the Western blot)
— "55-year-old with typical chest pain and a negative exercise stress test" → high pre-test probability of CAD → negative test does not rule out; proceed to coronary CTA or cath
— "Low-risk patient with a positive D-dimer" → low specificity → does not establish PE; but in a low-Wells patient with negative D-dimer → effectively rules out PE
— "Routine screening test came back positive" → emphasize confirmatory testing, repeat testing, or counseling about predictive value
— Demographics (age, sex, race-stratified prevalence)
— Exposure history (TB contacts, travel, sexual history, IVDU)
— Symptom typicality (typical vs atypical angina; classic vs atypical migraine)
— Prior test results and trajectory (rising troponin, serial imaging)
— Wells score (PE, DVT)
— HEART score (chest pain in ED)
— Centor/McIsaac (strep pharyngitis)
— CURB-65 (pneumonia severity, not diagnostic but prognostic)
— FRAX (osteoporotic fracture risk)
Key distinction: Sensitivity and specificity are properties of the test and do not change with prevalence. PPV and NPV change with prevalence—this is the fulcrum of nearly every Bayesian Step 3 question. When prevalence drops, PPV drops; NPV rises.
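To make the prevalence dependence concrete, the sketch below holds sensitivity and specificity fixed and sweeps prevalence downward. The function name `predictive_values` and the 95%/95% test are assumptions for illustration only.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for given test characteristics and prevalence (all fractions)."""
    tp = sensitivity * prevalence              # diseased and test-positive
    fn = (1 - sensitivity) * prevalence        # diseased and test-negative
    tn = specificity * (1 - prevalence)        # non-diseased and test-negative
    fp = (1 - specificity) * (1 - prevalence)  # non-diseased and test-positive
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test (assumed values): 95% sensitive, 95% specific.
for prevalence in (0.50, 0.10, 0.01, 0.001):
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence {prevalence:7.1%}  PPV {ppv:6.1%}  NPV {npv:7.2%}")
```

Sensitivity and specificity never change inside the loop, yet PPV falls and NPV rises with each step down in prevalence.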

— Rows = test result (positive/negative)
— Columns = disease status (present/absent)
— Cells: True Positive (TP), False Positive (FP), False Negative (FN), True Negative (TN)
— Sensitivity = TP / (TP + FN) — among diseased, how many test positive
— Specificity = TN / (TN + FP) — among non-diseased, how many test negative
— PPV = TP / (TP + FP) — among test-positive, how many have disease
— NPV = TN / (TN + FN) — among test-negative, how many lack disease
— SnNout: A highly Sensitive test, when Negative, rules out disease.
— SpPin: A highly Specific test, when Positive, rules in disease.
— Shift the cutoff lower (more inclusive) → sensitivity ↑, specificity ↓, more FPs
— Shift the cutoff higher (more stringent) → specificity ↑, sensitivity ↓, more FNs
— ROC curve plots sensitivity vs (1 − specificity); AUC = discrimination quality (0.5 = chance, 1.0 = perfect)
Board pearl: If a question gives sensitivity, specificity, and prevalence and asks for PPV, the fastest path is to populate a 2×2 using a hypothetical population of 1,000 or 10,000, multiply through, then read PPV directly. This avoids algebraic missteps under time pressure.
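The hypothetical-population shortcut can be sketched as follows. The function name `two_by_two` and the default population of 10,000 are illustrative choices; the inputs (99% sensitivity, 99% specificity, 1% prevalence) are an assumed example.

```python
def two_by_two(sensitivity, specificity, prevalence, population=10_000):
    """Fill a 2x2 table for a hypothetical population; returns expected cell counts."""
    diseased = population * prevalence
    healthy = population - diseased
    tp = sensitivity * diseased   # diseased who test positive
    fn = diseased - tp            # diseased who test negative
    tn = specificity * healthy    # non-diseased who test negative
    fp = healthy - tn             # non-diseased who test positive
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


cells = two_by_two(sensitivity=0.99, specificity=0.99, prevalence=0.01)
ppv = cells["TP"] / (cells["TP"] + cells["FP"])
npv = cells["TN"] / (cells["TN"] + cells["FN"])
print({name: round(count) for name, count in cells.items()})
print(f"PPV = {ppv:.1%}, NPV = {npv:.2%}")
```

Reading PPV straight from the positive row (TP over all positives) avoids rearranging Bayes' formula by hand.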

— 1. Estimate pre-test probability (clinical judgment or validated score)
— 2. Convert to pre-test odds: P / (1 − P)
— 3. Identify test's LR+ (if positive result) or LR− (if negative result)
— 4. Post-test odds = pre-test odds × LR
— 5. Convert back to probability: odds / (1 + odds)
— 55-year-old man, atypical angina → pre-test probability of CAD ≈ 50% (odds = 1)
— Exercise stress test positive; LR+ ≈ 3
— Post-test odds = 1 × 3 = 3 → post-test probability = 3/4 = 75%
— Action: proceed to coronary CTA or invasive angiography
— Low Wells score → pre-test probability ≈ 5% (odds = 0.053)
— D-dimer negative; LR− ≈ 0.1
— Post-test odds = 0.053 × 0.1 = 0.0053 → post-test probability ≈ 0.5% → PE excluded
— Asymptomatic low-risk patient; prevalence ≈ 0.1% (odds ≈ 0.001)
— Fourth-generation immunoassay positive; LR+ ≈ 250 (very high specificity)
— Post-test odds ≈ 0.25 → post-test probability ≈ 20% → mandatory confirmatory differentiation assay
— Using sensitivity in place of LR
— Forgetting to convert between odds and probability
— Assuming a positive test in low-prevalence populations is diagnostic
Step 3 management: When the post-test probability lies in the "testing threshold zone" (neither rules in nor rules out), order an additional independent test. When above the treatment threshold, treat. When below the testing threshold, stop—further testing causes more harm than benefit.
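The five-step odds method above can be written as a short helper and checked against the three worked examples (CAD, PE, HIV). The function names are hypothetical; the pre-test probabilities and likelihood ratios are the approximate values quoted in those examples.

```python
def probability_to_odds(probability):
    return probability / (1 - probability)


def odds_to_probability(odds):
    return odds / (1 + odds)


def post_test_probability(pretest_probability, likelihood_ratio):
    """Steps 2 to 5: probability -> odds, multiply by LR, odds -> probability."""
    post_odds = probability_to_odds(pretest_probability) * likelihood_ratio
    return odds_to_probability(post_odds)


examples = [
    ("CAD, positive stress test (LR+ ~3)", 0.50, 3),
    ("PE, negative D-dimer (LR- ~0.1)", 0.05, 0.1),
    ("HIV, positive immunoassay (LR+ ~250)", 0.001, 250),
]
for label, pretest, lr in examples:
    print(f"{label}: {pretest:.1%} -> {post_test_probability(pretest, lr):.1%}")
```

The results should land at roughly 75%, 0.5%, and 20%, matching the hand calculations.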

— Valid only if tests are conditionally independent given disease status
— Example: HIV screening immunoassay (sensitive) → HIV-1/2 differentiation assay (specific) → HIV RNA (if discordant)
— ↑ Sensitivity, ↓ specificity → useful in emergencies (acute MI: ECG + troponin + bedside echo)
— These thresholds are disease- and treatment-specific (e.g., low threshold to treat suspected meningitis with empiric antibiotics; high threshold to start chemotherapy).
— LR+ >10 or LR− <0.1 → large, often conclusive shift
— LR+ 5–10 / LR− 0.1–0.2 → moderate shift
— LR+ 2–5 / LR− 0.2–0.5 → small but sometimes important
— LR+ 1–2 / LR− 0.5–1 → minimal; test rarely useful
Key distinction: Two tests measuring the same physiologic feature (e.g., two troponin assays) violate conditional independence; combining them does not multiply diagnostic gain. Combine tests that exploit different mechanisms (clinical score + biomarker + imaging) for maximum Bayesian leverage.
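A rough sketch of how combining two tests changes net sensitivity and specificity, assuming the tests are conditionally independent given disease status. The labels "serial" (call positive only if both are positive) and "parallel" (call positive if either is positive) and the test characteristics below are assumptions for illustration; correlated tests will gain less than these formulas suggest.

```python
def parallel(sens_a, spec_a, sens_b, spec_b):
    """Positive if EITHER test is positive: sensitivity rises, specificity falls."""
    net_sensitivity = 1 - (1 - sens_a) * (1 - sens_b)
    net_specificity = spec_a * spec_b
    return net_sensitivity, net_specificity


def serial(sens_a, spec_a, sens_b, spec_b):
    """Positive only if BOTH tests are positive: specificity rises, sensitivity falls."""
    net_sensitivity = sens_a * sens_b
    net_specificity = 1 - (1 - spec_a) * (1 - spec_b)
    return net_sensitivity, net_specificity


# Hypothetical tests (assumed values): A is 80%/90%, B is 85%/95% (sens/spec).
for name, combine in (("parallel", parallel), ("serial", serial)):
    sens, spec = combine(0.80, 0.90, 0.85, 0.95)
    print(f"{name:8s} sensitivity {sens:.1%}  specificity {spec:.1%}")
```

Parallel ordering suits time-critical rule-out decisions, while serial ordering suits screen-then-confirm algorithms.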

— Screening (low prevalence, asymptomatic) → prioritize high sensitivity and high NPV; accept some false positives that will be confirmed later
— Confirmation (after a positive screen) → prioritize high specificity and high PPV
— Ruling out dangerous diagnoses (PE, MI, meningitis) → high sensitivity test, often combined with low pre-test probability
— Ruling in before invasive therapy → high specificity test or gold standard
— HIV: 4th-gen Ag/Ab immunoassay (screen) → HIV-1/2 antibody differentiation (confirm) → HIV RNA (resolve discordance/acute infection)
— Hepatitis C: anti-HCV antibody (screen) → HCV RNA (confirm active infection)
— Strep pharyngitis: rapid antigen (specific) → if negative in child, throat culture (sensitive backup)
— Lyme: ELISA (sensitive screen) → Western blot or modified two-tier (specific confirmation)
— Syphilis: traditional (RPR → FTA-ABS) vs reverse (treponemal → RPR) algorithms
— In a high-prevalence ED population with chest pain, a negative high-sensitivity troponin powerfully rules out MI (high NPV preserved because the test's sensitivity is near-perfect)
— In a low-prevalence screening population, a positive test is often a false positive
Step 3 management: In a low-pretest-probability patient with a positive screening test, the correct next step is almost always confirmatory testing or repeat testing, not treatment, biopsy, or definitive disclosure of diagnosis. This pattern repeats across HIV, hepatitis, syphilis, autoantibody panels, and tumor markers.

— Whether to initiate therapy depends on post-test probability of disease × magnitude of treatment benefit − magnitude of harm
— This is operationalized via NNT (number needed to treat) and NNH (number needed to harm)
— Anticoagulation for AF: CHA₂DS₂-VASc establishes annual stroke risk (pre-test probability of event); the ARR from a DOAC determines whether to treat. Score ≥2 (men) / ≥3 (women) → treat.
— Statin initiation: ASCVD 10-year risk ≥7.5% with shared decision-making, ≥20% with strong recommendation
— Empiric antibiotics: low treatment threshold for suspected meningitis, sepsis, necrotizing fasciitis; start treatment before confirmatory results because the cost of a missed diagnosis dwarfs the cost of overtreatment
— Lower NNT = more efficient therapy
— Always paired with NNH for harm assessment
— RR or HR can look impressive in low-prevalence settings while ARR remains tiny
— Always anchor decisions to absolute numbers, especially for screening interventions
Board pearl: A "50% relative risk reduction" applied to a 0.2% baseline risk yields only a 0.1% absolute reduction—NNT = 1,000. Step 3 stems test whether you can recognize when a statistically significant result is clinically trivial, which is a downstream consequence of Bayesian post-test probability being too low to justify intervention.
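The relative-versus-absolute arithmetic in this pearl can be sketched as below. The function names are hypothetical; the inputs are the 0.2% baseline risk and 50% relative risk reduction from the pearl.

```python
def absolute_risk_reduction(baseline_risk, relative_risk_reduction):
    """ARR = baseline risk x RRR (both given as fractions)."""
    return baseline_risk * relative_risk_reduction


def number_needed_to_treat(arr):
    """NNT = 1 / ARR; undefined when the therapy gives no absolute benefit."""
    if arr <= 0:
        raise ValueError("ARR must be positive to compute an NNT")
    return 1 / arr


arr = absolute_risk_reduction(baseline_risk=0.002, relative_risk_reduction=0.50)
nnt = number_needed_to_treat(arr)
print(f"ARR = {arr:.2%}, NNT = {round(nnt)}")
```

The same 50% relative reduction applied to a 20% baseline risk would give an ARR of 10% and an NNT of 10, which is why the baseline risk matters more than the headline relative figure.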

— Hypothetical 10,000 patients; prevalence 1% → 100 diseased, 9,900 non-diseased
— Sens 99%, Spec 99% → TP = 99, FN = 1, FP = 99, TN = 9,801
— PPV = 99 / (99 + 99) = 50%
— Teaching point: even with a 99%-sensitive, 99%-specific test, low prevalence drives PPV down to 50%
— Pre-test 5% (Wells low PE); high-sens D-dimer (sens 98%, spec 50%)
— Per 1,000 patients: 50 PE, 950 no PE; TP = 49, FN = 1, FP = 475, TN = 475
— NPV = 475 / (475 + 1) = 99.8% → negative D-dimer effectively rules out PE
— But PPV = 49 / 524 = 9.4% → positive D-dimer requires imaging
— HIV: prevalence 0.1% in low-risk patient; immunoassay LR+ ≈ 250 → post-test ≈ 20% → confirmatory differentiation LR+ ≈ 4,000 if positive → post-test >99%
— Pre-test probability 30%, LR+ = 10 → estimated post-test ≈ 75% using the +45% rule for LR 10 (exact odds calculation ≈ 81%)
— Test with LR+ = 1.2 cannot meaningfully change management; do not order
— When a stem reports a "positive" tumor marker (CA-125, PSA, CEA) without symptoms, the next order is repeat measurement or imaging, not biopsy or surgery
— A "positive" ANA at 1:40 in an asymptomatic patient → low PPV → reassurance, not rheumatology workup
CCS pearl: Before ordering a confirmatory invasive study (biopsy, angiography, LP), document the pre-test probability rationale in your note. Order sequence on CCS: clinical assessment → risk score → noninvasive test → invasive confirmation only when post-test probability sits between testing and treatment thresholds.
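As a cross-check on the worked calculations, the sketch below compares the bedside rule-of-thumb shift with the exact odds calculation and chains two likelihood ratios for a sequential algorithm. The shift table uses the approximate values listed earlier in these notes; the function names are hypothetical, and chaining is only valid if the tests are conditionally independent.

```python
# Approximate absolute probability shifts for selected likelihood ratios.
RULE_OF_THUMB_SHIFT = {2: 0.15, 5: 0.30, 10: 0.45, 0.5: -0.15, 0.1: -0.45}


def exact_post_test(pretest, *likelihood_ratios):
    """Multiply pre-test odds by each LR in turn (assumes independent tests)."""
    odds = pretest / (1 - pretest)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


def rule_of_thumb_post_test(pretest, lr):
    """Add the approximate shift, clamped so the estimate stays within 0 to 1."""
    if lr not in RULE_OF_THUMB_SHIFT:
        raise ValueError(f"no rule-of-thumb shift stored for LR = {lr}")
    return min(max(pretest + RULE_OF_THUMB_SHIFT[lr], 0.0), 1.0)


print(f"rule of thumb: {rule_of_thumb_post_test(0.30, 10):.0%}")
print(f"exact:         {exact_post_test(0.30, 10):.0%}")
# Sequential HIV algorithm: screening immunoassay, then differentiation assay.
print(f"HIV after both positives: {exact_post_test(0.001, 250, 4000):.2%}")
```

The shortcut is most accurate for mid-range pre-test probabilities and drifts at the extremes, so very low or very high pre-test values are better handled with the exact calculation.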

— CAD prevalence rises with age → same stress test result has higher PPV in a 70-year-old than a 30-year-old
— Cancer prevalence rises with age → positive screening tests are more often true positives in older adults
— But competing comorbidities lower treatment benefit (e.g., screening colonoscopy after age 75 individualized; stop at 85)
— D-dimer: age-adjusted cutoff (age × 10 µg/L for patients >50) preserves specificity in elderly while maintaining sensitivity for PE
— Troponin: chronic kidney disease causes baseline elevation; specificity for acute MI ↓ → rely on delta troponin (kinetic change) rather than single value
— BNP/NT-proBNP: rises with age and CKD; use higher cutoffs in renal impairment
— Creatinine-based eGFR: less reliable in low-muscle-mass elderly; consider cystatin C
— Coagulation tests (INR): in liver disease an elevated INR may reflect hepatic synthetic dysfunction rather than warfarin effect, so it loses discriminatory value for anticoagulation monitoring
— Tumor markers (AFP) elevated in cirrhosis without HCC → use trajectory + imaging
— False-positive urine drug screens common (decongestants → amphetamines; quinolones → opiates) → confirm with GC-MS in low-prevalence settings
Step 3 management: In an elderly patient with a borderline-positive biomarker (troponin, D-dimer, BNP), the highest-yield next step is serial measurement to assess kinetics rather than reacting to an isolated value. Bayesian inputs include both prevalence (higher) and altered test specificity (lower), and only trend resolves the ambiguity.
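The age-adjusted D-dimer rule mentioned above (age × 10 µg/L for patients over 50) can be sketched as a small helper. The fixed 500 µg/L cutoff for younger patients is an assumption not stated in these notes, and real assays differ in units and reporting, so treat this as an illustration of the logic only.

```python
# Assumption: a conventional fixed cutoff of 500 µg/L applies at age 50 or younger.
CONVENTIONAL_CUTOFF_UG_L = 500


def d_dimer_cutoff(age_years):
    """Return the D-dimer cutoff in µg/L: age x 10 above age 50, else the fixed cutoff."""
    if age_years > 50:
        return age_years * 10
    return CONVENTIONAL_CUTOFF_UG_L


def d_dimer_is_negative(value_ug_l, age_years):
    """A result below the applicable cutoff is treated as negative."""
    return value_ug_l < d_dimer_cutoff(age_years)


# Hypothetical results: the same value of 700 µg/L at two different ages.
for age in (40, 78):
    status = "negative" if d_dimer_is_negative(700, age) else "positive"
    print(f"age {age}: cutoff {d_dimer_cutoff(age)} µg/L, 700 µg/L is {status}")
```

The same measured value is read as positive in the younger patient and negative in the older one, which is how the adjusted cutoff recovers specificity in the elderly.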

— D-dimer rises physiologically → specificity for PE ↓; use CT pulmonary angiography or V/Q based on chest X-ray
— TSH reference ranges are trimester-specific
— Glucose tolerance: 1-hour 50g screen (sensitive) → 3-hour 100g confirmatory (specific) for GDM
— Quad screen / cell-free DNA: cfDNA has high sensitivity and specificity for trisomy 21, but in low-risk populations PPV remains ~40–90% depending on age; always confirm with diagnostic CVS/amnio before irreversible decisions
— Designed to be highly sensitive at the cost of specificity → most positive screens are false positives
— Confirmatory testing mandatory before disclosing diagnosis or initiating treatment (e.g., PKU, congenital hypothyroidism, CF)
— Strep pharyngitis: high prevalence ages 5–15 → Centor criteria + rapid antigen; negative rapid test in child → throat culture backup (rapid test sensitivity is imperfect, and prevalence and rheumatic fever risk are higher than in adults)
— UTI in febrile infant: pre-test probability 5–10% → urinalysis screens, urine culture confirms
— Lead-time bias: earlier detection lengthens apparent survival without extending life
— Length-time bias: screening preferentially detects slow-growing tumors
— Overdiagnosis: detection of disease that would never have become clinically relevant (low-grade prostate cancer, DCIS)
Board pearl: A pregnant patient with suspected PE requires a Bayesian shift: start with bilateral lower extremity ultrasound (no radiation; if positive, treat without further imaging), then chest X-ray to guide between V/Q and CTPA. Sensitivity, specificity, and harm profile all shift with pregnancy.

— Base-rate neglect: ignoring prevalence and treating sens/spec as if they were PPV/NPV. Classic: "99% accurate" test in 1% prevalence → 50% PPV, not 99%.
— Anchoring: failing to update probability when new information arrives (e.g., persisting with initial diagnosis despite contradictory test)
— Confirmation bias: ordering tests that confirm rather than refute working diagnosis
— Availability heuristic: overestimating prevalence of recently seen diagnoses
— Premature closure: stopping the workup when post-test probability is still in the indeterminate zone
— False-positive cancer screening → unnecessary biopsies, anxiety, complications
— Unnecessary anticoagulation in low-probability PE → bleeding
— Antibiotic overuse for viral pharyngitis with positive low-PPV testing → C. difficile, resistance
— Overdiagnosis of thyroid cancer from incidental imaging → unnecessary thyroidectomy
Key distinction: The error of treating a positive screen as a diagnosis is base-rate neglect; the error of clinging to a working diagnosis despite a negative confirmatory test is anchoring. Both appear as wrong-answer choices in Step 3 stems—identify which cognitive trap the distractor represents.

— Discordant test results (e.g., positive HIV immunoassay + negative differentiation assay) → resolve with HIV RNA; consult ID
— Indeterminate biopsy or cytology (Bethesda III thyroid, atypical breast lesion) → multidisciplinary tumor board, molecular testing
— Persistent indeterminate post-test probability despite sequential testing → expert consultation, advanced imaging, tissue diagnosis
— Suspected sepsis with borderline lactate / qSOFA → treat empirically while awaiting cultures; do not wait for confirmation
— Suspected meningitis → empiric antibiotics + dexamethasone before LP if delayed; CT only if focal deficits/altered mental status/papilledema
— Suspected ACS with negative initial troponin but ongoing typical pain → admit for serial troponins + telemetry; do not discharge
— Genetic counseling for indeterminate hereditary cancer panels
— Infectious disease for discordant HIV/HCV/syphilis algorithms
— Pathology second opinion for ambiguous histology
— Radiology for incidentalomas requiring management algorithm (Fleischner, ACR LI-RADS)
— Document pre-test reasoning, test rationale, and post-test interpretation in the chart
— Communicate uncertainty to patients explicitly; avoid premature labeling
CCS pearl: When a CCS case presents an indeterminate or borderline result, the high-value orders are usually (1) repeat the test or order a complementary independent test, (2) document pre-test probability and rationale, and (3) consult the relevant specialist rather than committing to a definitive intervention based on a single ambiguous data point.

— Sensitivity: among diseased, fraction testing positive (test property)
— PPV: among test-positive, fraction with disease (depends on prevalence)
— LR: ratio of probabilities of a test result given disease status; used for diagnosis
— OR: ratio of odds of exposure between cases and controls; used for case-control studies
— RR: ratio of risks in cohort/RCT
— OR ≈ RR when outcome is rare (<10%); diverges when outcome is common
— HR: instantaneous relative rate of an event over time; used in survival analysis (Cox regression)
— Prevalence: population-level disease frequency
— Pre-test probability: individualized estimate incorporating prevalence + patient features
Board pearl: When a Step 3 question describes an "OR of 3.0" in a case-control study of a common outcome, the OR overestimates the RR; when the outcome is rare, OR ≈ RR. Bayesian post-test probability calculations require likelihood ratios, not odds ratios—these are not interchangeable despite the shared word "odds."
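The divergence between OR and RR can be seen with two hypothetical cohorts that share the same risk ratio. The cell counts below are invented for illustration, and the function names are assumptions.

```python
def risk_ratio(a, b, c, d):
    """a/b = exposed with/without outcome; c/d = unexposed with/without outcome."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio from the same 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical cohorts with the same RR of 3 but different outcome frequency.
tables = {
    "rare outcome (0.3% vs 0.1%)": (3, 997, 1, 999),
    "common outcome (60% vs 20%)": (60, 40, 20, 80),
}
for label, cells in tables.items():
    print(f"{label}: RR = {risk_ratio(*cells):.2f}, OR = {odds_ratio(*cells):.2f}")
```

With the rare outcome the OR stays close to 3, while with the common outcome it doubles to 6 even though the RR is unchanged.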

— Referral bias: patients sent to specialists have higher prevalence than primary care
— Workup bias: only test-positive patients get the gold standard
— Adrenal incidentaloma <4 cm, non-functional → observe; size + functional workup drives action, not the mere finding
— Pulmonary nodule: Fleischner criteria integrate pre-test risk (smoking, age, size) before deciding follow-up vs biopsy
Key distinction: Regression to the mean explains why an abnormal lab often normalizes on repeat without intervention; Bayesian updating explains why an isolated positive result in a low-prevalence setting is probably a false positive. Both lead to the same Step 3 answer—repeat the test—but for distinct conceptual reasons. Knowing which mechanism applies clarifies counseling.

— Colonoscopy: every 10 years starting age 45 (USPSTF) until 75; individualize 76–85; stop after 85
— Mammography: biennial 40–74 (USPSTF 2024; the earlier recommendation was biennial 50–74 with shared decision-making for 40–49); stop based on life expectancy <10 years
— Cervical cancer: Pap every 3 yr (21–29); Pap + HPV co-test every 5 yr or HPV alone every 5 yr (30–65); stop at 65 if adequate prior screening
— Lung cancer LDCT: annual ages 50–80 with ≥20 pack-year history, currently smoking or quit <15 years
— AAA: one-time US ages 65–75 in men who ever smoked
— Disease incidence (rising pre-test probability over time)
— Test sensitivity and lead-time
— Harms of false positives, overdiagnosis, and procedures
— A1c monitoring frequency: every 3 months if not at goal, every 6 months if stable
— INR monitoring in warfarin: weekly initially, monthly when stable
— LFTs after statin initiation: not routine unless symptomatic
— Reconcile incidental imaging findings with explicit follow-up plan to prevent loss to follow-up (lung nodules, adrenal incidentalomas)
— Document deferred testing and why
Step 3 management: Screening cessation is a Bayesian decision: when remaining life expectancy is shorter than the time required to derive benefit, continued screening generates more false-positive harm than true-positive benefit. Document the shared decision in the chart.

— Translate post-test probability into natural frequencies: "Of 100 people like you with a positive result, about 5 actually have the disease"
— Avoid stating "you have the disease" until algorithm completes (HIV, cancer, autoimmune)
— Thyroid nodule <1 cm low-risk features: surveillance US 12–24 months
— Pulmonary nodule per Fleischner (solid, solitary): <6 mm low-risk → no routine follow-up (optional 12-month CT if high-risk); 6–8 mm → CT at 6–12 months
— PSA mildly elevated: repeat in 6–12 weeks; rule out prostatitis, recent ejaculation, instrumentation
— Borderline A1c 5.7–6.4%: lifestyle intervention + annual reassessment
— Pre-test probability discussion: "Based on your risk factors, before testing your chance of having X is roughly Y%"
— Discuss what a positive vs negative result will and will not change
— Address overdiagnosis explicitly for prostate, thyroid, and breast screening
— Tobacco cessation: lowers CAD, COPD, multiple cancers
— Weight loss: lowers diabetes, OSA, cancer risk
— Vaccination: shifts infectious disease prevalence
Board pearl: Communicating probability in natural frequencies ("3 out of 100") rather than percentages or probabilities improves patient comprehension and shared decision-making quality. Step 3 expects you to recognize this as best-practice counseling, particularly around genetic testing, cancer screening, and prenatal screening.
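One way to turn a post-test probability into a natural-frequency statement is sketched below. The function name, the candidate denominators, and the wording of the sentence are all hypothetical; the idea is simply to pick the smallest round denominator that yields at least one affected person.

```python
def natural_frequency(probability, denominators=(100, 1_000, 10_000)):
    """Return (count, denominator) using the smallest denominator with count >= 1."""
    if not 0 <= probability <= 1:
        raise ValueError("probability must be between 0 and 1")
    count, denominator = 0, denominators[-1]
    for denominator in denominators:
        count = round(probability * denominator)
        if count >= 1:
            break
    return count, denominator


def counseling_sentence(probability):
    count, denominator = natural_frequency(probability)
    return (f"Of {denominator:,} people like you with this result, "
            f"about {count:,} actually have the condition.")


print(counseling_sentence(0.05))   # a 5% post-test probability
print(counseling_sentence(0.001))  # a 0.1% post-test probability
```

Stepping up to a larger denominator for rare events avoids the misleading "0 out of 100" statement.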

— Patients should understand pre-test probability, test characteristics, and downstream implications before consent—especially genetic, HIV, and prenatal screening
— In many states, HIV testing requires verbal informed consent and pre-test counseling; opt-out screening is standard but consent still required
— Premature disclosure based on a screening test alone (HIV ELISA, newborn screen) without confirmatory testing is a patient safety event; institutional protocols mandate algorithm completion before formal diagnosis
— Pre-test counseling on PPV, implications for family members, and GINA protections required
— Variant of uncertain significance (VUS) disclosure must explain low PPV for clinical action
— Positive HIV, syphilis, tuberculosis, gonorrhea, and certain other infections must be reported to public health authorities regardless of patient preference; partner notification protocols apply
— Pending test results at discharge are a leading patient safety hazard; explicit handoff to PCP with documented follow-up plan is required
— Incidentaloma management plan (e.g., pulmonary nodule on trauma CT) must be communicated in discharge summary; failure to do so has been the basis of malpractice claims
— Acting on a single low-PPV positive without confirmation can constitute negligence
— Equally, dismissing a high-pretest-probability patient with a single negative test (e.g., classic angina + negative stress test) can constitute negligence
Step 3 management: A positive newborn screening result requires confirmatory testing before disclosure of diagnosis, urgent coordination with metabolic specialty, and parental counseling that emphasizes the high false-positive rate intrinsic to high-sensitivity screening. Premature parental disclosure of definitive disease is both a safety and ethical violation.

Board pearl: The single highest-yield Step 3 takeaway: a positive test in a low-prevalence population can be more likely a false positive than a true positive, even when sensitivity and specificity are high. This drives the "next step" answer for nearly every screening-positive stem on the exam.

— Given sens, spec, prevalence → build 2×2 with hypothetical 1,000 or 10,000 → compute
— Trap distractor: the sensitivity value itself
— HIV ELISA positive in low-risk patient → next step = confirmatory differentiation assay, not disclose diagnosis, not initiate ART
— Positive newborn screen → confirmatory testing, specialty referral
— Typical angina + negative exercise ECG → coronary CTA or cath
— Classic PE story + low-sensitivity D-dimer assay → CTPA regardless
— Routine ANA in patient with vague fatigue → low pre-test probability → defer testing
— Routine PSA in 80-year-old with limited life expectancy → defer
— Pre-test probability + LR given → apply rule-of-thumb or formal calculation
— Anchoring, base-rate neglect, premature closure, confirmation bias
— Life expectancy, prior negative screens, USPSTF age cutoffs
— Compute from ARR; compare to clinically meaningful threshold
— HIV immunoassay positive, differentiation negative → HIV RNA
— Single mildly elevated BP, A1c, or LDL → repeat before action
Step 3 management: When in doubt on a Bayesian stem, the correct answer most often involves (1) confirmatory or repeat testing, (2) estimating pre-test probability before ordering a test, or (3) recognizing base-rate neglect in a distractor. Most answer sets on these stems map to those three themes.

Bayes' theorem in clinical medicine means that every test result must be interpreted in light of pre-test probability, because post-test probability, and therefore the right next action, depends on prevalence and likelihood ratios, not on sensitivity and specificity alone.
Board pearl: Step 3 rewards the clinician who explicitly estimates pre-test probability before ordering a test, interprets results probabilistically rather than dichotomously, and recognizes that "repeat or confirm the test" is the safest answer whenever a result and the pre-test probability disagree.

