Biostatistics & Population Health
Sensitivity vs specificity: tradeoffs at different cutoff points
— Lower the cutoff → more people called "positive" → Sn ↑, Sp ↓ (catches more disease, more false positives)
— Raise the cutoff → fewer called "positive" → Sn ↓, Sp ↑ (misses disease, but positives are real)
— Screening a low-prevalence population → want high Sn (rule-out, minimize missed disease) → set a low cutoff
— Confirming disease before a morbid intervention → want high Sp (rule-in, minimize false positives) → set a high cutoff
— Sequential testing: sensitive test first to screen, specific test second to confirm (HIV ELISA → HIV-1/2 differentiation immunoassay; ANA → anti-dsDNA/Smith)
— SnNout: a Sensitive test, when Negative, rules OUT
— SpPin: a Specific test, when Positive, rules IN

— "Investigators lower the threshold for a positive troponin from 0.04 to 0.02 ng/mL…" → expect more positives, ↑Sn, ↓Sp
— "An age-adjusted D-dimer (age × 10 in patients >50) is used instead of fixed 500…" → effectively raises cutoff in elderly → ↑Sp, modest ↓Sn, fewer unnecessary CTPAs
— "PSA cutoff changed from 4.0 to 2.5 ng/mL for biopsy referral…" → ↑Sn, ↓Sp, more biopsies, more overdiagnosis
— "HbA1c diagnostic cutoff lowered from 6.5% to 6.0%…" → captures more prediabetes-as-diabetes, ↑Sn, ↓Sp
— "Screening asymptomatic patients" → low pretest probability → emphasize Sn and NPV
— "Confirming disease before chemotherapy/surgery/anticoagulation" → emphasize Sp and PPV
— "ED rule-out of PE/ACS" → sensitive cutoff (HEART pathway, hs-troponin 99th percentile)
— "Patient wishes to avoid unnecessary biopsy" → favor higher Sp cutoff
— "Patient is anxious about missing cancer" → favor higher Sn cutoff
— Shared decision-making (PSA, low-dose CT lung screening) is itself a cutoff-tradeoff conversation

```
          Disease+   Disease−
Test+     TP         FP
Test−     FN         TN
```
— Sensitivity = TP / (TP + FN) → among diseased, fraction correctly flagged
— Specificity = TN / (TN + FP) → among healthy, fraction correctly cleared
— PPV = TP / (TP + FP) → among test-positives, fraction truly diseased
— NPV = TN / (TN + FN) → among test-negatives, fraction truly healthy
— LR+ = Sn / (1 − Sp); LR− = (1 − Sn) / Sp
— Accuracy = (TP + TN) / total
— Sn and Sp are read down the disease columns → they don't change with prevalence, only with the test/cutoff
— PPV and NPV are read across the test rows → they DO change with prevalence
— X-axis = 1 − Sp (false positive rate); Y-axis = Sn (true positive rate)
— Each point on the curve = one cutoff
— Upper-left corner = perfect test; diagonal = useless (coin flip)
— AUC (area under curve): 0.5 useless, 0.7–0.8 fair, 0.8–0.9 good, >0.9 excellent
— Moving along a single ROC curve = moving the cutoff; comparing two curves = comparing two tests
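To make the cutoff-to-curve relationship concrete, here is a minimal Python sketch that traces an ROC curve by sweeping the cutoff and then estimates the AUC with the trapezoidal rule. The scores are invented for illustration, `roc_points` and `auc_trapezoid` are hypothetical helper names rather than any library's API, and the sketch assumes higher values mean "more likely diseased".

```python
def roc_points(diseased, healthy):
    """Return (FPR, TPR) pairs, one per cutoff, calling 'positive' when score >= cutoff."""
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every score: nobody is called positive
    for c in cutoffs:
        tpr = sum(s >= c for s in diseased) / len(diseased)  # sensitivity
        fpr = sum(s >= c for s in healthy) / len(healthy)    # 1 - specificity
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical biomarker values (illustrative only, not real patient data)
diseased = [3, 5, 6, 7, 8, 9]
healthy = [1, 2, 2, 3, 4, 6]

pts = roc_points(diseased, healthy)
auc = auc_trapezoid(pts)
print(f"AUC = {auc:.2f}")
```

Each tuple in `pts` corresponds to one cutoff, so lowering the cutoff walks up and to the right along the same curve; only a different test (different score lists) would produce a different curve.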

— TP = 95, FN = 5 → Sn = 95%
— TN = 540, FP = 360 → Sp = 60%
— PPV = 95/455 = 21%; NPV = 540/545 = 99%
— Clinical use: rule out PE in low-pretest-probability patients (Wells ≤4) — a negative result reliably excludes PE
— TP = 80, FN = 20 → Sn = 80% (↓)
— TN = 810, FP = 90 → Sp = 90% (↑)
— PPV = 80/170 = 47% (↑); NPV = 810/830 = 98% (↓ slightly)
— Clinical use: better at confirming disease, but misses 20 PEs — unacceptable for ED rule-out
— Preserves Sn near 95% in younger patients, ↑Sp in elderly (where baseline D-dimer rises with age) → fewer unnecessary CTPAs, validated in ADJUST-PE
— Step 1: clinical risk assessment (Wells to stratify pretest probability; PERC, which has high Sn, to rule out in low-risk patients) → rules out
— Step 2: sensitive lab (D-dimer at low cutoff) → rules out
— Step 3: specific imaging (CTPA) → rules in
— Each step trades off harm of missed disease vs harm of overtesting (contrast nephropathy, radiation, incidentalomas)
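The two sets of counts in the worked example above can be checked with a short Python sketch. The function name `two_by_two` and the scenario labels are hypothetical; the counts themselves are the ones given in the notes (1000 patients, 100 with disease).

```python
def two_by_two(tp, fp, fn, tn):
    """Standard 2x2 metrics. Assumes no denominator is zero (e.g. Sp < 1 for LR+)."""
    sn = tp / (tp + fn)          # read down the Disease+ column
    sp = tn / (tn + fp)          # read down the Disease- column
    return {
        "Sn": sn,
        "Sp": sp,
        "PPV": tp / (tp + fp),   # read across the Test+ row
        "NPV": tn / (tn + fn),   # read across the Test- row
        "LR+": sn / (1 - sp),
        "LR-": (1 - sn) / sp,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts taken from the worked example above
sensitive_cutoff = two_by_two(tp=95, fp=360, fn=5, tn=540)
specific_cutoff = two_by_two(tp=80, fp=90, fn=20, tn=810)

for label, metrics in [("sensitive cutoff", sensitive_cutoff),
                       ("specific cutoff", specific_cutoff)]:
    print(label, {name: round(value, 2) for name, value in metrics.items()})
```

Running both scenarios side by side shows the tradeoff directly: the first keeps NPV near 99% at the price of a PPV near 21%, while the second roughly doubles PPV but misses four times as many cases.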

— Curve hugging upper-left = excellent discrimination
— Curve on 45° diagonal = no better than chance (AUC = 0.5)
— A point on the curve is a specific cutoff; the curve itself is threshold-independent
— AUC (c-statistic) summarizes overall test performance independent of cutoff
— Screening (low prevalence, treatable disease, low-cost confirmatory test available) → move up and right on curve → ↑Sn (e.g., HIV ELISA, mammography BI-RADS 0–3 threshold)
— Confirmation (before toxic therapy, irreversible procedure) → move down and left → ↑Sp (e.g., HIV confirmatory immunoassay, tissue biopsy)
— Balanced (Youden index: maximize J = Sn + Sp − 1) → point where a tangent of slope 1 touches the curve
— Test with higher AUC is generally better — but if curves cross, the "better" test depends on the operating region
— Example: Test A has higher Sn at low cutoffs (better for screening); Test B has higher Sp at high cutoffs (better for confirmation) — neither dominates globally
— Posttest odds = pretest odds × LR
— LR+ > 10 or LR− < 0.1 → large, often diagnostic shifts
— LR+ 5–10 / LR− 0.1–0.2 → moderate shifts
— LR+ 2–5 / LR− 0.2–0.5 → small shifts
— LR+ 1–2 / LR− 0.5–1 → minimal, test was not useful
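The odds-based update (posttest odds = pretest odds × LR) is easy to get wrong by mixing odds and probabilities, so a small Python sketch may help. `posttest_probability` is a hypothetical helper name; the example inputs (pretest probability 10%, Sn 80%, Sp 90%) are borrowed from the counts earlier in these notes.

```python
def posttest_probability(pretest_prob, lr):
    """Convert probability to odds, multiply by the LR, convert back."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


sn, sp = 0.80, 0.90
lr_pos = sn / (1 - sp)        # about 8
lr_neg = (1 - sn) / sp        # about 0.22

after_positive = posttest_probability(0.10, lr_pos)
after_negative = posttest_probability(0.10, lr_neg)
print(f"after a positive: {after_positive:.0%}, after a negative: {after_negative:.1%}")
```

With a 10% pretest probability the positive result lands at about 47%, which matches the PPV computed from the raw counts, a useful sanity check that the LR route and the 2×2 route agree.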

— Cost of a false negative (missed disease): morbidity, mortality, legal exposure, lost treatment window
— Cost of a false positive: unnecessary procedures, anxiety, downstream harms, overdiagnosis, resource waste
— Optimal cutoff minimizes weighted total cost, not raw error count
— Acute MI: hs-troponin 99th percentile (very low cutoff) — Sn ~99%, accepts many false positives sorted out by serial troponins and clinical context
— PE in ED: D-dimer 500 ng/mL — Sn ~95–98%
— Bacterial meningitis: low threshold for LP and empiric antibiotics
— Neonatal sepsis: low threshold for full workup and empiric coverage
— HIV screening: 4th-generation Ag/Ab combo — Sn >99%
— Cancer diagnosis before chemotherapy → tissue biopsy (Sp ~100%)
— Brain death determination → strict, specific criteria
— Confirmatory HIV testing before disclosure and ART initiation
— Genetic disease before life-altering decisions (BRCA, Huntington)
— Low-prevalence screening (e.g., general-population PSA, mammography in 40s): even with good Sn/Sp, PPV is low, and most positives are false → drives shared decision-making and USPSTF "C" recommendations
— High-prevalence confirmation (symptomatic patient, abnormal screen): PPV rises naturally; specific tests now yield trustworthy positives
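The idea that the optimal cutoff minimizes weighted cost rather than raw error count can be sketched in a few lines of Python. The cost weights are hypothetical assumptions chosen only to show the flip; the FN/FP counts reuse the two cutoff scenarios worked through earlier in these notes, and `weighted_cost` and `best_cutoff` are illustrative names.

```python
def weighted_cost(fn, fp, cost_fn, cost_fp):
    """Total cost when each false negative and false positive carries its own weight."""
    return fn * cost_fn + fp * cost_fp


# Error counts per 1000 patients for two candidate cutoffs (from the worked example)
candidates = {
    "sensitive cutoff": {"fn": 5, "fp": 360},
    "specific cutoff": {"fn": 20, "fp": 90},
}


def best_cutoff(candidates, cost_fn, cost_fp):
    """Pick the candidate with the lowest weighted cost."""
    return min(
        candidates,
        key=lambda name: weighted_cost(
            candidates[name]["fn"], candidates[name]["fp"], cost_fn, cost_fp
        ),
    )


# Assumed weights: a missed case is 50x as costly as a false alarm (rule-out setting)
print(best_cutoff(candidates, cost_fn=50, cost_fp=1))
# Assumed weights: a missed case is only 5x as costly (confirmation setting)
print(best_cutoff(candidates, cost_fn=5, cost_fp=1))
```

Under these assumed weights the sensitive cutoff wins when a miss is 50 times worse than a false alarm, and the specific cutoff wins when it is only 5 times worse; for these counts the break-even ratio is 18.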

— ASCVD 10-yr risk ≥7.5% triggers statin discussion; ≥20% triggers high-intensity statin
— Lowering the cutoff (e.g., to 5%) → ↑Sn for preventing events, ↓Sp → more people on statins, more side effects, higher NNT (less benefit per person treated)
— BP threshold lowered from 140/90 to 130/80 (2017 ACC/AHA) → ↑Sn for cardiovascular risk identification, ↓Sp → ~30 million more "hypertensive" Americans overnight; classic example of cutoff-driven epidemiologic shift
— Higher cutoff: fewer bleeds, more strokes; lower cutoff: opposite

— HIV: 4th-gen Ag/Ab immunoassay (Sn >99%) → if positive, HIV-1/2 differentiation immunoassay (Sp ~100%) → if discordant, HIV-1 NAT
— Syphilis (reverse algorithm): treponemal EIA (sensitive) → RPR (specific for activity) → if discordant, second treponemal (TP-PA)
— Hepatitis C: anti-HCV antibody (Sn) → HCV RNA PCR (Sp, confirms active infection)
— Lupus: ANA (Sn ~95%) → anti-dsDNA, anti-Smith (Sp ~95%)
— PE: Wells/PERC → D-dimer → CTPA
— ACS: ECG + hs-troponin (Sn) → coronary angiography (Sp/definitive)
— Breast cancer: mammography → diagnostic mammo/US → core biopsy
— Sensitive first test with negative result → very low posttest probability (NPV high in low-prevalence population) → stop
— Positive sensitive test → posttest probability now in moderate range → specific second test now has acceptable PPV (prevalence in this enriched group is high)
— This is Bayes' theorem operationalized: each test updates pretest → posttest probability
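The sensitive-then-specific sequence can be sketched as two chained Bayes updates. All numbers below (prevalence, Sn, Sp for each test) are hypothetical values picked for illustration, and the sketch assumes the two tests are conditionally independent given disease status, which real assays may not be.

```python
def update(prob, lr):
    """Bayes update on the odds scale: posttest odds = pretest odds x LR."""
    odds = prob / (1 - prob) * lr
    return odds / (1 + odds)


# Hypothetical test characteristics (assumptions for illustration only)
prevalence = 0.005                      # low-prevalence screening population
screen_sn, screen_sp = 0.99, 0.98       # sensitive first test
confirm_sn, confirm_sp = 0.95, 0.999    # specific second test

screen_lr_pos = screen_sn / (1 - screen_sp)
screen_lr_neg = (1 - screen_sn) / screen_sp
confirm_lr_pos = confirm_sn / (1 - confirm_sp)

after_negative_screen = update(prevalence, screen_lr_neg)
after_positive_screen = update(prevalence, screen_lr_pos)
after_positive_confirm = update(after_positive_screen, confirm_lr_pos)

print(f"negative screen:        {after_negative_screen:.4%}")
print(f"positive screen:        {after_positive_screen:.1%}")
print(f"positive screen + conf: {after_positive_confirm:.1%}")
```

With these assumed inputs a positive screen alone leaves the patient at roughly 20%, which is why treatment waits for the specific test; the confirmatory positive then pushes the probability above 99%.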

— Rises physiologically with age, inflammation, malignancy, pregnancy
— Fixed 500 ng/mL cutoff in an 80-year-old has Sp ~10–20% → enormous false-positive rate, unnecessary CTPAs
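— Age-adjusted cutoff (age × 10 for age >50) restores Sp without meaningful Sn loss (validated in ADJUST-PE); the YEARS criteria take a related approach, varying the D-dimer cutoff by clinical items rather than age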
— High-sensitivity troponin baseline rises with age, CKD, HF
— Sex-specific 99th percentile cutoffs (lower for women) — without these, women's MIs are underdiagnosed (↓Sn in women under unified cutoffs)
— In CKD, chronic troponin elevation reduces Sp → emphasize delta (change) rather than absolute value
— Rises with age, falls with obesity, rises with renal dysfunction
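— Age-stratified NT-proBNP cutoffs for ruling in acute HF: >450 pg/mL (<50 yo), >900 (50–75), >1800 (>75); a single cutoff of <300 pg/mL is used to rule out at any age
— Creatinine-based eGFR overestimates true GFR in low-muscle-mass elderly (less creatinine is produced); cystatin C-based estimates are more accurate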
— Drug dosing cutoffs (DOACs, metformin, gabapentin) hinge on these — wrong cutoff → toxicity or underdosing
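The age-adjusted D-dimer rule described above (age × 10 for patients older than 50, otherwise the fixed 500 ng/mL) reduces to a one-line function. This is a sketch of the arithmetic only; the function names are hypothetical, and treating a value exactly at the cutoff as positive is an assumption.

```python
def d_dimer_cutoff(age_years):
    """Age x 10 ng/mL for patients older than 50, else the fixed 500 ng/mL."""
    return age_years * 10 if age_years > 50 else 500


def d_dimer_positive(value_ng_ml, age_years):
    """Assumption: a value at or above the cutoff counts as positive."""
    return value_ng_ml >= d_dimer_cutoff(age_years)


# An 80-year-old with a D-dimer of 600 ng/mL
print(d_dimer_cutoff(80))          # cutoff becomes 800
print(d_dimer_positive(600, 80))   # negative under the age-adjusted rule
print(d_dimer_positive(600, 40))   # positive under the fixed 500 cutoff
```

The same value of 600 ng/mL flips from positive to negative depending on age, which is the specificity gain the age-adjusted rule is meant to deliver.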

— Rises physiologically each trimester; fixed 500 ng/mL has near-zero Sp by third trimester
— YEARS algorithm adapted for pregnancy uses pregnancy-specific cutoffs to avoid empirical CTPA/V-Q
— TSH: trimester-specific reference ranges (upper limit lowered to ~2.5–4.0 mIU/L in T1) because hCG cross-stimulates the TSH receptor
— Using non-pregnant cutoff (4.5) misses subclinical hypothyroidism with fetal implications
— Vital sign cutoffs (HR, RR, BP) are age-dependent — adult "tachycardia at 100" is normal for a toddler
— Bilirubin nomograms (Bhutani) use hour-specific cutoffs for phototherapy/exchange transfusion thresholds
— Growth chart percentiles are inherently cutoff decisions (e.g., <3rd, >97th)
— Lead level "action" cutoffs were lowered (10 → 5 → 3.5 µg/dL) as evidence of harm at lower levels accumulated — classic Sn ↑, Sp ↓ public health shift

— Anxiety, labeling, insurance/employment discrimination
— Cascade of confirmatory testing (radiation, contrast nephropathy, biopsy bleeding, incidentalomas)
— Unnecessary treatment (overdiagnosis of indolent cancer → surgery, RT, ADT side effects)
— Resource diversion from true-positive patients
— Examples: PSA screening → prostate biopsy → sepsis, incontinence, ED; mammography → benign biopsy; CT lung screen → incidental nodule workup
— Missed treatment window (MI not diagnosed → death; sepsis not flagged → shock)
— Disease progression (cancer detected at higher stage)
— Continued transmission (missed HIV, TB, syphilis)
— False reassurance and delayed re-presentation
— Medicolegal exposure — missed diagnosis is the #1 source of outpatient malpractice claims in primary care

— Discordance between clinical impression and test result → trust the higher pretest probability and pursue further testing rather than accepting a "negative"
— Borderline value near cutoff (e.g., D-dimer 510, troponin 0.05) → repeat, trend, or use higher-Sp confirmatory test
— Multiple low-Sp positives in low-prevalence patient → most are likely false → recalibrate, don't anchor
— High-pretest-probability PE with negative D-dimer → proceed to CTPA anyway (D-dimer is meant for low pretest probability and was misapplied here; with a high pretest probability, even a negative sensitive test leaves significant residual risk)
— Symptomatic patient with negative screening test → direct diagnostic test (e.g., symptomatic breast lump with normal mammogram → US ± biopsy)
— Positive screening test in low-prevalence population → high probability of false positive → confirmatory specific test before treatment
— Discrepant HIV screen and confirmation → HIV-1 NAT to resolve
— Pathology/lab medicine when assay reliability, cutoff interpretation, or interfering substances are at issue (heterophile antibodies, biotin interference with troponin/TSH, hemolysis)
— Specialty when sequential testing has not resolved (oncology for indeterminate nodule, cardiology for ambiguous troponin trajectory)
— Ethics when overdiagnosis enters a values-laden decision (early-stage prostate cancer in 80-year-old)
— Inpatient: prevalence is higher → PPV of positive tests is higher → cutoffs are often used more aggressively (lower thresholds for action)
— Outpatient screening: prevalence is lower → PPV is lower → high-Sn cutoffs followed by confirmation are mandatory

— Sn answers: "Of those WITH disease, what fraction does the test catch?" → property of test + cutoff, prevalence-independent
— PPV answers: "Of those who TEST positive, what fraction truly has disease?" → prevalence-dependent
— Same test, same cutoff, different populations → same Sn/Sp, very different PPV/NPV
— Sp: among healthy, fraction correctly negative — prevalence-independent
— NPV: among test-negatives, fraction truly healthy — prevalence-dependent
— In low-prevalence settings, NPV is reassuringly high even for modest-Sn tests, simply because most people don't have disease
— Accuracy = (TP + TN)/total — heavily prevalence-influenced; misleading in imbalanced populations
— A test that calls everyone negative has 99% accuracy in 1%-prevalence disease — but 0% Sn
— LRs combine Sn and Sp into a single number per test result
— LR is cutoff-specific but prevalence-independent
— More useful than raw Sn/Sp because LR directly updates pretest → posttest odds
— Discrimination = can the test/model separate diseased from non-diseased? (Sn/Sp/AUC territory)
— Calibration = do predicted probabilities match observed? (e.g., does a "10% ASCVD risk" group truly have 10% events?)
— A model can discriminate well but be miscalibrated, requiring recalibration rather than rebuilding
— Reliability = reproducibility (same result on repeat); validity = correctness (matches truth)
— A test with a poorly chosen cutoff can be highly reliable yet invalid for the clinical question
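A short Python sketch can show how the same Sn and Sp give very different predictive values as prevalence changes, and why accuracy misleads in imbalanced populations. The Sn/Sp of 90% and the three prevalences are hypothetical inputs; `predictive_values` and `accuracy` are illustrative names.

```python
def predictive_values(sn, sp, prevalence):
    """Return (PPV, NPV) from Sn, Sp, and prevalence, using population fractions."""
    tp = sn * prevalence
    fn = (1 - sn) * prevalence
    tn = sp * (1 - prevalence)
    fp = (1 - sp) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


def accuracy(sn, sp, prevalence):
    """Overall accuracy is a prevalence-weighted mix of Sn and Sp."""
    return sn * prevalence + sp * (1 - prevalence)


# Same hypothetical test (Sn = Sp = 90%) in three populations
for prev in (0.01, 0.10, 0.40):
    ppv, npv = predictive_values(0.90, 0.90, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.0%}, NPV {npv:.1%}")

# A "test" that calls everyone negative: Sn = 0, Sp = 1
always_negative_accuracy = accuracy(sn=0.0, sp=1.0, prevalence=0.01)
print(f"always-negative accuracy at 1% prevalence: {always_negative_accuracy:.0%}")
```

At 1% prevalence the PPV of this hypothetical test is only about 8% even though Sn and Sp never changed, and the always-negative "test" still scores 99% accuracy while detecting no one.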

— Sn/Sp estimated in tertiary referral population (severe, classic disease) overestimates performance in primary care (mild, atypical disease)
— Example: clinical decision rules validated in EDs may underperform in clinics
— Only test-positives get gold-standard confirmation → FN undercounted → Sn falsely inflated
— Mitigated by randomly verifying a sample of test-negatives
— Gold standard incorporates the index test → artificial agreement → inflated Sn/Sp
— Example: a "clinical diagnosis" of HF that already includes BNP being used to evaluate BNP
— Earlier detection moves the diagnosis date earlier without prolonging life → apparent survival ↑ without true mortality benefit
— Always evaluate screening by disease-specific mortality, not 5-year survival
— Screening preferentially detects slow-growing (indolent) disease → apparent prognosis improvement without true benefit
— Drives overdiagnosis of indolent prostate, thyroid, breast cancers
— Patients who attend screening are healthier overall → screened group looks better regardless of test efficacy
— In observational studies of cutoffs ("patients with troponin >X had worse outcomes"), severity drives both the high value and the outcome — not the cutoff itself
— Random misclassification → bias toward null; differential misclassification → unpredictable direction
— Patients selected for "abnormal" values near the cutoff tend to return toward normal on retesting — falsely attributing improvement to intervention
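The partial-verification effect noted above (only test-positives reliably get the gold standard) can be checked with simple arithmetic. In this sketch the true counts reuse the 2×2 figures from earlier in these notes, the 10% verification rate for test-negatives is a hypothetical assumption, and `apparent_sn_sp` is an illustrative name.

```python
def apparent_sn_sp(tp, fp, fn, tn, frac_negatives_verified):
    """Naive Sn/Sp when all test-positives but only a fraction of test-negatives are verified."""
    fn_seen = fn * frac_negatives_verified   # missed cases that are actually discovered
    tn_seen = tn * frac_negatives_verified   # true negatives that are actually confirmed
    sn = tp / (tp + fn_seen)
    sp = tn_seen / (tn_seen + fp)
    return sn, sp


true_sn, true_sp = apparent_sn_sp(80, 90, 20, 810, frac_negatives_verified=1.0)
biased_sn, biased_sp = apparent_sn_sp(80, 90, 20, 810, frac_negatives_verified=0.1)

print(f"full verification:    Sn {true_sn:.0%}, Sp {true_sp:.0%}")
print(f"partial verification: Sn {biased_sn:.0%}, Sp {biased_sp:.0%}")
```

Under these assumptions sensitivity appears to rise from 80% to about 98% because most false negatives are never found, while specificity is pushed down; verifying a random sample of test-negatives and reweighting is the usual remedy.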

— HbA1c: <7% general goal, <6.5% if achievable without hypoglycemia, <8% in elderly/multimorbid — each cutoff trades microvascular benefit vs hypoglycemia risk
— LDL: post-ACS or high-risk → <70 mg/dL or ≥50% reduction; very high risk → <55 mg/dL — lower cutoffs catch more residual risk but require more therapy/monitoring
— BP: <130/80 in most adults; <140/90 in frail elderly per SPRINT-informed individualization
— INR for AF: 2.0–3.0; mechanical mitral valve 2.5–3.5 — therapeutic windows are paired cutoffs
— CD4 count: monitoring intervals stratified by <200, 200–500, >500 (historical; now viral load drives ART monitoring)
— PSA velocity, CEA trend in CRC, CA-125 in ovarian, AFP in HCC — change across cutoff is more meaningful than single value
— Surveillance imaging intervals (CT q3–6mo) are cutoff decisions trading recurrence detection vs cumulative radiation
— Colon: 45–75y, q10y colonoscopy or q1y FIT — interval length is a cutoff in time
— Breast: 40 (shared decision) or 50–74y q2y
— Cervical: 21–65 with cytology/HPV co-test intervals
— Lung (LDCT): 50–80y, ≥20 pack-years, current/quit <15y — eligibility cutoffs trade Sn for population yield
— Life-expectancy-based stopping of screening (mammography, colonoscopy >75 with limited life expectancy)
— A cutoff for when to stop testing is as important as when to start

— Explain that positive ≠ disease, especially in low-prevalence screening — most positives in screening contexts are false
— Use natural frequencies rather than percentages: "Out of 1000 people like you who test positive, about 200 truly have the disease" — patients understand this better than "PPV is 20%"
— Confirm with specific test before treatment, disclosure, or major decisions
— Negative does not mean zero risk — NPV is rarely 100%
— In high-pretest-probability patients, a negative sensitive test still leaves meaningful residual risk → continue evaluation
— For screening, advise return to standard interval, not abandonment
— Repeat in interval matched to disease biology (troponin q3h, A1c q3mo, PSA q6–12mo, BP 1–4 weeks for new hypertension)
— Trend interpretation almost always trumps single-value interpretation
— PSA 55–69, low-dose CT lung in marginal eligibility, mammography 40–49
— Document discussion of benefits (mortality reduction) and harms (false positives, overdiagnosis, procedure complications)
— Avoid "your test was positive" without context — patients hear "I have cancer"
— Provide numeric and visual representations (pictographs, 100-person diagrams)
— Record pretest probability assessment, choice of test/cutoff, interpretation, and follow-up plan
— Especially important when not ordering a test or when accepting a borderline result
— Positive screen → ensure timely confirmatory testing (closed-loop follow-up) — failure to follow up abnormal results is a major patient safety event and a top malpractice driver
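The natural-frequency framing recommended above can be generated from Sn, Sp, and prevalence. This is a minimal sketch with hypothetical inputs (prevalence 5%, Sn 90%, Sp 80%) and an illustrative function name; counts are rounded to whole people, so it is meant for counseling-style statements rather than exact statistics.

```python
def natural_frequencies(sn, sp, prevalence, n=1000):
    """Express test performance as whole-person counts out of n people tested."""
    diseased = round(n * prevalence)
    healthy = n - diseased
    true_pos = round(diseased * sn)
    false_pos = round(healthy * (1 - sp))
    return {
        "tested": n,
        "test_positive": true_pos + false_pos,
        "true_positive": true_pos,
        "false_positive": false_pos,
    }


counts = natural_frequencies(sn=0.90, sp=0.80, prevalence=0.05)
print(
    f"Out of {counts['tested']} people tested, {counts['test_positive']} test positive; "
    f"about {counts['true_positive']} of them truly have the disease."
)
```

With these assumed inputs roughly one in five positives is a true positive, the same message as "PPV is about 20%" but in a form patients tend to find easier to follow.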

— Patients have the right to understand that screening tests have false positives and false negatives, that positive results may trigger invasive workup, and that overdiagnosis is a real harm
— Particularly relevant for PSA, low-dose CT lung, BRCA testing, prenatal screening (cell-free DNA has imperfect PPV especially for sex chromosome aneuploidies)
— Pretest counseling for genetic testing is a Step 3 favorite — must include implications for family, insurance (GINA protections but not for life/disability insurance), and psychological impact
— Reportable infectious diseases (HIV, TB, syphilis, gonorrhea, hepatitis) — a "positive" result triggers public health reporting regardless of patient preference
— Lead levels above action threshold trigger environmental investigation
— These are non-negotiable; patients must be informed at time of testing
— Pending test results at discharge are a major safety gap — up to 40% of inpatients leave with pending labs/imaging
— Discharge summary must include pending results and explicit follow-up plan; PCP must close the loop
— Joint Commission considers communication of critical/abnormal results a National Patient Safety Goal
— Leading source of outpatient malpractice claims (missed cancer diagnosis)
— Mitigation: closed-loop result management, registry-based recall, EHR alerts, patient portal disclosure with provider review
— Ethical to inform patient that subsequent confirmatory testing was negative and that prior treatment may have been unnecessary (rare but real for indolent cancers)
— Cutoffs derived from non-representative populations can systematically misclassify minority groups (historical race-based eGFR coefficient; pulse oximetry inaccuracy in dark skin reducing Sn for hypoxemia in Black patients) — equity-aware recalibration is now an active area


— Stem: "Investigators lowered the diagnostic D-dimer threshold from 500 to 250 ng/mL. Which parameter will increase?"
— Answer logic: lower cutoff → more positives → Sn ↑, Sp ↓, FP ↑, FN ↓. Correct answer: sensitivity (or false-positive rate)
— Stem: "Test moves from referral center (prevalence 40%) to community screening (prevalence 2%). Sn and Sp are unchanged. What changes?"
— Answer: PPV falls, NPV rises. Trap answer: "Sn falls" — wrong, Sn is prevalence-independent
— Given raw counts, compute Sn, Sp, PPV, NPV, LR+, LR−
— Trick: ensure you're dividing by the correct denominator (disease column for Sn/Sp; test row for PPV/NPV)
— Stem: "Asymptomatic patient with positive 4th-gen HIV screen. Next step?"
— Answer: HIV-1/2 differentiation immunoassay (specific confirmation). Trap: starting ART before confirmation
— Stem: "High-pretest-probability PE patient with negative D-dimer. Next step?"
— Answer: CTPA — D-dimer should not have been ordered; negative does not overcome high prior
— Two curves shown; choose the better screening or confirmatory test based on where curves operate
— Stem describes screening study with apparent benefit → identify lead-time, length-time, or selection bias
— Elderly with D-dimer 600, BNP 400, or troponin baseline elevated in CKD → recognize population-specific cutoffs
— USPSTF age/risk cutoffs for colon, breast, lung, cervical, AAA, osteoporosis — memorize starting ages and intervals
— Closed-loop follow-up, shared decision-making, patient navigation — answer is rarely "do nothing" or "wait and see"

— SnNout, SpPin: a sensitive test with a negative result rules out; a specific test with a positive result rules in. Sequence sensitive → specific (HIV ELISA → confirmatory; ANA → anti-dsDNA; D-dimer → CTPA) to harness both properties on the same patient
— Cutoff lives on one ROC curve: moving the threshold trades Sn for Sp on the same test; only changing the test moves you to a different ROC curve. AUC summarizes overall discrimination (independent of cutoff); Youden index identifies the balanced point, but clinical context (cost of FP vs FN) determines the actual operating cutoff
— Prevalence is the silent variable: Sn and Sp are prevalence-independent, but PPV and NPV are not. In low-prevalence screening, even a high-Sp test yields many false positives — drives the need for confirmatory testing and shared decision-making for PSA, low-dose CT lung, BRCA, mammography in 40s
— Population-specific cutoffs are not optional: age-adjusted D-dimer, sex-specific hs-troponin, trimester-specific TSH, age-specific BNP, pediatric vital signs, and life-expectancy-based screening stop ages are all Step 3 favorites — applying the adult/general cutoff to these populations is a testable error

