Biostatistics & Population Health

Epidemiologic measures: incidence, prevalence, mortality

Clinical Overview and When to Suspect Misuse of Epidemiologic Measures

Why this matters on Step 3: Nearly every board exam question that gives you a 2×2 table, a screening scenario, a disease registry, or a public health vignette is testing whether you can pick the right denominator and the right time frame. Confusing incidence with prevalence is one of the most common biostatistics errors on the USMLE.

Core trio:

— Incidence = new cases over a defined time period ÷ population at risk (dynamic, forward-looking)

— Prevalence = total existing cases at a point (or over an interval) ÷ total population (static snapshot)

— Mortality = deaths ÷ population over time (a specific kind of incidence, where the "event" is death)

When to suspect the question is testing incidence: stem mentions "new diagnoses per year," "developed disease during follow-up," cohort studies, outbreak investigations, vaccine efficacy, hazard ratios.

When to suspect prevalence: stem mentions "currently have," "burden of disease," cross-sectional survey, NHANES data, "screening clinic on a given day," disease registry snapshot.

When to suspect mortality: stem mentions "case fatality," "deaths per 100,000," "5-year survival," cancer registry endpoints, SEER data.

Step 3 management: When asked which measure best guides health policy resource allocation (e.g., how many dialysis chairs a city needs), prevalence wins. When asked which guides etiologic research (e.g., what causes the disease), incidence wins. When asked which guides prognosis after diagnosis, mortality/case-fatality wins.

Board pearl: Prevalence ≈ Incidence × Average disease duration (when prevalence is low and steady-state). This single equation explains why chronic, non-fatal diseases (HTN, DM2) have high prevalence but modest incidence, while rapidly fatal diseases (pancreatic cancer, rabies) have low prevalence despite meaningful incidence — patients die before accumulating in the prevalent pool.
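The steady-state relationship can be checked numerically. The following is a minimal Python sketch. It assumes a population in steady state and uses hypothetical incidence and duration values, not real data for any disease. It computes both the P ≈ I × D shortcut and the exact steady-state form P/(1–P) = I × D, rearranged as P = I·D / (1 + I·D).

```python
def prevalence_steady_state(incidence_rate: float, mean_duration_years: float) -> tuple[float, float]:
    """Return (approximate, exact) steady-state prevalence.

    incidence_rate: new cases per person-year among those at risk (hypothetical input)
    mean_duration_years: average time a case remains in the prevalent pool
    """
    id_product = incidence_rate * mean_duration_years
    approx = id_product                     # P ≈ I × D, valid only when P is small
    exact = id_product / (1 + id_product)   # rearranged from P / (1 - P) = I × D
    return approx, exact


# Hypothetical chronic, non-fatal condition: 10 per 1,000 person-years, lasting ~25 years
chronic = prevalence_steady_state(0.010, 25)
# Hypothetical rapidly fatal condition: same incidence, lasting ~0.5 years
fatal = prevalence_steady_state(0.010, 0.5)

print(f"chronic: approx {chronic[0]:.3f}, exact {chronic[1]:.3f}")
print(f"rapidly fatal: approx {fatal[0]:.4f}, exact {fatal[1]:.4f}")
```

Both conditions have the same incidence, but their prevalence differs roughly 50-fold because duration differs. For the chronic inputs, the shortcut gives 0.25 and the exact value is 0.20. That gap is why P ≈ I × D is restricted to low prevalence.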

Presentation Patterns and Key History — How Vignettes Frame the Numbers

Pattern 1 — The cohort follow-up vignette (incidence):

— "10,000 healthy adults followed for 5 years; 200 developed colon cancer"

— Calculate: cumulative incidence = 200/10,000 = 2% over 5 years

— Or incidence rate = 200 / (person-years of follow-up), often expressed per 1,000 person-years

Pattern 2 — The cross-sectional snapshot (prevalence):

— "On January 1, 2024, a survey of 5,000 adults in County X found 400 with diabetes"

— Point prevalence = 400/5,000 = 8%

— Period prevalence adds anyone who had the disease at any time during the period

Pattern 3 — The mortality/survival stem:

— "Of 500 patients diagnosed with pancreatic cancer, 450 died within 1 year"

— Case fatality rate = 450/500 = 90% (denominator = diseased, not total population)

— Mortality rate = deaths ÷ total population at risk (different denominator!)

Key distinction: Case fatality vs mortality rate is a favorite trick.

— Case fatality = how deadly the disease is once you have it (denominator: cases)

— Mortality rate = how much the disease burdens the whole population (denominator: everyone)

— Ebola has high case fatality (~50%) but low US mortality rate (rare). Coronary disease has modest case fatality but enormous US mortality rate (common).

Pattern 4 — The screening/lead-time scenario: Earlier detection inflates apparent survival without changing mortality — a key trap.

Board pearl: If the stem gives you a number and asks "what proportion of people in the community currently have…" → prevalence. If it gives you a number and asks "what proportion developed…" → incidence. The verbs "have/has" vs "develop/developed" are the linguistic tell — circle them on every biostats stem.
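The arithmetic for Patterns 1–3 fits in a few lines. This minimal Python sketch reuses the vignette counts. The person-years figure is a hypothetical assumption because the vignette does not supply one. The population mortality rate is omitted because Pattern 3 gives no total population.

```python
# Arithmetic for Patterns 1-3 using the vignette counts.
# ASSUMPTION: the vignette gives no person-years, so 49,500 is hypothetical
# (10,000 people x 5 years, minus time not contributed after the 200 cases developed disease).

def proportion(numerator: int, denominator: int) -> float:
    """Cases / people; choosing the right denominator is what the question tests."""
    return numerator / denominator


# Pattern 1: cohort follow-up -> incidence
cumulative_incidence_5y = proportion(200, 10_000)
assumed_person_years = 49_500  # hypothetical
incidence_rate_per_1000_py = 200 / assumed_person_years * 1_000

# Pattern 2: cross-sectional snapshot -> point prevalence
point_prevalence = proportion(400, 5_000)

# Pattern 3: deaths among people who HAVE the disease -> case fatality
case_fatality_1y = proportion(450, 500)

print(f"5-year cumulative incidence: {cumulative_incidence_5y:.1%}")
print(f"Incidence rate: {incidence_rate_per_1000_py:.2f} per 1,000 person-years")
print(f"Point prevalence: {point_prevalence:.0%}")
print(f"1-year case fatality: {case_fatality_1y:.0%}")
```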

Physical Exam Findings — The 2×2 Table as Your Diagnostic Tool

In epidemiology, the "physical exam" is the 2×2 contingency table. Master its layout once and questions become mechanical.

```
                    Disease+    Disease–
Exposed/Test+          A           B
Unexposed/Test–        C           D
```

Measures derived from the table:

— Incidence in exposed = A / (A+B)

— Incidence in unexposed = C / (C+D)

— Relative risk (RR) = [A/(A+B)] ÷ [C/(C+D)] — cohort studies

— Odds ratio (OR) = (A×D)/(B×C) — case-control studies, approximates RR when disease is rare

— Attributable risk (AR) = incidence(exposed) – incidence(unexposed)

— Prevalence = (A+C) / (A+B+C+D) at a point in time (valid only when the table comes from a cross-sectional sample)

Hemodynamic analogy (the "vital signs" of an epidemiologic dataset):

— Incidence = "heart rate" — rate of new events

— Prevalence = "blood volume" — total burden currently circulating

— Mortality = "exit rate" — how fast the pool drains via death

— Cure/recovery = the other exit from prevalence

Key distinction: RR is for cohort/RCT designs (you know who's exposed, you follow forward, you can compute true incidences). OR is for case-control designs (you start with disease status and look backward; you cannot compute incidence). Picking the wrong measure for the study design is a classic distractor.

Step 3 management: When a vignette gives raw counts, immediately sketch the 2×2 on scratch paper. Label rows/columns before plugging numbers. Errors come from flipped axes, not arithmetic.

Board pearl: Number needed to treat (NNT) = 1 / absolute risk reduction (ARR). If a drug drops MI incidence from 10% to 6%, ARR = 0.04 and NNT = 1/0.04 = 25. NNT lives in the incidence world, never the prevalence world.
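The table measures map directly onto code. This minimal Python sketch uses the A/B/C/D layout shown above. The class name `TwoByTwo` and all cell counts are hypothetical. The two cohorts are built with the same RR so that the "OR exaggerates RR when disease is common" trap is visible.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts laid out as in the table: rows exposed/unexposed, columns disease+/disease-."""
    a: int  # exposed, disease+
    b: int  # exposed, disease-
    c: int  # unexposed, disease+
    d: int  # unexposed, disease-

    def incidence_exposed(self) -> float:
        return self.a / (self.a + self.b)

    def incidence_unexposed(self) -> float:
        return self.c / (self.c + self.d)

    def relative_risk(self) -> float:
        # Only meaningful for cohort/RCT data, where row totals are true at-risk groups
        return self.incidence_exposed() / self.incidence_unexposed()

    def odds_ratio(self) -> float:
        # Cross-product ratio; also valid for case-control data
        return (self.a * self.d) / (self.b * self.c)

    def attributable_risk(self) -> float:
        return self.incidence_exposed() - self.incidence_unexposed()


def nnt(risk_control: float, risk_treated: float) -> float:
    """Number needed to treat = 1 / absolute risk reduction."""
    return 1 / (risk_control - risk_treated)


# Hypothetical cohorts with the same RR (2.0) but different outcome frequency
rare = TwoByTwo(a=20, b=980, c=10, d=990)       # outcome in 1-2% of people
common = TwoByTwo(a=400, b=600, c=200, d=800)   # outcome in 20-40% of people

for label, table in [("rare outcome", rare), ("common outcome", common)]:
    print(f"{label}: RR={table.relative_risk():.2f}, "
          f"OR={table.odds_ratio():.2f}, AR={table.attributable_risk():.3f}")

print(f"NNT for 10% -> 6%: {nnt(0.10, 0.06):.0f}")
```

Both hypothetical cohorts have RR = 2.0. The OR is about 2.02 when the outcome is rare but about 2.67 when it is common.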

Diagnostic Workup — Calculating Incidence Precisely

Two flavors of incidence — know which the question wants:

— Cumulative incidence (incidence proportion, "risk"):

— Formula: new cases ÷ population at risk at start of period

— Unitless proportion (0–1 or %)

— Assumes everyone is followed for the full period, so it performs poorly when there's heavy loss to follow-up or competing risks

— Example: 50 new MIs among 1,000 men followed 10 years → 5% cumulative incidence

— Incidence rate (incidence density):

— Formula: new cases ÷ total person-time at risk

— Units: cases per person-year (or per 1,000 person-years)

— Handles variable follow-up, dropouts, late entries

— Example: 50 MIs in 8,500 person-years → 5.88 per 1,000 person-years

Person-time calculation pitfalls:

— Stop the clock when a person develops the outcome, dies, or is lost to follow-up

— Don't double-count: a person can contribute time to the "at risk" denominator only while they remain at risk and disease-free

Special incidence measures:

— Attack rate = incidence during an outbreak (often used for foodborne illness, infectious disease investigations); denominator is the exposed group

— Secondary attack rate = new cases among contacts of a primary case ÷ susceptible contacts; gauges transmissibility

Key distinction: Prevalence is not an incidence. You cannot compute incidence from a single cross-sectional survey, no matter how cleverly the question is worded — you need a time dimension and a disease-free starting cohort.

Step 3 management: In outbreak vignettes (public health rotation flavor), the attack rate by exposure (e.g., who ate the potato salad vs who didn't) is the key calculation. The exposure with the highest attack rate ratio is your culprit food.

Board pearl: Vaccine efficacy = (incidence in unvaccinated – incidence in vaccinated) ÷ incidence in unvaccinated. It's just an attributable-risk-percent in disguise.
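The person-time rules and the outbreak calculations can be sketched together. This minimal Python example uses hypothetical records and picnic counts, invented for illustration. Follow-up is already truncated at the outcome or at loss to follow-up. Foods are compared by attack-rate ratio, and vaccine efficacy uses the formula from the board pearl.

```python
# Person-time incidence rate, outbreak attack rates, and vaccine efficacy.
# Every record and count below is hypothetical.

# Each record: (years contributed while at risk, developed the outcome?)
# Follow-up already stops at the outcome, death, or loss to follow-up.
cohort = [
    (10.0, False),
    (10.0, False),
    (4.5, True),    # outcome at 4.5 years; no person-time after that
    (2.0, False),   # lost to follow-up at 2 years
    (7.5, True),
    (10.0, False),
]
person_years = sum(years for years, _ in cohort)
cases = sum(1 for _, had_outcome in cohort if had_outcome)
rate_per_1000_py = cases / person_years * 1_000
print(f"{cases} cases / {person_years} person-years = {rate_per_1000_py:.1f} per 1,000 person-years")


def attack_rate(ill: int, total: int) -> float:
    return ill / total


# Hypothetical picnic: (ill, total) among those who ate vs did not eat each food
foods = {
    "potato salad": {"ate": (30, 40), "did_not_eat": (5, 50)},
    "chicken": {"ate": (18, 45), "did_not_eat": (17, 45)},
}
for food, groups in foods.items():
    ar_ate = attack_rate(*groups["ate"])
    ar_not = attack_rate(*groups["did_not_eat"])
    print(f"{food}: {ar_ate:.0%} vs {ar_not:.0%}, attack rate ratio {ar_ate / ar_not:.1f}")


def vaccine_efficacy(incidence_unvaccinated: float, incidence_vaccinated: float) -> float:
    """(Iu - Iv) / Iu, i.e. 1 - RR with the unvaccinated group as reference."""
    return (incidence_unvaccinated - incidence_vaccinated) / incidence_unvaccinated


print(f"Vaccine efficacy: {vaccine_efficacy(0.10, 0.02):.0%}")
```

In this made-up outbreak, the potato salad stands out with an attack-rate ratio of 7.5. The chicken's ratio is close to 1.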

Diagnostic Workup — Advanced: Prevalence in Depth

Flavors of prevalence:

— Point prevalence: existing cases at one moment ÷ population at that moment

— Best for chronic, stable conditions (HTN, DM, schizophrenia)

— Period prevalence: cases existing at any point during an interval ÷ average population

— Useful for episodic conditions (migraine, asthma exacerbations, major depressive episodes over the past year)

— Lifetime prevalence: proportion who have ever had the condition (depression, substance use disorders); anyone who has ever been affected stays counted, even after recovery

The Prevalence–Incidence relationship:

— Steady-state approximation: P / (1–P) ≈ Incidence rate × Average duration

— When P is small (<10%): P ≈ I × D

— Implication: anything that prolongs disease duration (better treatment that doesn't cure — e.g., HIV ART, heart failure GDMT) raises prevalence even as incidence is stable or falling

Why this trips students up:

— Improved survival from cancer → ↑ prevalence even if ↓ mortality and stable incidence

— Earlier diagnosis (screening) → ↑ prevalence via lengthened apparent duration

— A cure → ↓ prevalence (people exit the pool via recovery)

— A more lethal disease variant → ↓ prevalence (faster exit via death)

Prevalence and predictive values:

— PPV = (sens × prev) / [(sens × prev) + (1–spec)(1–prev)]

— High prevalence → higher PPV; low prevalence → lower PPV, even for a test with good sensitivity and specificity

— This is why screening rare diseases yields many false positives even with "good" tests

Key distinction: Sensitivity and specificity are test properties and do not change with prevalence. PPV and NPV do change with prevalence — a favorite Step 3 distractor.

Board pearl: If a question says "the prevalence of disease in this clinic is higher than in the general population, so..." — the answer almost always involves PPV rising (e.g., specialty referral clinics have higher pretest probability and therefore higher PPV for the same test).
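The PPV formula above can be run across settings. This minimal Python sketch uses a hypothetical test with sensitivity and specificity both fixed at 0.95. Only prevalence changes, which isolates the effect described in the key distinction.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem (same formula as in the text)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value: true negatives / all negative results."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical test (sens 0.95, spec 0.95) in a screening population,
# a primary care clinic, and a specialty referral clinic
for prev in (0.001, 0.05, 0.30):
    print(f"prevalence {prev:>5.1%}: PPV {ppv(0.95, 0.95, prev):.1%}, "
          f"NPV {npv(0.95, 0.95, prev):.2%}")
```

The test properties are identical in all three settings. Even so, PPV rises from about 2% at a prevalence of 0.1% to 50% at 5% and about 89% at 30%.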

Risk Stratification — Choosing the Right Measure for the Right Question

Decision framework — match measure to clinical/policy question:

— "How common is this disease right now?" → Prevalence

— "What's the risk of getting it?" → Cumulative incidence

— "What's the rate of new cases per person-time?" → Incidence rate

— "How deadly is it once you have it?" → Case fatality

— "How much does this disease kill the population?" → Mortality rate

— "Did the exposure cause disease?" → RR or OR

— "How much disease would disappear if we removed the exposure?" → Attributable risk / Population attributable risk

Adjusted vs crude rates:

— Crude rate = total events / total population (no adjustment)

— Age-adjusted (standardized) rate = what the rate would be if the population had a standard age distribution

— Used to compare populations with different demographics (e.g., Florida vs Alaska cancer rates)

— Direct standardization: apply observed age-specific rates to a standard population

— Indirect standardization: apply standard age-specific rates to your population → yields Standardized Mortality Ratio (SMR) = observed deaths ÷ expected deaths

SMR interpretation:

— SMR = 1.0 → same mortality as reference

— SMR > 1.0 → excess mortality (e.g., occupational cohort vs general population)

— SMR < 1.0 → lower-than-expected mortality; in employed cohorts this often reflects the "healthy worker effect"

Step 3 management: When comparing disease rates between two states/countries, always use age-adjusted rates. Crude rates mislead because populations differ in age structure — Florida's higher crude cancer mortality reflects an older population, not a more carcinogenic environment.

Key distinction: Relative risk captures strength of association (causal inference). Attributable risk captures public health impact (how much disease you'd prevent). A weak RR on a highly prevalent exposure (e.g., low-dose air pollution) can produce a large population attributable risk (PAR), because a small excess risk is multiplied across many exposed people. This is the basis for population-level interventions.
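Direct and indirect standardization can be shown side by side. This minimal Python sketch uses invented age bands, rates, and population counts. Two hypothetical regions share identical age-specific death rates but differ in age structure, which mirrors the Florida example. An occupational cohort illustrates the SMR.

```python
# Direct standardization and SMR (indirect standardization), with hypothetical numbers.

# Both regions share the SAME age-specific death rates (per person-year)...
age_specific_rates = {"<45": 0.0005, "45-64": 0.004, "65+": 0.030}

# ...but have very different age structures
pop_younger_region = {"<45": 700_000, "45-64": 250_000, "65+": 50_000}
pop_older_region = {"<45": 400_000, "45-64": 300_000, "65+": 300_000}

# Hypothetical standard population used for direct standardization
standard_population = {"<45": 600_000, "45-64": 300_000, "65+": 100_000}


def deaths_at_rates(rates: dict, population: dict) -> float:
    """Deaths implied by applying the given age-specific rates to a population."""
    return sum(rates[age] * population[age] for age in population)


def crude_rate(rates: dict, population: dict) -> float:
    return deaths_at_rates(rates, population) / sum(population.values())


def directly_standardized_rate(rates: dict, standard: dict) -> float:
    """Apply a region's age-specific rates to the standard population."""
    return deaths_at_rates(rates, standard) / sum(standard.values())


def smr(observed_deaths: int, reference_rates: dict, study_population: dict) -> float:
    """Indirect standardization: observed deaths / deaths expected at reference rates."""
    return observed_deaths / deaths_at_rates(reference_rates, study_population)


for name, pop in [("younger", pop_younger_region), ("older", pop_older_region)]:
    crude = crude_rate(age_specific_rates, pop) * 100_000
    # Identical for both regions here because their age-specific rates are identical
    adjusted = directly_standardized_rate(age_specific_rates, standard_population) * 100_000
    print(f"{name} region: crude {crude:.0f} per 100,000, age-adjusted {adjusted:.0f} per 100,000")

# Hypothetical occupational cohort with 40 observed deaths (42 expected)
occupational_cohort = {"<45": 20_000, "45-64": 8_000, "65+": 0}
print(f"SMR = {smr(40, age_specific_rates, occupational_cohort):.2f}")
```

The crude rates are 285 and 1,040 per 100,000, yet both regions standardize to 450 per 100,000. The whole crude gap comes from age structure. The SMR of about 0.95 falls below 1, the pattern expected from a healthy worker effect.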

Pharmacotherapy Analog — Measures of Association as Your "Drug Toolbox"

Just as you choose drugs by mechanism, you choose epidemiologic measures by study design:

Relative Risk (RR) — cohort studies, RCTs:

— RR = incidence(exposed) / incidence(unexposed)

— RR = 1: no association; >1: harmful exposure; <1: protective

— 95% confidence interval that includes 1 = not statistically significant (at α = 0.05)

Odds Ratio (OR) — case-control studies, cross-sectional:

— OR = (A×D)/(B×C)

— Approximates RR when disease is rare (<10%)

— Exaggerates RR (lies farther from 1) when disease is common — a Step 3 trap

Hazard Ratio (HR) — survival analysis, Kaplan-Meier/Cox regression:

— Ratio of instantaneous event rates (hazards) between groups, usually assumed constant over follow-up (proportional hazards); interpreted much like RR

Absolute Risk Reduction (ARR):

— ARR = incidence(control) – incidence(treated)

— Drives NNT = 1/ARR

Relative Risk Reduction (RRR):

— RRR = (incidence_control – incidence_treated) / incidence_control = 1 – RR

— Drug ads love RRR because it sounds bigger ("50% reduction!") even when ARR is tiny (2% → 1%, an ARR of 1 percentage point)

Attributable Risk (AR) and Attributable Risk Percent (AR%):

— AR = incidence(exposed) – incidence(unexposed)

— AR% = AR / incidence(exposed) = (RR–1)/RR

— Tells you the proportion of disease in the exposed attributable to exposure

Population Attributable Risk (PAR) and PAR%:

— PAR = incidence(total population) – incidence(unexposed)

— PAR% = PAR / incidence(total population): the fraction of all disease in the population due to the exposure; drives smoking cessation and BP control campaigns

Board pearl: Smoking and lung cancer: RR ≈ 20 (huge strength), AR% ≈ 95% (almost all lung cancer in smokers is caused by smoking), PAR% ≈ 80–90% (most population lung cancer would vanish without smoking). All three numbers tell different stories — and Step 3 will ask you which one to cite for which audience.
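RR, AR%, and PAR% can tell different stories about the same exposure. This minimal Python sketch uses hypothetical incidences and exposure prevalences, not the actual smoking figures. PAR% is computed two ways: directly from incidences, and with Levin's formula, PAR% = p(RR – 1) / [1 + p(RR – 1)], where p is the share of the population exposed. The two results should agree.

```python
def association_measures(incidence_exposed: float, incidence_unexposed: float,
                         exposure_prevalence: float) -> dict:
    """RR, AR, AR%, PAR, PAR% from incidences and the share of the population exposed."""
    rr = incidence_exposed / incidence_unexposed
    ar = incidence_exposed - incidence_unexposed
    ar_pct = ar / incidence_exposed                       # same as (RR - 1) / RR
    incidence_total = (exposure_prevalence * incidence_exposed
                       + (1 - exposure_prevalence) * incidence_unexposed)
    par = incidence_total - incidence_unexposed
    par_pct = par / incidence_total
    # Levin's formula gives the same PAR% from RR and exposure prevalence alone
    levin = exposure_prevalence * (rr - 1) / (1 + exposure_prevalence * (rr - 1))
    return {"RR": rr, "AR": ar, "AR%": ar_pct, "PAR": par, "PAR%": par_pct, "Levin PAR%": levin}


# Hypothetical: strong but uncommon exposure vs. weak but very common exposure
strong_uncommon = association_measures(0.020, 0.001, exposure_prevalence=0.01)
weak_common = association_measures(0.0015, 0.001, exposure_prevalence=0.80)

for label, m in [("strong, uncommon", strong_uncommon), ("weak, common", weak_common)]:
    print(f"{label}: RR {m['RR']:.1f}, AR% {m['AR%']:.0%}, PAR% {m['PAR%']:.0%}")
```

The strong exposure wins on RR (20 vs 1.5) and AR% (95% vs 33%). Only 1% of this hypothetical population is exposed to it, however. As a result, the weak, common exposure accounts for the larger share of all cases (PAR% about 29% vs 16%).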

Advanced Measures — Mortality Metrics and Survival Curves

Crude mortality rate: total deaths ÷ total population per year (often per 100,000)

Cause-specific mortality: deaths from a specific cause ÷ total population

— e.g., US cardiovascular mortality ~ 220 per 100,000/year

Case fatality rate (CFR): deaths from disease ÷ people with disease

— Disease-severity metric, not population-burden metric

Proportionate mortality: deaths from a cause ÷ all deaths

— Does NOT measure risk; useful for descriptive epidemiology only

— Beware: a cause can have high proportionate mortality just because other causes are rare

Years of Potential Life Lost (YPLL): sums (reference age – age at death) for premature deaths

— Highlights diseases killing young people (suicide, MVCs, overdose) even when total death counts are modest

Disability-Adjusted Life Years (DALYs): YLL (years of life lost) + YLD (years lived with disability)

— Global Burden of Disease framework

Survival analysis essentials:

— Kaplan-Meier curve: step function showing cumulative survival over time

— Median survival = time at which 50% of cohort has died (read off K-M curve)

— 5-year survival = % alive at 5 years (NOT cure rate; not the same as case fatality)

— Log-rank test: compares two survival curves

— Cox proportional hazards model: generates HR while adjusting for covariates

Lead-time and length-time biases inflate apparent survival without changing true mortality — critical for screening trials. Only a reduction in disease-specific mortality in an RCT demonstrates that a screening test works (e.g., low-dose CT for lung cancer in NLST).

Key distinction: 5-year survival vs mortality rate — improved 5-year survival can reflect earlier detection (lead time) rather than longer life. Step 3 favorite: a screening program that boosts 5-year survival but leaves mortality unchanged is not effective.

CCS pearl: When reading a Kaplan-Meier figure on the exam, look for separation that persists over time and a statistically significant log-rank p-value (<0.05) before declaring an intervention effective.
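The Kaplan-Meier curve and the median-survival rule can be built from first principles. This minimal Python sketch implements the product-limit estimator on hypothetical follow-up data. It uses the common convention that subjects censored at a given time are still at risk for deaths at that time. The median is read as the first time the estimated survival drops to 0.5 or below.

```python
# Kaplan-Meier (product-limit) estimator written out by hand.
# Hypothetical data: event=True means death, False means censored.

def kaplan_meier(times, events):
    """Return a list of (time, survival) steps at each distinct death time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all observations tied at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk   # conditional probability of surviving past t
            steps.append((t, survival))
        n_at_risk -= deaths + censored           # censored subjects leave the risk set after t
    return steps


def median_survival(steps):
    """First time at which estimated survival drops to 0.5 or below (None if never)."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None


times = [2, 3, 3, 5, 6, 8, 9, 12, 14, 15]   # months (hypothetical)
events = [True, True, False, True, False, True, True, False, True, False]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t:>2}: S(t) = {s:.3f}")
print("median survival (months):", median_survival(curve))
```

Censored subjects never produce a step. They only shrink the risk set, which is why the drops after months 6 and 12 grow larger.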

Special Populations — Elderly and Comorbidity-Heavy Cohorts

Why epidemiologic measures behave differently in elderly populations:

— Higher baseline incidence of nearly all chronic diseases (cancer, HF, dementia, CKD)

— Competing risks dominate: an 85-year-old "at risk" for colon cancer death is more likely to die first of cardiovascular disease, distorting cumulative incidence calculations

— Prevalence balloons because chronic disease duration is long and accumulates with age

Competing risk analysis:

— 1 – Kaplan-Meier (treating competing deaths as censored) overestimates cumulative incidence when competing causes of death are common

— Cumulative incidence function (CIF) or Fine-Gray subdistribution hazard model handles competing risks properly

— Step 3 won't ask the math, but will ask which patients need adjusted analyses — answer: elderly, multimorbid, oncology cohorts

Age-specific vs age-adjusted rates in geriatrics:

— Always report age-specific rates when describing geriatric disease patterns

— Age-adjusted rates can hide the steep age gradient

Healthy survivor / healthy worker bias:

— Elderly research participants are healthier than age peers (selection bias)

— Inflates apparent benefits of interventions in observational studies; less of an issue in RCTs

Renal/hepatic impairment cohorts:

— Often excluded from RCTs → external validity (generalizability) is limited

— When extrapolating RCT incidence/mortality data to CKD or cirrhotic patients, expect higher event rates and more adverse drug effects than the trial showed

Step 3 management: When a vignette asks about applying USPSTF screening recommendations to an 80-year-old with multiple comorbidities, the right move is often to stop screening — because remaining life expectancy is shorter than the lead time needed for screening benefit to materialize (typically 10 years for colon and breast cancer screening).

Board pearl: A screening test's mortality benefit depends on the patient living long enough to realize it. Life expectancy <10 years → benefit unlikely, regardless of the test's published sensitivity/specificity.
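The competing-risk problem can be made concrete on a tiny dataset. This minimal Python sketch compares naive 1 – Kaplan-Meier with the cumulative incidence function, computed here with the Aalen-Johansen approach. The data are hypothetical. Cause 1 is the event of interest (e.g., cancer death), cause 2 is a competing death (e.g., cardiovascular), and 0 marks censoring. This is only an illustration of the idea, not a substitute for a survival-analysis package.

```python
# Competing risks: naive 1 - Kaplan-Meier vs. the cumulative incidence function (CIF).
# Hypothetical data: cause 1 = event of interest, 2 = competing event, 0 = censored.

def naive_and_cif(records):
    """records: list of (time, cause). Returns (naive 1 - KM, CIF) for cause 1."""
    records = sorted(records)
    n_at_risk = len(records)
    overall_survival = 1.0   # KM for "no event of any kind yet"
    km_cause1 = 1.0          # KM that wrongly treats competing events as censoring
    cif = 0.0
    i = 0
    while i < len(records):
        t = records[i][0]
        d1 = d2 = censored = 0
        while i < len(records) and records[i][0] == t:
            cause = records[i][1]
            d1 += int(cause == 1)
            d2 += int(cause == 2)
            censored += int(cause == 0)
            i += 1
        # Aalen-Johansen step: event-free just before t, then a cause-1 event at t
        cif += overall_survival * d1 / n_at_risk
        overall_survival *= 1 - (d1 + d2) / n_at_risk
        km_cause1 *= 1 - d1 / n_at_risk
        n_at_risk -= d1 + d2 + censored
    return 1 - km_cause1, cif


records = [(1, 2), (2, 1), (3, 2), (4, 1), (5, 0), (6, 1)]
naive, cif = naive_and_cif(records)
print(f"naive 1 - KM: {naive:.3f}   CIF: {cif:.3f}")
```

In this toy dataset, the naive estimate reaches 100% because the last subject at risk has the event of interest. The CIF stays at about 67%. Two of the six subjects died of the competing cause and could never experience the outcome, and the naive method ignores that.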

Special Populations — Pregnancy, Pediatrics, and Vulnerable Groups

Maternal-fetal epidemiology terminology — high-yield definitions:

— Maternal mortality ratio: maternal deaths during pregnancy or within 42 days postpartum ÷ 100,000 live births (US: ~22 per 100,000 — high for a developed country)

— Pregnancy-related mortality: extends to 1 year postpartum

— Infant mortality rate (IMR): deaths <1 year of age ÷ 1,000 live births (US: ~5.4)

— Neonatal mortality: deaths <28 days ÷ 1,000 live births

— Perinatal mortality: stillbirths (≥20 or ≥28 weeks, depending on the definition used) + early neonatal deaths (<7 days) ÷ 1,000 total births (live births + stillbirths)

— Stillbirth (fetal death) rate: fetal deaths ≥20 weeks ÷ 1,000 total births

Why denominators matter here:

— Maternal mortality uses live births (not pregnancies) — a convention, not a perfect denominator

— Perinatal mortality uniquely includes stillbirths in numerator AND denominator — different from infant mortality

Pediatric-specific measures:

— Under-5 mortality rate: deaths before age 5 ÷ 1,000 live births — key global health metric

— Childhood incidence of vaccine-preventable disease is a sentinel surveillance metric

Vulnerable populations and equity measures:

— Health disparities are described as differences in incidence, prevalence, or mortality between groups

— Black maternal mortality in the US is ~3× white maternal mortality — a Step 3 ethics/health systems favorite

Key distinction: Infant mortality (IMR) denominator is live births; under-5 mortality denominator is also live births; neonatal is a subset of infant. Perinatal mortality is the oddball whose denominator includes stillbirths.

Step 3 management: When a question stem cites racial/ethnic disparities in maternal mortality or infant mortality, the correct intervention typically involves structural solutions — expanded prenatal access, implicit bias training, postpartum Medicaid extension — rather than individual-level counseling alone.

Board pearl: Low birth weight (<2500 g) is the single strongest predictor of infant mortality and is itself a key epidemiologic surveillance metric.
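The denominator differences are easiest to see with one set of counts. This minimal Python sketch uses invented counts for a single hypothetical birth cohort year. It computes each measure with its own denominator: live births for IMR, neonatal mortality, and the maternal mortality ratio, and total births (live births plus stillbirths) for perinatal mortality.

```python
# One hypothetical year of births, several mortality measures, different denominators.
# All counts are invented for illustration.

live_births = 100_000
stillbirths = 600                 # fetal deaths beyond the stillbirth gestational-age cutoff
early_neonatal_deaths = 250       # deaths < 7 days
late_neonatal_deaths = 100        # deaths 7-27 days
postneonatal_deaths = 200         # deaths 28-364 days
maternal_deaths = 20              # deaths during pregnancy or within 42 days postpartum

neonatal_deaths = early_neonatal_deaths + late_neonatal_deaths
infant_deaths = neonatal_deaths + postneonatal_deaths
total_births = live_births + stillbirths        # perinatal denominator includes stillbirths

imr = infant_deaths / live_births * 1_000
neonatal_rate = neonatal_deaths / live_births * 1_000
perinatal_rate = (stillbirths + early_neonatal_deaths) / total_births * 1_000
mmr = maternal_deaths / live_births * 100_000   # a ratio, per 100,000 live births

print(f"IMR {imr:.1f}, neonatal {neonatal_rate:.1f} per 1,000 live births")
print(f"Perinatal {perinatal_rate:.2f} per 1,000 total births")
print(f"Maternal mortality ratio {mmr:.0f} per 100,000 live births")
```

Neonatal deaths are a subset of infant deaths, so the neonatal figure can never exceed the IMR. The perinatal figure is the only one whose numerator and denominator both include stillbirths.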

Complications and Adverse Outcomes — Biases that Corrupt These Measures

Selection bias — distorts both incidence and prevalence:

— Berkson bias: hospital-based case-control studies can show spurious or distorted exposure-disease associations because hospitalization itself is influenced by both

— Healthy worker effect: occupational cohorts show falsely low mortality (SMR <1) because sick people leave the workforce

— Non-response bias: survey-based prevalence skewed if responders differ from non-responders

Information/measurement bias:

— Recall bias (case-control studies): cases remember exposures better than controls — inflates OR

— Surveillance/detection bias: more testing → higher apparent incidence (classic example: thyroid cancer "epidemic" driven by neck ultrasound)

— Misclassification: non-differential usually biases toward the null; differential can bias in either direction

Lead-time bias: screening detects disease earlier; survival appears longer without true mortality benefit

Length-time bias: screening preferentially catches slow-growing (more indolent) cases; prognosis of screen-detected cases looks better than symptom-detected

Overdiagnosis: detection of disease that would never have caused symptoms (DCIS, indolent prostate cancer) — inflates incidence and prevalence, doesn't reduce mortality

Confounding: a third variable associated with both exposure and outcome — handled by randomization (RCTs), restriction, matching, stratification, or multivariable regression

Effect modification: stratified estimates of association differ; not a bias but a real biologic phenomenon to be reported, not adjusted away

Immortal time bias: in cohort studies, follow-up time during which the outcome cannot occur is incorrectly attributed to a treatment group — falsely lowers incidence in treated

Key distinction: Measured confounding is fixable with analytic methods (stratification, regression). Selection bias generally is not; it must be prevented at the design stage.

Board pearl: A study claiming a screening test improves "5-year survival" without also showing decreased disease-specific mortality is almost certainly contaminated by lead-time and length-time bias.
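Lead-time bias can be shown with a deterministic example. This minimal Python sketch uses four hypothetical patients and an assumed 2-year lead time. Screening moves each diagnosis earlier, but every death occurs at exactly the same age.

```python
# Lead-time bias: screening moves the date of diagnosis earlier, not the date of death.
# Hypothetical patients; ages in years.

patients = [
    # (age at symptomatic diagnosis, age at death)
    (60, 63),
    (62, 66),
    (55, 59),
    (70, 78),
]
LEAD_TIME = 2  # hypothetical years by which screening advances diagnosis


def five_year_survival(diagnosis_and_death):
    survivors = sum(1 for dx, death in diagnosis_and_death if death - dx >= 5)
    return survivors / len(diagnosis_and_death)


symptomatic = five_year_survival(patients)
screened = five_year_survival([(dx - LEAD_TIME, death) for dx, death in patients])
mean_age_at_death = sum(death for _, death in patients) / len(patients)

print(f"5-year survival without screening: {symptomatic:.0%}")
print(f"5-year survival with screening:    {screened:.0%}")
print(f"Mean age at death in both scenarios: {mean_age_at_death:.1f}")
```

Five-year survival jumps from 25% to 100%, while the mean age at death stays at 66.5 years. This is the pattern the board pearl warns about.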

When to Escalate — Public Health Reporting and Outbreak Thresholds

Surveillance terminology — when does an increase in incidence trigger action?

— Endemic: baseline level of disease consistently present in a population

— Epidemic/outbreak: incidence exceeds expected baseline in a defined population/time

— Pandemic: epidemic spanning multiple countries/continents

— Cluster: aggregation of cases in time/space; may or may not exceed baseline

Outbreak investigation steps (CDC framework — Step 3 favorite):

— Confirm the diagnosis and verify the outbreak

— Define a case and case-find

— Describe by person, place, time (epi curve)

— Generate hypotheses → test with case-control or cohort design

— Implement control measures, communicate, follow up

Mandatory reportable diseases (US):

— Notifiable to local/state health departments → CDC (NNDSS)

— Examples: TB, syphilis, gonorrhea, HIV, measles, pertussis, hepatitis A/B/C, Lyme, COVID-19, Zika, foodborne pathogens

— Failure to report can carry licensure consequences

R₀ (basic reproduction number):

— Average secondary cases generated per infected person in a fully susceptible population

— R₀ > 1 → outbreak grows; R₀ < 1 → outbreak dies out

— Herd immunity threshold ≈ 1 – (1/R₀)

— Measles R₀ ≈ 12–18 → threshold ≈ 92–94%, so ~95% vaccination coverage is the target for herd immunity

CCS pearl: If your CCS or vignette patient has a reportable disease, the first non-treatment action is to notify the local health department. This is also a patient-safety and legal duty, not just a public health nicety — and HIPAA explicitly permits this disclosure.

Step 3 management: When an epi curve shows a point-source outbreak (single sharp peak), look for a common exposure (catered event, water source). A propagated outbreak (multiple peaks ~ one incubation period apart) suggests person-to-person spread requiring isolation/contact tracing.

Board pearl: Notify public health while treating — these are parallel actions, not sequential.
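The R₀ rules above reduce to simple arithmetic. This minimal Python sketch assumes a homogeneously mixing population and a vaccine that fully prevents infection. It also uses a simplified geometric model of case counts per generation. Real coverage targets are set higher to allow for imperfect vaccine effectiveness.

```python
# Herd immunity threshold from R0, using the 1 - 1/R0 approximation in the text.
# Assumes homogeneous mixing and a vaccine that fully prevents infection.

def herd_immunity_threshold(r0: float) -> float:
    if r0 <= 1:
        return 0.0  # infection cannot sustain spread, so no immunity threshold is needed
    return 1 - 1 / r0


def cases_in_generation(r: float, generation: int, index_cases: int = 1) -> float:
    """Expected new cases in a given generation if each case infects r others (simplified)."""
    return index_cases * r ** generation


for r0 in (1.5, 3, 12, 18):
    print(f"R0 = {r0:>4}: threshold ≈ {herd_immunity_threshold(r0):.1%}")

print("R = 0.8, generation 5:", round(cases_in_generation(0.8, 5), 3))   # shrinking outbreak
print("R = 2.0, generation 5:", cases_in_generation(2.0, 5))             # growing outbreak
```

Measles' R₀ range of 12–18 gives thresholds of about 91.7% to 94.4%. That range is consistent with the ~95% coverage target.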

Key Differentials — Confusing Incidence-Family Measures

Within the "rate of new events" family, distinguish:

Cumulative incidence vs incidence rate:

— Cumulative incidence = proportion (unitless, fixed period)

— Incidence rate = events per person-time (has time in denominator)

— Use rate when follow-up varies; use cumulative incidence when everyone is followed the same period

Incidence rate vs attack rate:

— Both measure new cases, but attack rate is typically used in outbreaks over a defined short period and reported as a proportion

— Attack rate is essentially cumulative incidence for outbreak settings

Crude incidence vs age-specific incidence vs age-adjusted incidence:

— Crude: ignores age structure (misleading for comparisons)

— Age-specific: rate within an age stratum (e.g., 65–74 yrs)

— Age-adjusted: standardized to a reference population for fair comparison

Mortality rate vs case fatality vs proportionate mortality:

— Mortality rate denominator = whole population

— Case fatality denominator = diseased people

— Proportionate mortality denominator = all deaths

— Same numerator (deaths from disease X), three different stories

Cumulative incidence vs lifetime risk:

— Lifetime risk = cumulative incidence over a lifespan (e.g., about 1 in 8 US women will develop invasive breast cancer during their lifetime)

Standardized Incidence Ratio (SIR) vs Standardized Mortality Ratio (SMR):

— Both are observed/expected ratios from indirect standardization

— SIR uses incident cases; SMR uses deaths

Hazard rate vs incidence rate:

— Conceptually similar; hazard is the instantaneous rate at time t, while incidence rate is averaged over an interval

Key distinction: When the question says "rate," look for time in the denominator. When it says "risk" or "proportion," it's a proportion bounded 0–1.

Board pearl: A "rate" with no time unit is almost always actually a proportion — many published "rates" (response rate, prevalence rate) are misnamed. Don't let the word "rate" fool you on the exam — check the denominator structure.
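The link between a rate and a risk can be made explicit with a worked conversion. This minimal Python sketch assumes the rate stays constant over the period and that there are no competing risks. Under those assumptions, risk = 1 – exp(–rate × time), and for small rate × time, risk ≈ rate × time. The input is the 5.88 per 1,000 person-years example from the incidence section.

```python
import math

# Converting an incidence rate to a cumulative incidence (risk).
# ASSUMPTIONS: constant rate over the period, no competing risks.

def risk_from_rate(rate_per_person_year: float, years: float) -> float:
    return 1 - math.exp(-rate_per_person_year * years)


rate = 5.88 / 1_000   # 5.88 per 1,000 person-years
for years in (1, 10, 40):
    exact = risk_from_rate(rate, years)
    shortcut = rate * years
    print(f"{years:>2} years: risk {exact:.2%} (rate x time shortcut {shortcut:.2%})")
```

Over 1 to 10 years the shortcut and the exact risk nearly agree. Over 40 years they diverge (about 21% vs 23.5%). This is one more reason a "rate" and a "risk" are not interchangeable.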

Key Differentials — Confusing Prevalence and Association Measures

Prevalence-family confusions:

— Point vs period prevalence: point = snapshot; period = anyone affected during the interval

— Prevalence vs incidence proportion: prevalence includes old cases; incidence counts only new cases during the period

— Prevalence vs cumulative incidence: a chronic disease's prevalence can exceed its cumulative incidence over the same period if old cases dominate the prevalent pool

Association-measure confusions:

— RR vs OR: RR for cohort/RCT; OR for case-control. When disease is common, OR exaggerates RR (overestimate of effect size)

— OR vs prevalence OR: cross-sectional studies yield prevalence ORs — these reflect existing disease, conflating incidence and duration

— RR vs HR: HR comes from time-to-event analysis; RR from fixed-period comparison

— AR vs RR: AR is absolute (subtraction), RR is relative (division)

A common trap — interpreting an OR of 4.0:

— Students say "4× the risk" — incorrect if disease is common

— Correct statement: "4× the odds"; when the outcome is common, the risk ratio is closer to 1 than the OR

Survival vs mortality confusions:

— 1 – (K-M 5-year survival) equals the crude proportion who died within 5 years only when follow-up is complete (no censoring)

— Better to report median survival and mortality rate separately

Predictive value confusions:

— PPV/NPV vary with prevalence

— Sensitivity/specificity do not

— Likelihood ratios are prevalence-independent ways to update pretest probability: LR+ = sens/(1–spec); LR– = (1–sens)/spec

Key distinction: When comparing two populations or two time periods, ask: are the denominators comparable? Different age structures → use age-adjusted rates. Different follow-up durations → use rates not proportions. Different disease definitions → results not comparable at all.

Board pearl: Pre-test probability ≈ prevalence in your patient's reference population. Likelihood ratios convert it to post-test probability via the Fagan nomogram — favorite tool for diagnostic reasoning questions.
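The Fagan nomogram performs Bayes' theorem in odds form. This minimal Python sketch does the same calculation numerically for a hypothetical test with sensitivity 0.90 and specificity 0.80. The pretest probabilities are illustrative settings, not data.

```python
def post_test_probability(pretest_prob: float, likelihood_ratio: float) -> float:
    """Bayes in odds form (what the Fagan nomogram does graphically)."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical test: sensitivity 0.90, specificity 0.80 (LR+ = 4.5, LR- = 0.125)
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
for pretest in (0.02, 0.10, 0.50):   # e.g., screening population vs. referral clinic
    print(f"pretest {pretest:.0%}: after + test {post_test_probability(pretest, lr_pos):.1%}, "
          f"after - test {post_test_probability(pretest, lr_neg):.1%}")
```

At a pretest probability of 10%, a positive result gives a post-test probability of exactly 1/3. The PPV formula gives the same value for these sensitivity and specificity figures at 10% prevalence.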

Secondary Prevention — Using Epidemiology to Drive Population Interventions

Translating epidemiologic measures into action:

— High incidence → primary prevention (vaccines, behavioral interventions reducing exposure)

— High prevalence → secondary prevention (screening to detect early, disease management to reduce complications)

— High case fatality → tertiary prevention (improving treatment, hospice/palliative care, rehab)

USPSTF grading framework (Step 3 essential):

— Grade A: high certainty that the net benefit is substantial; recommend (e.g., colorectal cancer screening ages 50–75)

— Grade B: high certainty that the net benefit is moderate, or moderate certainty that it is moderate to substantial; recommend (e.g., colorectal cancer screening ages 45–49; AAA screening in men 65–75 who ever smoked)

— Grade C: at least moderate certainty that the net benefit is small; offer selectively based on individual circumstances

— Grade D: moderate or high certainty of no net benefit or that harms outweigh benefits; do NOT offer (e.g., PSA-based prostate cancer screening in men ≥70)

— Grade I: current evidence is insufficient to assess the balance of benefits and harms

Key population attributable framework:

— Choose interventions for high PAR%, not just high RR

— Example: BP control yields only a modest RR ≈ 0.7 for stroke, but hypertension affects nearly half of US adults → enormous PAR

Number needed to screen (NNS):

— Like NNT but for screening — how many people must be screened to prevent one death

— Mammography NNS for breast cancer mortality ≈ 1,000–2,000 over 10 years (age-dependent)

Health system metrics linking epidemiology to care delivery:

— Quality measures (HEDIS): proportions of patients with diabetes with HbA1c <8%, BP control rates, screening completion rates — all prevalence-based denominators

— Value-based care payments often tied to risk-adjusted outcome rates (mortality, readmission) to avoid penalizing safety-net hospitals serving sicker populations

Step 3 management: When asked to choose between two interventions in a population-health question, pick the one with the larger absolute risk reduction in a high-prevalence condition over the one with a larger RRR in a rare condition.

Board pearl: Risk adjustment exists because crude outcome rates penalize hospitals caring for sicker patients — a recurring health-systems Step 3 theme.
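The "larger ARR in a common condition beats larger RRR in a rare one" rule can be checked with a short calculation. This minimal Python sketch uses the identity ARR = baseline risk × RRR. The two interventions, their baseline risks, and their RRRs are hypothetical.

```python
# Absolute benefit of two hypothetical interventions in a population of 100,000.
# Baseline risks and RRRs are invented for illustration.

def absolute_benefit(baseline_risk: float, rrr: float, population: int = 100_000):
    """Return (ARR, NNT, events prevented); ARR = baseline risk x RRR."""
    arr = baseline_risk * rrr
    return arr, 1 / arr, arr * population


options = {
    "A: rare condition, large RRR": (0.001, 0.50),
    "B: common condition, modest RRR": (0.10, 0.20),
}
for name, (baseline_risk, rrr) in options.items():
    arr, nnt, prevented = absolute_benefit(baseline_risk, rrr)
    print(f"{name}: ARR {arr:.2%}, NNT {nnt:,.0f}, events prevented per 100,000: {prevented:,.0f}")
```

Option A has the more impressive RRR (50% vs 20%), but it prevents about 50 events per 100,000. Option B prevents about 2,000 events, with an NNT of 50 versus 2,000.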

Follow-Up and Monitoring — Surveillance Systems and Data Sources

Major US epidemiologic data sources to recognize:

— Vital statistics (NCHS): birth and death certificates → mortality, IMR, life expectancy

— NHANES: National Health and Nutrition Examination Survey — prevalence of conditions (HTN, DM, obesity) with exam/lab data

— BRFSS: Behavioral Risk Factor Surveillance System — telephone survey of risk behaviors and self-reported prevalence

— SEER: Surveillance, Epidemiology, and End Results — cancer incidence, prevalence, survival

— NNDSS: National Notifiable Disease Surveillance System — reportable infectious diseases

— MMWR: Morbidity and Mortality Weekly Report — CDC's surveillance bulletin

Strengths and limitations:

— Death certificates: complete but with cause-of-death misclassification

— Surveys: self-report bias, non-response bias

— Registries: high-quality but disease-specific

Monitoring epidemiologic trends — interpretation skills:

— Rising incidence + stable mortality → improved detection (e.g., thyroid microcarcinomas)

— Stable incidence + falling mortality → better treatment (e.g., HIV after ART)

— Rising incidence + rising mortality → true increase in disease burden

— Falling incidence + falling mortality → successful primary prevention (e.g., gastric cancer in US)

Life expectancy:

— Period life expectancy: current cross-sectional age-specific mortality applied to a hypothetical cohort

— Sensitive to deaths at young ages (1 infant death "costs" ~80 life-years)

— US life expectancy fell during the opioid epidemic and COVID-19 — both surveillance findings tied to specific causes

Counseling patients with epidemiologic data:

— Communicate absolute risks, not just relative risks

— Use natural frequencies ("3 out of 1,000") rather than probabilities (0.3%) — better comprehension

Step 3 management: When the question asks how a clinician should explain risk to a patient considering screening, choose answers framed in absolute numbers and natural frequencies with both benefits and harms (overdiagnosis, false positives).

Board pearl: A 2-percentage-point ARR sounds smaller than a 50% RRR — they can describe the same effect. Always present both for informed consent.
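One treatment effect can be expressed in all the formats the counseling advice mentions. This minimal Python sketch assumes hypothetical risks of 4% untreated and 2% treated over the same period. The helper name `describe_effect` is illustrative only.

```python
# One hypothetical treatment effect expressed as ARR, RRR, natural frequencies, and NNT.

def describe_effect(risk_untreated: float, risk_treated: float, per: int = 1_000) -> dict:
    arr = risk_untreated - risk_treated
    rrr = arr / risk_untreated
    return {
        "ARR (percentage points)": round(arr * 100, 2),
        "RRR (%)": round(rrr * 100, 1),
        "untreated": f"{round(risk_untreated * per)} out of {per:,}",
        "treated": f"{round(risk_treated * per)} out of {per:,}",
        "NNT": round(1 / arr),
    }


for key, value in describe_effect(0.04, 0.02).items():
    print(f"{key}: {value}")
```

The same effect reads as "2 percentage points," "50%," or "40 vs 20 out of 1,000." The natural-frequency version is the one the counseling bullets recommend putting in front of patients.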

Ethical, Legal, and Patient Safety Considerations

Mandatory reporting and confidentiality balance:

— Reportable diseases override individual confidentiality under public health law — patients cannot opt out

— HIPAA explicitly permits disclosure to public health authorities (45 CFR 164.512)

— Tarasoff-type duties (threats to identified victims) extend to specific infectious disease contexts in some states (e.g., partner notification for HIV in jurisdictions with named-partner requirements)

Informed consent edge case — research participation:

— Epidemiologic research using identifiable data requires IRB review and usually informed consent

— Properly de-identified surveillance data are not protected health information under HIPAA and may be analyzed without individual consent

— Genetic epidemiology raises unique consent issues: incidental findings, family implications, and GINA protections against discrimination in employment and health insurance

Equity and disparities:

— Reporting incidence/mortality by race, ethnicity, sex, and SES is a Step 3 patient safety priority

— Failure to disaggregate masks disparities (e.g., aggregate "Asian American" data obscures Filipino-American HTN burden)

Transition-of-care risk from epidemiologic ignorance:

— Hospital discharge to communities with poor primary care access → measurable readmission risk

— Step 3 vignettes test whether you arrange follow-up appropriate to local resources, not just the textbook ideal

Quarantine and isolation:

— Public health authorities can compel isolation for certain communicable diseases (e.g., active untreated TB) under state law

— Must use least restrictive means; due process applies

— Patient safety: a physician who knowingly allows an infectious patient to be discharged to a congregate setting without notification creates both clinical and legal liability

Step 3 management: A patient with active pulmonary TB who refuses treatment and threatens to leave AMA → contact public health immediately; court-ordered isolation is legally permissible. This is the canonical USMLE "autonomy yields to public health" scenario.

Board pearl: Mandatory reporting laws protect physicians from breach-of-confidentiality liability when reporting in good faith — reporting is the safe, correct action.

High-Yield Associations and Rapid-Fire Clinical Facts

Memorize these equations cold:

— Prevalence ≈ Incidence × Duration (rare disease, steady state)

— RR = [A/(A+B)] / [C/(C+D)]

— OR = (A×D) / (B×C)

— NNT = 1 / ARR

— AR = I_exposed – I_unexposed

— AR% = (RR–1)/RR

— PAR% = Pe(RR–1) / [1 + Pe(RR–1)], where Pe = proportion of the population exposed; PAR% drives population intervention priority

— Herd immunity threshold = 1 – (1/R₀)

— Sensitivity = TP/(TP+FN); Specificity = TN/(TN+FP)

— PPV = TP/(TP+FP); NPV = TN/(TN+FN); both depend on prevalence

— LR+ = sens/(1–spec); LR– = (1–sens)/spec

Classic data points to recognize:

— US life expectancy ~ 77 years (post-COVID)

— US infant mortality ~ 5.4 / 1,000 live births

— US maternal mortality ~ 22 / 100,000 live births (high for OECD)

— Leading cause of death US: heart disease, then cancer, then unintentional injuries

— Leading cause of cancer death: lung (both sexes)

— Leading cause of death age 1–44: unintentional injuries

Screening high-yield (USPSTF):

— Colorectal: 45–75, multiple modalities

— Mammography: 40–74, biennial (2024 update)

— Cervical: Pap 21–29 q3y; age 30–65: Pap alone q3y, primary hrHPV q5y, or Pap+HPV co-test q5y

— Lung (LDCT): 50–80, ≥20 pack-years, current/quit within 15 years

— AAA: men 65–75 who ever smoked, one-time ultrasound

Study design quick-pick:

— Rare disease → case-control

— Rare exposure → cohort

— Causation proof → RCT

— Genetic/familial → twin or family studies

— Generating hypothesis → cross-sectional or ecologic

Board pearl: Ecologic fallacy — using group-level data to infer individual-level associations is a common Step 3 distractor (e.g., countries with higher fat intake have higher CHD ≠ individuals with higher fat intake have higher CHD).

Key distinction: RCTs control confounding (known and unknown) through randomization. Randomization cannot fix attrition bias from differential dropout, however, and non-representative enrollment limits generalizability (external validity).
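The exposure-outcome formulas in the equations list can be checked with a short Python helper. This is a sketch that assumes the usual layout: a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease. It also assumes a cohort-style table. The class `TwoByTwo`, its method names, and the counts are all hypothetical. PAR% uses Levin's formula, which needs the population exposure prevalence, so it is only meaningful here when the table's exposure mix mirrors the source population.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Exposure-outcome counts (hypothetical helper):

                   Disease   No disease
        Exposed       a          b
        Unexposed     c          d
    """
    a: int
    b: int
    c: int
    d: int

    def risk_exposed(self) -> float:
        return self.a / (self.a + self.b)

    def risk_unexposed(self) -> float:
        return self.c / (self.c + self.d)

    def relative_risk(self) -> float:
        return self.risk_exposed() / self.risk_unexposed()

    def odds_ratio(self) -> float:
        return (self.a * self.d) / (self.b * self.c)

    def attributable_risk(self) -> float:
        # Risk difference: I_exposed - I_unexposed
        return self.risk_exposed() - self.risk_unexposed()

    def attributable_risk_percent(self) -> float:
        # Share of cases among the exposed attributable to exposure: (RR - 1) / RR
        rr = self.relative_risk()
        return (rr - 1) / rr

    def par_percent(self) -> float:
        # Levin's formula; Pe is taken from the table, which assumes the table's
        # exposure mix mirrors the population (reasonable for a cohort, not a case-control study)
        pe = (self.a + self.b) / (self.a + self.b + self.c + self.d)
        rr = self.relative_risk()
        return pe * (rr - 1) / (1 + pe * (rr - 1))


def herd_immunity_threshold(r0: float) -> float:
    """Fraction of the population that must be immune: 1 - 1/R0."""
    return 1 - 1 / r0


if __name__ == "__main__":
    t = TwoByTwo(a=30, b=70, c=10, d=90)  # invented counts
    print(f"RR   = {t.relative_risk():.2f}")              # RR   = 3.00
    print(f"OR   = {t.odds_ratio():.2f}")                 # OR   = 3.86
    print(f"AR   = {t.attributable_risk():.2f}")          # AR   = 0.20
    print(f"AR%  = {t.attributable_risk_percent():.0%}")  # AR%  = 67%
    print(f"PAR% = {t.par_percent():.0%}")                # PAR% = 50%
    print(f"Herd immunity threshold, R0 = 4: {herd_immunity_threshold(4):.0%}")  # 75%
```

In this example the OR (3.86) overstates the RR (3.00) because the outcome is common among the exposed (30%). The two measures converge only when the disease is rare.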

Board Question Stem Patterns

Pattern A — "What is the incidence?"

— Stem describes cohort followed over time with new cases

— Action: cumulative incidence = new cases / population at risk at start

— Watch for person-time wording → switch to incidence rate

Pattern B — "What is the prevalence?"

— Cross-sectional snapshot wording: "currently have," "at the time of the survey"

— Action: existing cases / total population at that moment

Pattern C — "Which measure best describes…"

— Burden of chronic disease in a community → prevalence

— Risk of developing disease → incidence

— How deadly once you have it → case fatality

— Strength of association between exposure and disease → RR or OR

— Public health priority → PAR or AR

Pattern D — "What explains the discrepancy?"

— Rising survival without falling mortality → lead-time/length-time bias or overdiagnosis

— Rising prevalence with stable incidence → improved survival (longer duration)

— Falling mortality with stable incidence → improved treatment

— Higher crude mortality in State A vs B → age structure difference, use age-adjusted rates

Pattern E — Outbreak vignette:

— Calculate attack rates by exposure

— Compare attack rates in exposed vs unexposed → the exposure with the largest difference is the likely source

— Notify health department (always)

Pattern F — Screening test in clinic:

— Calculate sens/spec from 2×2

— PPV/NPV require prevalence — Bayesian thinking

— "Why does the same test perform differently?" → prevalence differs

Pattern G — Comparing two studies:

— Different OR vs RR magnitudes → check disease frequency

— Different incidence rates → check person-time methodology

Pattern H — Ethics/public health:

— Reportable disease → notify health department

— Active TB refuses treatment → court-ordered isolation permitted

— Patient privacy vs public health → public health usually wins for notifiable diseases

Step 3 management: First, identify the denominator the question implies. Second, identify the time frame. Third, choose the measure. Most wrong answers fail at step one — mismatched denominators.
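To make the denominator step concrete, here is a minimal Python sketch built on invented numbers. The function names are hypothetical. It computes prevalence, cumulative incidence, and incidence rate for the same hypothetical town. It also shows the common distractor of leaving prevalent cases in the incidence denominator.

```python
def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Risk: new cases during follow-up / disease-free people at the start."""
    return new_cases / at_risk_at_start


def incidence_rate(new_cases: int, person_years: float) -> float:
    """Rate: new cases / total disease-free person-time actually observed."""
    return new_cases / person_years


def point_prevalence(existing_cases: int, total_population: int) -> float:
    """Snapshot: everyone who currently has the disease / everyone in the population."""
    return existing_cases / total_population


if __name__ == "__main__":
    # Hypothetical town: 10,000 residents, 200 already have the disease.
    population, existing = 10_000, 200
    at_risk = population - existing  # prevalent cases cannot become new cases
    new_cases = 49                   # diagnosed during the following year

    print(f"Point prevalence: {point_prevalence(existing, population):.1%}")  # 2.0%
    print(f"1-year cumulative incidence: "
          f"{cumulative_incidence(new_cases, at_risk) * 1000:.1f} per 1,000")  # 5.0 per 1,000
    # Common distractor: prevalent cases left in the denominator
    print(f"Wrong denominator: {new_cases / population * 1000:.1f} per 1,000")  # 4.9 per 1,000

    # Hypothetical cohort with unequal follow-up (stops at diagnosis or loss to
    # follow-up), so person-time, not head count, is the right denominator.
    follow_up_years = [5.0, 5.0, 2.0, 3.5, 5.0, 1.0, 4.0, 5.0]
    cases_in_cohort = 3
    rate = incidence_rate(cases_in_cohort, sum(follow_up_years))
    print(f"Incidence rate: {rate * 1000:.1f} per 1,000 person-years")  # 98.4
```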

Board pearl: When stuck, sketch a 2×2 table immediately. It resolves most biostats stems mechanically.
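The 2×2 habit also explains Pattern F. This Python sketch uses hypothetical helper names and invented test characteristics. It fills a 2×2 for 10,000 people from sensitivity, specificity, and prevalence, then reads off PPV, NPV, and the likelihood ratios.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningTable:
    """Test-versus-disease 2x2 counts (hypothetical helper)."""
    tp: float
    fp: float
    fn: float
    tn: float

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def lr_positive(self) -> float:
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_negative(self) -> float:
        return (1 - self.sensitivity) / self.specificity


def build_table(sensitivity: float, specificity: float, prevalence: float,
                n: int = 10_000) -> ScreeningTable:
    """Fill a 2x2 for n people from test characteristics and disease prevalence."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = sensitivity * diseased
    tn = specificity * healthy
    return ScreeningTable(tp=tp, fp=healthy - tn, fn=diseased - tp, tn=tn)


if __name__ == "__main__":
    # Same hypothetical test (sens 90%, spec 95%) in two settings.
    for setting, prev in [("community screening", 0.01), ("specialty clinic", 0.20)]:
        t = build_table(0.90, 0.95, prev)
        print(f"{setting}: sens {t.sensitivity:.0%}, spec {t.specificity:.0%}, "
              f"PPV {t.ppv:.0%}, NPV {t.npv:.1%}, LR+ {t.lr_positive:.1f}")
```

With these invented numbers, PPV is about 15% in community screening and about 82% in the specialty clinic. Sensitivity, specificity, and LR+ (18) stay the same in both settings. Only prevalence changed.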

One-Line Recap

Incidence counts new cases over time, prevalence counts existing cases at a moment, mortality counts deaths in a population — and choosing the right denominator and time frame is the entire game of clinical epidemiology.

Three core measures, three core denominators:

— Incidence: new cases ÷ population at risk over time

— Prevalence: existing cases ÷ total population at a point

— Mortality: deaths ÷ total population over time (a special incidence)

— Case fatality: deaths from the disease ÷ people with the disease (a special mortality)

Prevalence ≈ Incidence × Duration — explains why chronic non-fatal diseases dominate the prevalent pool and why better treatment (longer duration) raises prevalence

Study design dictates the measure:

— RCT/cohort → RR, ARR, NNT, incidence

— Case-control → OR (approximates RR if disease rare)

— Cross-sectional → prevalence, prevalence OR

— Time-to-event → HR, Kaplan-Meier, log-rank

Beware the biases that corrupt these measures:

— Lead-time and length-time bias inflate survival without changing mortality

— Confounding distorts associations (prevent with randomization, restriction, or matching at design; control with stratification or regression at analysis)

— Selection bias must be prevented at design

— Ecologic fallacy: don't infer individuals from groups

Step 3 management: Whenever a vignette gives raw counts, draw the 2×2, identify the denominator the question demands, pick the measure that matches the policy or clinical question being asked, and check whether bias or confounding could explain the result before accepting causation.
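Age is the classic confounder behind Pattern D's crude-mortality comparison. Here is a minimal sketch of direct age standardization, using invented numbers and hypothetical function names. Both states share identical age-specific death rates, so any difference in their crude rates comes purely from age structure.

```python
def crude_rate(deaths: list[int], population: list[int]) -> float:
    """All deaths / everyone, ignoring age structure."""
    return sum(deaths) / sum(population)


def age_adjusted_rate(deaths: list[int], population: list[int],
                      standard: list[int]) -> float:
    """Direct standardization: apply this population's age-specific rates
    to a shared standard population, then divide expected deaths by its size."""
    age_specific = [d / p for d, p in zip(deaths, population)]
    expected_deaths = sum(rate * s for rate, s in zip(age_specific, standard))
    return expected_deaths / sum(standard)


if __name__ == "__main__":
    # Invented data, two age strata: [<65, >=65]. Age-specific death rates are
    # identical in both states (2 and 40 per 1,000); only the age mix differs.
    state_a = {"deaths": [120, 1_600], "population": [60_000, 40_000]}  # older state
    state_b = {"deaths": [180, 400], "population": [90_000, 10_000]}    # younger state
    standard = [150_000, 50_000]  # here, both states pooled

    for name, s in [("A", state_a), ("B", state_b)]:
        crude = crude_rate(s["deaths"], s["population"]) * 1000
        adjusted = age_adjusted_rate(s["deaths"], s["population"], standard) * 1000
        print(f"State {name}: crude {crude:.1f}, age-adjusted {adjusted:.1f} per 1,000")
```

State A's crude rate (17.2 per 1,000) is about three times State B's (5.8 per 1,000). Both age-adjusted rates come out at 11.5 per 1,000, so the crude gap reflects age structure rather than a difference in risk.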

Board pearl: "Have/has" = prevalence. "Developed/incident" = incidence. "Died/fatality" = mortality. Circle the verb on every biostats stem — and the right answer usually selects itself.
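The Prevalence ≈ Incidence × Duration relation from the recap can be checked numerically. The sketch assumes a closed population at steady state in which incidence applies only to disease-free people and cases leave the prevalent pool at a constant rate 1/D. Under that model the exact equilibrium is P/(1 − P) = I × D, which reduces to P ≈ I × D only when prevalence is low. Function names and parameters are hypothetical.

```python
def steady_state_prevalence(incidence_rate: float, mean_duration: float) -> float:
    """Exact equilibrium under the model below: P / (1 - P) = I x D."""
    x = incidence_rate * mean_duration
    return x / (1 + x)


def simulate_prevalence(incidence_rate: float, mean_duration: float,
                        years: float = 500.0, dt: float = 0.01) -> float:
    """Euler simulation of prevalence (as a proportion) in a closed population.

    Assumptions: disease-free people develop disease at incidence_rate per year;
    cases leave the prevalent pool (cure or death, with deaths replaced by
    disease-free people) at rate 1 / mean_duration per year.
    """
    prevalence = 0.0
    for _ in range(int(years / dt)):
        inflow = incidence_rate * (1 - prevalence) * dt
        outflow = prevalence / mean_duration * dt
        prevalence += inflow - outflow
    return prevalence


if __name__ == "__main__":
    # Invented parameters: (label, incidence per person-year, mean duration in years)
    scenarios = [
        ("rapidly fatal", 0.0002, 0.5),
        ("chronic, non-fatal", 0.01, 20.0),
        ("chronic, better treatment", 0.01, 40.0),
    ]
    for label, inc, dur in scenarios:
        print(f"{label}: simulated {simulate_prevalence(inc, dur):.4f}, "
              f"I x D = {inc * dur:.4f}, exact {steady_state_prevalence(inc, dur):.4f}")
```

In the chronic scenarios I × D overshoots the simulated prevalence because prevalence is no longer low. That is why the approximation carries its low-prevalence caveat. Doubling the duration still raises prevalence, as the recap states.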