Biostatistics & Population Health

Case-control study design and interpretation

Clinical Overview and When to Suspect Case-Control Design

Definition: Observational study that starts by identifying people with the outcome (cases) and without the outcome (controls), then looks backward in time to compare exposure frequencies

— Direction: outcome → exposure (opposite of cohort)

— Timing: typically retrospective, though "nested" case-control can be drawn from a prospective cohort

When this design is the right tool:

— Rare outcomes (e.g., pancreatic cancer, hemorrhagic stroke in young women, vaccine-associated myocarditis) where a cohort would need tens of thousands of person-years

— Long latency between exposure and disease (DES and clear-cell vaginal adenocarcinoma — the classic teaching example)

— Outbreak investigation when a quick answer is needed (foodborne illness at a wedding)

— Multiple exposures, single outcome — efficient to interrogate many candidate risk factors at once

When NOT to use it:

— Rare exposures (use a cohort instead — start with the exposed)

— When you need incidence or absolute risk (case-control cannot give these directly)

— When recall or selection of controls will be hopelessly biased

Core output: the odds ratio (OR), which approximates relative risk only when the disease is rare (<10% prevalence — the "rare disease assumption")

Step 3 framing: Boards love to give you a paragraph describing a study, then ask you to name the design and pick the correct measure of association. If the stem says "researchers identified 200 patients with X and 400 without X, then asked about prior exposure to Y" — that is case-control, and the answer measure is odds ratio, never relative risk or incidence

Board pearl: The single fastest case-control tip-off on USMLE: the stem counts cases first, then matches or samples controls, then asks about prior exposure. If you see that sequence, the answer choices "cohort," "cross-sectional," and "RCT" are distractors.

Presentation Patterns and Key History (How Stems Describe These Studies)

Classic stem architecture:

— "Investigators enrolled 150 women newly diagnosed with ovarian cancer and 300 age-matched women without cancer from the same clinic. Participants were interviewed about prior oral contraceptive use."

— "After an outbreak of gastroenteritis at a banquet, public health officials interviewed 40 ill attendees and 80 well attendees about foods consumed."

— "Using a tumor registry, researchers identified patients with mesothelioma and matched them to hospital controls, then reviewed occupational records for asbestos exposure."

Linguistic flags pointing to case-control:

— "Newly diagnosed cases were identified..."

— "Controls were selected from..."

— "Matched on age and sex..."

— "Prior exposure was ascertained by interview / chart review / questionnaire"

— "Odds of exposure" in the result line

Common control sources (and their bias profile):

— Hospital controls — convenient but may share risk factors with cases (Berkson bias)

— Population/community controls — more generalizable but harder recruitment and lower response

— Neighborhood or sibling controls — control for SES/genetics but may overmatch on the exposure

— Registry-based controls — efficient, used in nested designs

History elements the study itself collects (mirrors what the stem will mention):

— Exposure dose, duration, timing relative to diagnosis

— Confounders: age, sex, smoking, comorbidities, medications

— Recall aids: photographs of pills, calendars, employment records — these are deployed specifically to reduce recall bias

Key distinction: A cross-sectional study measures exposure and outcome at the same time in a defined population — you can compute prevalence. A case-control study selects on outcome status first, so you can NEVER compute prevalence or incidence from it. If the stem gives you a "prevalence of disease" from the study sample, it is not a case-control.

Structural Anatomy of a Case-Control Study (the "Physical Exam" of the Design)

The 2×2 table — memorize this orientation:

```
                  Cases (D+)   Controls (D−)
Exposed (E+)          a             b
Unexposed (E−)        c             d
```

— Columns are fixed by the investigator (you chose how many cases and controls to enroll)

— Rows are observed (how many in each group turn out to have been exposed)

Why this matters: Because column totals are fixed, you cannot compute:

— Incidence (a/(a+b)) — meaningless, since (a+b) is an artifact of sampling

— Relative risk directly

— Attributable risk

— Number needed to harm

What you CAN compute:

— Odds of exposure among cases = a/c

— Odds of exposure among controls = b/d

— Odds ratio = (a/c) ÷ (b/d) = ad/bc ("cross-product ratio")

Matching:

— Individual (1:1 or 1:k) matching on age, sex, calendar time — reduces confounding by the matched variable, but requires conditional logistic regression or McNemar-style analysis, not a plain chi-square

— Overmatching (matching on a variable in the causal pathway, or on the exposure itself) destroys the ability to detect a true association — a classic boards trap

Sample size logic:

— Power is driven by the number of cases and the case:control ratio

— Adding controls beyond 4:1 yields diminishing returns — this is why outbreak investigators commonly enroll 2–4 controls per case

Board pearl: If a stem describes matching on age and sex and then analyzes data with a standard unpaired chi-square, the analysis is wrong — matched data require paired analysis (McNemar) or conditional logistic regression. Step 3 has asked this directly.
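The sampling logic of the 2×2 table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the source-population counts are hypothetical, and the k/(k+1) efficiency figure is the usual textbook approximation (efficiency relative to unlimited controls), not a number taken from the text above. It shows that a/(a+b) shifts with the number of controls sampled while ad/bc does not, and that the gain from extra controls flattens after about 4 per case.

```python
from fractions import Fraction

# Hypothetical source population (illustrative only): every case is enrolled.
CASES_EXPOSED, CASES_UNEXPOSED = 60, 40
NONCASES_EXPOSED, NONCASES_UNEXPOSED = 2000, 8000


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc for table rows (a, b) and (c, d)."""
    return Fraction(a * d, b * c)


def sample_controls(fraction):
    """Sample the same fraction of exposed and unexposed non-cases."""
    b = int(NONCASES_EXPOSED * fraction)
    d = int(NONCASES_UNEXPOSED * fraction)
    return b, d


def relative_efficiency(k):
    """Approximate efficiency of k controls per case versus unlimited controls."""
    return Fraction(k, k + 1)


for fraction in (Fraction(1, 100), Fraction(4, 100)):
    b, d = sample_controls(fraction)
    a, c = CASES_EXPOSED, CASES_UNEXPOSED
    pseudo_risk = Fraction(a, a + b)  # depends on how many controls were sampled
    print(
        f"controls={b + d}: a/(a+b)={float(pseudo_risk):.2f}, "
        f"OR={float(odds_ratio(a, b, c, d)):.1f}"
    )

for k in (1, 2, 4, 8):
    print(f"{k} controls per case: efficiency {float(relative_efficiency(k)):.2f}")
```

Going from 1:1 to 4:1 controls changes the apparent "risk" a/(a+b) but leaves the OR untouched, which is why the OR is the only effect measure the design supports directly.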

Diagnostic Workup — Calculating and Interpreting the Odds Ratio

The odds ratio (OR) — the only effect measure case-control gives you directly:

— Formula: OR = ad/bc

— Interpretation: OR > 1 → exposure associated with increased odds of disease; OR < 1 → protective; OR = 1 → no association

Worked example (outbreak):

— 40 cases of gastroenteritis, 80 well controls

— Ate potato salad: 32 cases, 20 controls

— Did not eat: 8 cases, 60 controls

— OR = (32 × 60) / (8 × 20) = 1920 / 160 = 12

— Interpretation: cases had 12-fold higher odds of having eaten the potato salad than controls

Confidence intervals:

— A 95% CI that excludes 1.0 = statistically significant at α = 0.05

— Wider CI → smaller sample or rarer exposure → less precise

— Boards love to give an OR of 2.5 with CI 0.8–7.4 and ask if it is significant — answer: no, the CI crosses 1

OR ≈ RR only when the disease is rare (<~10%). If a stem reports a case-control study of a common outcome and treats the OR as if it were a relative risk, that is a methodologic error

Direction of comparison:

— Always state the referent group explicitly: "odds of cancer among smokers vs nonsmokers, OR 3.2"

— Inverting the table inverts the OR (1/OR) — useful for converting "protective" to "risk" framing

Adjusted vs crude OR:

— Crude OR — raw 2×2 result

— Adjusted OR (aOR) — from logistic regression, controls for confounders

— A meaningful change between crude and adjusted OR (>10%) signals confounding

Step 3 management of a stem: When asked "what is the measure of association," in a case-control study the answer is odds ratio. When asked "what does an OR of 4.0 mean," translate it as "the odds of exposure are 4 times higher in cases than controls," not "exposed people are 4 times more likely to get disease" (that is RR language).
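The outbreak arithmetic and the confidence-interval rule can be checked with a short Python sketch. It assumes the Woolf (log-based) method for the 95% CI, which is one common choice and is not named in the text above; the function names are illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) using the Woolf log-based interval.

    a, b = exposed cases, exposed controls
    c, d = unexposed cases, unexposed controls
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


def excludes_null(lower, upper):
    """A CI is statistically significant only if it does not contain 1."""
    return lower > 1 or upper < 1


# Potato salad example from the worked outbreak above
or_, lower, upper = odds_ratio_ci(32, 20, 8, 60)
print(f"OR = {or_:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
print("significant:", excludes_null(lower, upper))

# Board-style distractor: OR 2.5 with CI 0.8 to 7.4 crosses 1
print("significant:", excludes_null(0.8, 7.4))
```

For the potato salad table this gives an OR of 12 with an interval of roughly 4.8 to 30.3, so the association is significant; the second check confirms that an interval of 0.8 to 7.4 is not.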

Diagnostic Workup — Bias Identification (the Confirmatory Studies)

Recall bias — the signature flaw of case-control studies:

— Cases (especially parents of sick children, cancer patients) search their memory more thoroughly for exposures than healthy controls

— Inflates OR away from the null

— Mitigation: structured questionnaires, blinded interviewers, biomarker-based exposure ascertainment, nested case-control using pre-collected specimens

Selection bias:

— Berkson bias: using hospital controls who are hospitalized for a condition related to the exposure (e.g., COPD controls in a smoking–lung cancer study) — biases OR toward null

— Neyman (prevalence-incidence) bias: missing rapidly fatal cases who died before enrollment — underestimates exposure effect for severe disease

— Healthy worker effect: if controls are recruited from an actively employed population, they tend to be healthier than the source population that produced the cases, so the two groups are not comparable

Interviewer bias — non-blinded interviewers probe cases more thoroughly; mitigated by blinding interviewers to case/control status

Confounding — addressed at design (matching, restriction) or analysis (stratification, multivariable regression). A confounder must be: (1) associated with exposure, (2) an independent risk factor for outcome, (3) not on the causal pathway

Misclassification:

— Nondifferential (equal in cases and controls) → usually biases toward null

— Differential (different by case status, e.g., recall bias) → biases in either direction

Reverse causation — particularly problematic when exposure was measured after disease onset (e.g., diet recall in cancer patients who have already changed eating habits)

Board pearl: When a stem describes parents of children with leukemia recalling pesticide exposure more thoroughly than parents of healthy children → the answer is recall bias, and the proposed fix is blinded structured interviews or use of pre-existing records (a nested case-control). This exact stem has appeared on Step 3.
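The direction-of-bias rules for misclassification can be illustrated with expected counts. The sketch below reuses the potato salad table as the "true" exposure data; the sensitivity and specificity values are hypothetical and chosen only to show the pattern, not drawn from any study.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts under imperfect exposure recall."""
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc."""
    return (a * d) / (b * c)


# True exposure status: 32 of 40 cases and 20 of 80 controls exposed (OR = 12)
true_or = odds_ratio(32, 20, 8, 60)

# Nondifferential: the same imperfect recall in both groups (hypothetical values)
a, c = misclassify(32, 8, sensitivity=0.8, specificity=0.9)
b, d = misclassify(20, 60, sensitivity=0.8, specificity=0.9)
nondifferential_or = odds_ratio(a, b, c, d)

# Differential (recall bias): cases recall exposure better than controls
a, c = misclassify(32, 8, sensitivity=0.95, specificity=1.0)
b, d = misclassify(20, 60, sensitivity=0.70, specificity=1.0)
differential_or = odds_ratio(a, b, c, d)

print(f"true OR            = {true_or:.1f}")
print(f"nondifferential OR = {nondifferential_or:.1f}")
print(f"differential OR    = {differential_or:.1f}")
```

With these assumed error rates, equal misclassification in both groups pulls the OR from 12 down toward 1, while better recall among cases pushes it above 12.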

Risk Stratification — Choosing Case-Control vs Alternative Designs

Decision tree the boards expect you to internalize:

— Rare outcome + need to interrogate exposures → case-control

— Rare exposure + want to track outcomes → cohort (start with exposed and unexposed)

— Need causation, randomization feasible/ethical → RCT

— Snapshot of a population, prevalence question → cross-sectional

— Synthesize existing studies → meta-analysis / systematic review

Nested case-control — a hybrid:

— Built inside an existing prospective cohort

— Cases are cohort members who develop the outcome; controls are sampled from cohort members who have not (yet) developed it

— Huge advantage: exposure data and biospecimens were collected before outcome — eliminates recall bias and reverse causation

— Cost-efficient when expensive biomarker assays cannot be run on the whole cohort

Case-crossover — each case serves as their own control at a different time:

— Used for transient exposures and acute outcomes (e.g., MI risk in the hour after heavy exertion, MVA risk after cellphone use)

— Controls for all fixed confounders (genetics, chronic habits)

Case-cohort — controls are a random sample of the entire baseline cohort, allowing estimation of risk ratios; useful when multiple outcomes will be studied

Step 3 management: Given a paragraph describing a question, pick the most efficient valid design. Pancreatic cancer and a candidate gene → case-control. Effect of a new antihypertensive on stroke → RCT. Effect of caffeine in the hour before atrial fibrillation onset → case-crossover. Effect of obesity on incident diabetes in 50,000 nurses → cohort. The exam rewards matching design to question, not always picking the "highest level of evidence."

Pharmacotherapy of the Design — Analysis Tools and Adjustments

Unmatched case-control analysis:

— Crude OR from the 2×2 table (ad/bc)

— Chi-square test for significance, or Fisher exact test when any expected cell count is <5

— Logistic regression to adjust for multiple confounders; output is adjusted OR with 95% CI

Matched case-control analysis:

— Build a table of discordant pairs (b and c here count pairs, not the cells of the unmatched 2×2 table):

— Pairs where case exposed, control not = b

— Pairs where control exposed, case not = c

— Matched OR = b/c

— Significance tested with McNemar test — concordant pairs (both exposed or both unexposed) carry no information

— For multiple matching variables or 1:k matching → conditional logistic regression

Stratified analysis (Mantel-Haenszel):

— Calculate stratum-specific ORs, then pool with weights

— If stratum-specific ORs are similar → confounding is controlled, pooled MH OR is valid

— If stratum-specific ORs differ markedly → effect modification is present; do not pool, report separately

Effect modification vs confounding (high-yield distinction):

— Confounder: distorts a true association; you adjust it away

— Effect modifier: the exposure-outcome relationship genuinely differs across subgroups; you report it, do not collapse it (e.g., OR of OCP and MI is much higher in smokers than nonsmokers — smoking modifies the effect)

Reporting standards: STROBE checklist is the case-control analog of CONSORT for RCTs

Key distinction: Confounding is a nuisance to be controlled; effect modification is a finding to be presented. Boards test this by giving stratified ORs of 1.2 (nonsmokers) and 8.5 (smokers) — answer is effect modification, not confounding, and the wrong move is to report a single pooled OR.
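The matched and stratified calculations described in this section can be sketched as follows. All counts are hypothetical, the McNemar statistic is shown without a continuity correction, and the helper names are illustrative rather than taken from any particular statistics package.

```python
def matched_odds_ratio(case_only_exposed, control_only_exposed):
    """Matched-pair OR: ratio of the two kinds of discordant pairs."""
    return case_only_exposed / control_only_exposed


def mcnemar_chi2(case_only_exposed, control_only_exposed):
    """McNemar chi-square (1 df, no continuity correction); ignores concordant pairs."""
    diff = case_only_exposed - control_only_exposed
    return diff ** 2 / (case_only_exposed + control_only_exposed)


def crude_odds_ratio(strata):
    """Collapse all strata into one 2x2 table and return ad/bc."""
    a, b, c, d = (sum(cells) for cells in zip(*strata))
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled OR: sum(a*d/n) / sum(b*c/n) over strata of (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical matched study: 30 pairs with only the case exposed,
# 10 pairs with only the control exposed
print("matched OR  =", matched_odds_ratio(30, 10))
print("McNemar chi2 =", mcnemar_chi2(30, 10))  # compare with 3.84 for alpha = 0.05

# Hypothetical strata of (a, b, c, d); each stratum-specific OR is 2.0
strata = [(10, 20, 20, 80), (40, 20, 10, 10)]
print("crude OR =", crude_odds_ratio(strata))
print("MH OR    =", round(mantel_haenszel_or(strata), 2))
```

In this constructed example the crude OR (3.75) differs from the pooled Mantel-Haenszel OR (2.0) while the stratum-specific ORs agree with each other, which is the pattern expected from confounding rather than effect modification.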

Procedures — Outbreak Investigation and Applied Case-Control Mechanics

Outbreak case-control is a Step 3 staple. The CDC field manual flow:

— Step 1: Establish a case definition (clinical + person/place/time criteria)

— Step 2: Identify cases through active surveillance, line list construction

— Step 3: Enroll controls — typically attendees at the same event who did not become ill, 2–4 per case

— Step 4: Administer a structured questionnaire about each candidate exposure (every food, every activity)

— Step 5: Compute an OR for each exposure; the food with the highest OR and tightest CI excluding 1 is the likely vehicle

— Step 6: Confirm with environmental sampling, lab testing, traceback

Interpreting outbreak tables:

— High attack rate among exposed + low among unexposed → suggestive vehicle

— But in a true case-control (not cohort) outbreak, attack rates are not directly available — you rely on OR of exposure

Practical pitfalls in the field:

— Multiple comparisons: testing 30 foods inflates type I error; apply a Bonferroni correction or restrict testing to biologically plausible exposures

— Missing data / poor recall: anchor questions with a menu or photographs

— Asymptomatic cases / mild cases not captured → selection bias toward severe disease

Reporting to public health: outbreak investigation outputs feed mandatory reporting (foodborne illness, vaccine-preventable disease, healthcare-associated outbreaks) — Step 3 expects you to know that suspected foodborne outbreaks are reportable to the local health department, often within 24 hours

CCS pearl: In a CCS-style stem describing a cluster of bloody diarrhea at a daycare, your orders include stool cultures (Shiga toxin / E. coli O157:H7), notification of public health, isolation of symptomatic children, and, at the population level, a case-control investigation comparing food/water/contact exposures between ill and well children to identify the source.
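Step 5 of the outbreak flow, combined with the multiple-comparisons pitfall, can be sketched in Python. The potato salad counts come from the earlier worked example; the chicken and cake counts are hypothetical. The sketch assumes Woolf log-based intervals widened by a Bonferroni correction, which is one reasonable approach among several.

```python
import math
from statistics import NormalDist

# Cells are (exposed cases, exposed controls, unexposed cases, unexposed controls).
# Only the potato salad row is from the text; the others are made up for illustration.
FOODS = {
    "potato salad": (32, 20, 8, 60),
    "chicken": (25, 48, 15, 32),
    "cake": (30, 55, 10, 25),
}


def screen_exposures(foods, alpha=0.05):
    """Return (name, OR, lower, upper) rows with Bonferroni-adjusted CIs.

    Rows are sorted so the exposure with the highest lower bound comes first.
    """
    n_tests = len(foods)
    # Two-sided critical value after dividing alpha across all exposures tested
    z = NormalDist().inv_cdf(1 - alpha / (2 * n_tests))
    rows = []
    for name, (a, b, c, d) in foods.items():
        or_ = (a * d) / (b * c)
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        lower = math.exp(math.log(or_) - z * se)
        upper = math.exp(math.log(or_) + z * se)
        rows.append((name, or_, lower, upper))
    return sorted(rows, key=lambda row: row[2], reverse=True)


results = screen_exposures(FOODS)
for name, or_, lower, upper in results:
    flag = "likely vehicle" if lower > 1 else "CI includes 1"
    print(f"{name:13s} OR={or_:5.2f}  CI {lower:.2f} to {upper:.2f}  {flag}")
```

Sorting on the lower confidence bound rather than the point estimate favors exposures that are both strongly and precisely associated with illness, in line with the "highest OR and tightest CI excluding 1" rule.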

Special Populations — Elderly and Comorbid Subjects in Case-Control Studies

Survival bias (Neyman bias) is amplified in older populations:

— If you sample prevalent cases of MI from a clinic, you systematically miss patients who died of their first MI — and those who died may have had the strongest exposure (e.g., heaviest smoking)

— Result: OR underestimates the true exposure effect

— Mitigation: enroll incident cases only (newly diagnosed), ideally from emergency department or registry capture

Polypharmacy as a confounder:

— Elderly cases on many drugs → multiple potential confounders and effect modifiers

— Indication bias / confounding by indication is particularly nasty: the drug is prescribed for a condition that itself causes the outcome (e.g., benzodiazepines and falls — the underlying anxiety/insomnia and frailty predict both)

— Adjustment requires careful covariate selection, propensity scoring, or active-comparator design

Frailty and competing risks:

— Elderly subjects may die of other causes before the outcome of interest manifests — competing-risk bias

— Particularly relevant for slow-onset cancers and dementia

Renal/hepatic impairment is rarely directly relevant to design, but it is a major confounder for many exposure-outcome questions (e.g., NSAID use and GI bleed, where CKD affects both exposure patterns and bleeding risk) and must be included in the regression model

Recall reliability: Cognitive impairment in elderly cases degrades exposure recall; proxy informants (spouse, adult child) introduce differential misclassification if used only for cases

Board pearl: When a stem on a case-control study of dementia uses spouses of cases as informants but interviews controls directly, the asymmetry creates differential misclassification — bias direction can go either way and the study's OR is suspect. The fix is symmetric ascertainment (use proxies for both groups or neither).
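The survival (Neyman) bias described in this section can be made concrete with expected counts. The table and the pre-enrollment fatality fractions below are hypothetical, chosen only to show the direction of the distortion when exposed cases die sooner than unexposed cases.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc."""
    return (a * d) / (b * c)


def survivors(count, fatality_before_enrollment):
    """Expected number of cases still alive when the study enrolls."""
    return count * (1 - fatality_before_enrollment)


# Hypothetical incident cases and controls
exposed_cases, unexposed_cases = 60, 40
exposed_controls, unexposed_controls = 30, 70

incident_or = odds_ratio(exposed_cases, exposed_controls,
                         unexposed_cases, unexposed_controls)

# Assumption: exposed cases are more likely to die before they can be enrolled
prevalent_exposed = survivors(exposed_cases, 0.5)
prevalent_unexposed = survivors(unexposed_cases, 0.2)

prevalent_or = odds_ratio(prevalent_exposed, exposed_controls,
                          prevalent_unexposed, unexposed_controls)

print(f"OR using incident cases : {incident_or:.2f}")
print(f"OR using prevalent cases: {prevalent_or:.2f}")
```

Under these assumed fatality rates the OR falls from 3.5 to about 2.2, which is the underestimate that enrolling incident cases is meant to prevent.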

Special Populations — Pregnancy, Pediatrics, and Genetic/Rare-Disease Subgroups

Pregnancy and teratogen studies — the case-control workhorse:

— RCTs of exposures in pregnancy are unethical

— Case-control of birth defects vs healthy controls, comparing maternal exposures (medications, infections), is the standard

— Classic example: thalidomide and phocomelia identified via case-series→case-control logic; DES and clear-cell vaginal adenocarcinoma identified via case-control

— Recall bias is severe: mothers of affected infants scrutinize pregnancy in detail — mitigated by prescription registry data or nested designs within pregnancy cohorts (e.g., MotherToBaby, Slone Birth Defects Study)

Pediatric rare diseases:

— Childhood cancers, congenital heart disease, rare metabolic disorders — case-control is often the only feasible design due to low incidence

— Parents are proxies → exposure ascertainment must be standardized

Genetic case-control (GWAS):

— Cases with the phenotype vs population controls, compared at millions of SNPs

— Population stratification is the confounder analog — ancestry differences between cases and controls produce false-positive associations; controlled by principal-components adjustment

— Significance threshold is genome-wide (p < 5 × 10⁻⁸) to account for multiple testing

Vaccine safety signal detection:

— Case-control studies compare prior vaccination between people with and without the adverse event; self-controlled case series compare event rates in post-vaccination risk windows against control windows within the same cases (e.g., intussusception after rotavirus vaccine, narcolepsy after pandemic H1N1 vaccine)

Key distinction: A cohort of pregnancies with prospectively recorded exposures (e.g., Sweden's MFR registry) yields RR and is preferred when feasible. A case-control of birth defects is faster and is the right design when the defect is rare — but susceptible to recall bias, which the boards will flag.

Complications — Common Errors That Invalidate a Case-Control Study

Inappropriate control selection — the single biggest threat:

— Controls drawn from a population with different exposure opportunity than cases (e.g., hospital controls with smoking-related disease in a lung-cancer study)

— Controls from a different time period than cases (secular trends in exposure)

— Volunteer controls (healthier, wealthier, less exposed) → biases OR away from the null for harmful exposures

Overmatching:

— Matching on a variable caused by the exposure or on the causal pathway → drives OR toward 1

— Matching on too many variables → discards informative pairs, loses power

Recall bias (covered) — magnified by exposures requiring detailed memory

Misclassification of outcome: weak case definition contaminates the case group with non-cases → bias toward null

Failure to adjust for confounders — crude OR reported without multivariable analysis

Multiple testing without correction: screening many exposures inflates type I error

Reverse causation: disease prodrome alters exposure (e.g., early Alzheimer's patients drink less coffee, generating a spurious "protective" association)

Publication bias at the literature level — positive case-control studies are published more readily, inflating meta-analytic estimates

Conflict of interest / sponsorship effects on questionnaire design and exposure ascertainment

Berkson bias when both exposure and outcome independently raise the probability of hospitalization

Board pearl: If a case-control study reports OR = 1.0 (no association) for a known true risk factor, suspect nondifferential misclassification of exposure or overmatching — both pull the OR toward the null. If a study reports an OR of 25 for a modest exposure, suspect recall bias or selection bias inflating the estimate.

When to Escalate — Statistical Consultation and Higher-Order Designs

Triggers to involve a biostatistician or epidemiologist (Step 3 systems thinking):

— Matched design with >2 controls per case → conditional logistic regression

— Time-varying exposures → case-crossover or self-controlled case series

— Multiple correlated outcomes → multivariate methods

— Suspected effect modification → formal interaction testing

— Missing data >10% on key variables → multiple imputation

— Mediation analysis (does the exposure act through an intermediate?) → counterfactual mediation methods

When to upgrade to a different design entirely:

— Question shifts from association to causation → consider RCT if ethical

— Need absolute risk for shared decision-making → cohort

— Need real-time outbreak control → retrospective cohort of attendees (if denominator known) may be faster and more informative than case-control

When to combine studies:

— Single case-control underpowered → pooled / meta-analyzed case-control consortia (e.g., InterLymph for lymphoma, INHANCE for head & neck cancer)

IRB and regulatory escalation:

— Even retrospective chart-review case-control studies require IRB review; most qualify for expedited review or waiver of consent if data are de-identified

— Use of biospecimens triggers stricter consent requirements and HIPAA considerations

Public-health escalation: an outbreak case-control yielding a likely vehicle triggers product recall, regulatory notification (FDA, USDA, CDC), and press release — Step 3 expects you to know that the local health department is the first notification, not the FDA

Step 3 management: A resident designing a single-center case-control of a rare arrhythmia after a new drug should be guided to (1) pre-specify the analysis plan, (2) seek IRB approval with HIPAA waiver, (3) consult biostatistics for sample size, and (4) consider joining a multi-site consortium for adequate power.

Key Differentials — Distinguishing Case-Control From Other Observational Designs

Case-control vs cohort — the most-tested distinction:

— Case-control: selects on outcome, looks back at exposure, yields OR, efficient for rare outcomes, vulnerable to recall/selection bias

— Cohort: selects on exposure, follows forward to outcome, yields RR, incidence, attributable risk, efficient for rare exposures, vulnerable to loss to follow-up

Case-control vs cross-sectional:

— Cross-sectional measures prevalence and exposure simultaneously in a single defined population — yields prevalence ratio / prevalence OR, but cannot establish temporality

— Case-control has a clear case definition and a separate sampled control group

Case-control vs case series:

— Case series has no control group — purely descriptive, hypothesis-generating

— Case-control adds the comparator and so can test associations

Case-control vs ecological study:

— Ecological data are group-level (country-level rates of fat intake vs breast cancer mortality) — vulnerable to ecological fallacy (group-level association ≠ individual-level)

— Case-control is individual-level data

Case-control vs case-crossover:

— Case-crossover uses the same individual as their own control at a different time window — eliminates between-person confounding; ideal for transient exposures and acute outcomes

Nested case-control vs traditional case-control:

— Nested is drawn from a defined cohort with pre-collected baseline data → much less recall and selection bias

— Traditional recruits cases and controls de novo

Board pearl: A stem describing pre-collected blood samples from a cohort, with biomarker assays run only on cases plus a sampled subset of cohort members → this is a nested case-control, and its main advantage over traditional case-control is temporal ordering of exposure measurement before outcome, eliminating recall bias.

Key Differentials — Effect Measures and Statistical Cousins

Odds ratio vs relative risk:

— RR = (a/(a+b)) / (c/(c+d)) — requires known denominators

— OR = ad/bc — does not require denominators

— OR lies farther from 1 than RR when disease is common (it overstates the effect in either direction); OR ≈ RR when disease is rare (<10%)

— Case-control yields only OR; RCT and cohort can yield both

OR vs hazard ratio (HR):

— HR incorporates time-to-event from survival analysis (Cox regression)

— Traditional case-control studies do not generate HRs because there is no follow-up time; nested designs that sample controls from the risk set at each case's event time are the exception

OR vs prevalence ratio: cross-sectional studies often report prevalence ratios (PR); a prevalence OR overstates effect for common conditions

Absolute vs relative measures:

— Case-control gives relative measures only (OR); it cannot give absolute risk reduction, NNT, NNH unless combined with external incidence data

— Boards may ask: "Why can the investigators not compute the NNT?" — because incidence is not estimable from a case-control design

Attributable risk concepts:

— Population attributable fraction (PAF) can be approximated from case-control data using the OR and the prevalence of exposure among controls

— Useful for public-health framing: "What proportion of lung cancer in this population is attributable to smoking?"

Statistical tests:

— Unmatched: chi-square, Fisher exact, logistic regression

— Matched: McNemar, conditional logistic regression

— Stratified: Mantel-Haenszel pooled OR with test of homogeneity

Key distinction: When a Step 3 stem describes a case-control study and asks you to compute "relative risk" — the trick is that you cannot compute RR; the closest valid measure is the OR, which approximates RR only if the outcome is rare.
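Two of the relationships in this section lend themselves to a quick numerical check: how the OR drifts away from the RR as the outcome becomes common, and how a PAF can be approximated from case-control data. The sketch below assumes Levin's formula for the PAF with the OR standing in for the RR (reasonable only under the rare-disease assumption); the baseline risks are hypothetical.

```python
def or_from_risks(risk_unexposed, risk_exposed):
    """Odds ratio implied by two risks (as would be seen in a full cohort)."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


def levin_paf(exposure_prevalence, odds_ratio):
    """Levin's population attributable fraction, using the OR in place of the RR."""
    excess = exposure_prevalence * (odds_ratio - 1)
    return excess / (1 + excess)


# Hold the true RR at 2.0 and raise the baseline risk (hypothetical values)
for baseline in (0.01, 0.05, 0.10, 0.30):
    implied_or = or_from_risks(baseline, 2 * baseline)
    print(f"baseline risk {baseline:.2f}: RR = 2.00, OR = {implied_or:.2f}")

# Outbreak example: 20 of 80 controls ate the potato salad, OR = 12
paf = levin_paf(exposure_prevalence=20 / 80, odds_ratio=12)
print(f"approximate PAF = {paf:.2f}")
```

At a 1% baseline risk the OR is nearly identical to the RR of 2, while at 30% it reaches 3.5, which is why treating an OR as an RR for a common outcome is flagged as a methodologic error.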

Secondary Prevention — Translating Case-Control Findings Into Practice

From association to action:

— A single case-control study generates hypotheses and quantifies association, but rarely changes practice on its own

— Practice change typically requires consistency across multiple designs — Bradford Hill criteria for causation: strength, consistency, temporality, biological gradient, plausibility, coherence, experiment, analogy, specificity

Hierarchy of evidence (for boards):

— Systematic review/meta-analysis of RCTs > single RCT > cohort > case-control > case series > expert opinion

— But: for rare outcomes or unethical exposures, case-control may be the best available evidence and is appropriately incorporated into guidelines (e.g., Reye syndrome and aspirin)

Examples where case-control evidence drove major public-health change:

— Smoking and lung cancer (Doll & Hill, 1950)

— DES and vaginal adenocarcinoma

— Aspirin and Reye syndrome → warning labels and abandonment of routine pediatric aspirin use

— Sleeping position and SIDS → "Back to Sleep" campaign

— Tampon absorbency and toxic shock syndrome

What case-control evidence does NOT do:

— Establish causation alone

— Quantify treatment effect size for clinical decision-making

— Provide individual-level risk prediction (use cohorts and prediction models)

Counseling patients about case-control–derived risks:

— Communicate relative, not absolute, magnitudes ("about 4 times higher odds")

— Disclose study limitations honestly (recall bias, observational)

Step 3 management: A patient asks about a recent news story claiming a food additive triples cancer risk based on a case-control study. The appropriate counseling acknowledges the association, explains that case-control studies cannot prove causation, notes potential recall and selection biases, and recommends watching for confirmatory cohort or trial data before changing behavior — a model shared-decision conversation.

Follow-Up — Reading and Critiquing a Case-Control Paper

STROBE checklist — the structured critique tool (the case-control analog to CONSORT for RCTs):

— Title and abstract: study design named explicitly

— Methods: case definition, source population, control selection method, matching, exposure assessment, statistical methods, handling of confounders

— Results: numbers at each stage, characteristics of cases and controls, crude and adjusted ORs with CIs

— Discussion: limitations, generalizability, sources of bias addressed

Five questions to ask of any case-control paper:

— 1. How were cases defined and ascertained? (Incident vs prevalent? Validated criteria?)

— 2. How were controls selected? (Same source population? Risk of selection bias?)

— 3. How was exposure measured? (Blinded? Pre-existing records or recall? Validated instrument?)

— 4. Were potential confounders measured and adjusted for? (Including in the DAG — directed acyclic graph — sense)

— 5. Are results consistent with prior literature and biologically plausible?

Red flags during review:

— No description of control selection method

— Crude OR only, no adjusted analysis

— Wide CI crossing 1 reported as "trend toward significance"

— No discussion of recall or selection bias

— Subgroup OR cherry-picked from many tested

Reporting effect estimates: always with point estimate + 95% CI, not p-value alone

Monitoring parameters for a quality study (research-team perspective):

— Response rate among invited cases and controls (aim >70%)

— Concordance between self-reported and record-based exposure

— Inter-rater reliability for case adjudication (κ statistic)

Board pearl: When asked the most important limitation of a case-control study described in a stem, the answer is usually the most threatening bias — typically recall bias for exposure-by-interview studies, selection bias for hospital-control studies, or confounding by indication for drug-effect studies.
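The κ statistic named among the monitoring parameters can be computed from a simple agreement table. The sketch below assumes two adjudicators classifying each candidate as case or non-case (Cohen's kappa); the counts are hypothetical.

```python
def cohen_kappa(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Cohen's kappa for two raters making a yes/no case adjudication."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / n

    # Agreement expected by chance, from each rater's marginal "yes" rate
    a_yes = (both_yes + a_yes_b_no) / n
    b_yes = (both_yes + a_no_b_yes) / n
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)

    return (observed - expected) / (1 - expected)


# Hypothetical adjudication of 100 candidate cases by two reviewers
kappa = cohen_kappa(both_yes=40, a_yes_b_no=5, a_no_b_yes=5, both_no=50)
print(f"kappa = {kappa:.2f}")
```

Here the reviewers agree on 90% of candidates, but about half of that agreement would be expected by chance, so kappa comes out near 0.80 rather than 0.90.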

Ethical, Legal, and Patient Safety Considerations

IRB oversight and consent:

— Retrospective chart-review case-control studies typically qualify for expedited IRB review with waiver of informed consent under the Common Rule, provided: minimal risk, infeasible to obtain consent, and HIPAA safeguards are in place

— Prospective interview-based case-control studies require full informed consent, including disclosure of who funded the study and how data will be used

— Pediatric cases require parental permission + child assent (typically age ≥7)

HIPAA and data protection:

— De-identification (Safe Harbor or Expert Determination) permits broader use without authorization

— Linking to external databases (death index, cancer registry) requires explicit data-use agreements

— Genetic case-control studies trigger GINA protections; consent documents should disclose the limits of that protection (GINA covers health insurance and employment, but not life, disability, or long-term-care insurance)

Mandatory reporting in outbreak case-control work:

— Identified foodborne or communicable-disease outbreaks must be reported to the local health department (typically within 24 hours), and certain pathogens (measles, meningococcus, novel influenza) require immediate notification — this duty is independent of research IRB approval

Transition-of-care safety:

— When a case-control study identifies a new harm signal (e.g., a medication associated with severe adverse event), investigators have an ethical duty to alert the FDA via MedWatch and to consider notifying treating clinicians of enrolled cases — even though this complicates blinding

Conflicts of interest:

— Industry-funded case-control studies of medication safety must disclose funding; questionnaire design and control selection are particular leverage points for bias

Vulnerable populations: prisoners, children, pregnant women, decisionally impaired adults receive heightened IRB scrutiny

Equity considerations: under-recruitment of minority controls leads to results that may not generalize and that may exacerbate disparities

Step 3 patient-safety pearl: If you, as a clinician, discover during routine care that several of your patients have developed the same rare condition after a common exposure, the correct next steps are to report to the local/state health department and FDA MedWatch, not to begin enrolling patients in your own ad hoc case-control study without IRB approval. Surveillance is a public-health duty; research is an IRB-supervised activity.

High-Yield Associations and Rapid-Fire Facts

— Rare outcome → case-control

— Rare exposure → cohort

— Transient exposure + acute outcome → case-crossover

— Pre-collected biospecimens, rare outcome → nested case-control

— Outbreak with unknown denominator → case-control

— Outbreak with defined denominator (e.g., closed banquet roster) → retrospective cohort

— Case-control → OR

— Cohort/RCT → RR, RD, NNT, HR

— Cross-sectional → prevalence, prevalence ratio

— Case-control → recall bias, selection bias (Berkson)

— Cohort → loss to follow-up, healthy worker effect

— RCT → performance bias, attrition bias (mitigated by blinding and ITT)

— Cross-sectional → length-time bias, prevalence-incidence (Neyman) bias

— Doll & Hill 1950 — smoking and lung cancer

— Herbst 1971 — DES and vaginal adenocarcinoma in daughters

— Starko 1980 — aspirin and Reye syndrome

— Mitchell 1992 — sleeping prone and SIDS

— Schuchat / CDC — tampons and TSS

— Rare disease assumption: <~10% prevalence for OR ≈ RR

— Control ratio: 4:1 is the diminishing-returns ceiling

— GWAS significance: p < 5 × 10⁻⁸

— Statistical significance default: α = 0.05, CI 95%

— Nondifferential misclassification → toward null

— Recall bias (cases recall more) → away from null

— Berkson → toward null typically

— Overmatching → toward null

— Loss of severe cases (Neyman) → toward null
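The first direction-of-bias bullet above (nondifferential misclassification → toward null) can be checked with simple arithmetic. The sketch below is illustrative only: the counts, the sensitivity of 0.8, the specificity of 0.9, and the function names are all hypothetical. It applies the same imperfect exposure classification to cases and controls and compares the expected observed OR with the true OR.

```python
def odds_ratio(a, b, c, d):
    """OR = ad/bc.
    a = exposed cases,   b = exposed controls
    c = unexposed cases, d = unexposed controls
    """
    return (a * d) / (b * c)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts when exposure is
    measured with the given sensitivity and specificity."""
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed


# Hypothetical true counts (not from any real study)
true_a, true_c = 60, 40   # cases: exposed, unexposed
true_b, true_d = 30, 70   # controls: exposed, unexposed
true_or = odds_ratio(true_a, true_b, true_c, true_d)

# Nondifferential: the SAME error rates apply to cases and controls
sens, spec = 0.8, 0.9
obs_a, obs_c = misclassify(true_a, true_c, sens, spec)
obs_b, obs_d = misclassify(true_b, true_d, sens, spec)
observed_or = odds_ratio(obs_a, obs_b, obs_c, obs_d)

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

With these assumed numbers the true OR of 3.5 shrinks to roughly 2.4: still above 1, but pulled toward the null. Differential error (e.g., recall bias, where sensitivity is higher in cases) does not share this guarantee.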

Board pearl: If forced to memorize one fact: a case-control study can ONLY produce an odds ratio, and the OR approximates the relative risk only when the outcome is rare. Most distractor answer choices on the exam violate one of these two rules.
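To see why the rare-outcome condition in the pearl above matters, a short calculation helps. This is a sketch under stated assumptions: the true relative risk is fixed at 2.0, the baseline risks are hypothetical, and the function name is made up for illustration. It converts risks to odds and shows how far the OR drifts from the RR as the outcome becomes common.

```python
def odds_ratio_from_risks(risk_unexposed, relative_risk):
    """Return the OR implied by a baseline risk and a true relative risk."""
    risk_exposed = risk_unexposed * relative_risk
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


TRUE_RR = 2.0  # assumed for illustration

for baseline in (0.001, 0.01, 0.05, 0.10, 0.30):
    implied_or = odds_ratio_from_risks(baseline, TRUE_RR)
    print(f"baseline risk {baseline:>5.3f}: RR = {TRUE_RR:.2f}, OR = {implied_or:.2f}")
```

At a baseline risk of 0.001 the OR is essentially 2.0; at 0.10 it is 2.25; at 0.30 it reaches 3.5. The OR always sits farther from 1 than the RR, so treating it as an RR for a common outcome overstates the association.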

Design selection by scenario:

Effect measure cheat sheet:

Bias by design (most-tested):

Famous case-control studies to recognize:

Numeric anchors:

Direction of bias quick map:

Board Question Stem Patterns

— Stem: "Investigators identified 200 patients with newly diagnosed pancreatic cancer and 400 cancer-free controls matched on age and sex, and asked about prior coffee consumption." → Case-control

— Stem provides a 2×2 table; compute ad/bc. Common trap: dividing by row totals (that would be RR — invalid here)

— OR 3.0 with CI 1.5–6.0 → cases had 3× the odds of prior exposure compared with controls (equivalently, the exposed have 3× the odds of disease). CI excludes 1 → statistically significant

— OR 1.4 with CI 0.9–2.2 → not statistically significant; CI crosses 1

— Mothers of children with neural tube defects recall folate intake more thoroughly than controls → recall bias

— Hospital controls have COPD in a smoking–lung cancer study → Berkson / selection bias

— Severe rapidly fatal cases not enrolled → Neyman bias

— Cannot compute incidence, RR, or NNT from a case-control study because the investigator fixes the numbers of cases and controls; the sample has no true population denominator

— Stratum-specific ORs similar but different from crude → confounding

— Stratum-specific ORs very different → effect modification; report stratum-specific results

— Effect of a transient trigger on an acute event → case-crossover

— Effect of OCP on rare ovarian cancer → case-control

— Effect of mid-pregnancy biomarker on rare congenital defect with stored samples → nested case-control

— Matched 1:1 design analyzed with chi-square → answer: redo with McNemar test or conditional logistic regression
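For the matched 1:1 scenario in the last bullet, only discordant pairs carry information. The sketch below uses hypothetical pair counts and illustrative variable names; it computes the matched-pair OR and the McNemar chi-square with continuity correction, taking the p-value from the 1-degree-of-freedom chi-square distribution via `math.erfc`.

```python
import math

# Hypothetical discordant-pair counts from a 1:1 matched study
case_only_exposed = 25     # pairs where the case was exposed, the control was not
control_only_exposed = 10  # pairs where the control was exposed, the case was not
# Concordant pairs (both exposed or both unexposed) do not enter the calculation.

# Matched-pair odds ratio: ratio of the two kinds of discordant pairs
matched_or = case_only_exposed / control_only_exposed

# McNemar chi-square with continuity correction (1 degree of freedom)
diff = abs(case_only_exposed - control_only_exposed)
total_discordant = case_only_exposed + control_only_exposed
mcnemar_chi2 = (diff - 1) ** 2 / total_discordant

# Upper-tail p-value for chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(mcnemar_chi2 / 2))

print(f"matched OR = {matched_or:.2f}")
print(f"McNemar chi-square = {mcnemar_chi2:.2f}, p = {p_value:.3f}")
```

An ordinary chi-square on the unpaired 2×2 table ignores the pairing and typically biases the estimate toward the null, which is why the board answer is McNemar or conditional logistic regression.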

Step 3 management: When the answer choices include "odds ratio," "relative risk," "hazard ratio," and "incidence rate ratio" for a case-control study, odds ratio is essentially always correct — the other three require denominators or time that this design does not have.
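Patterns 2 and 3 ("calculate" and "interpret" the OR) can be worked through in a few lines. This is a minimal sketch: the exposure counts are hypothetical, the function name is illustrative, and the confidence interval uses the standard Woolf (log-based) approximation, which is an assumption about method rather than something the stems specify.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) using the Woolf log method.

    a = exposed cases,   b = exposed controls
    c = unexposed cases, d = unexposed controls
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Hypothetical stem: 200 cases (80 exposed) and 400 controls (100 exposed)
a, b, c, d = 80, 100, 120, 300
or_, lower, upper = odds_ratio_with_ci(a, b, c, d)

# Significant at alpha = 0.05 when the 95% CI excludes 1
significant = lower > 1 or upper < 1

# The common trap: dividing by row totals as if this were a cohort.
# This "RR" is NOT valid here because the case:control ratio was set by design.
row_total_trap = (a / (a + b)) / (c / (c + d))

print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}, significant: {significant}")
print(f"row-total 'RR' (invalid in case-control) = {row_total_trap:.2f}")
```

With these assumed counts the OR is 2.0 with a CI of roughly 1.39 to 2.87, so the interval excludes 1. The row-total figure comes out near 1.56, and it would change if the investigators had simply recruited more or fewer controls.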

Pattern 1 — "Name the design":

Pattern 2 — "Calculate the OR":

Pattern 3 — "Interpret the OR":

Pattern 4 — "Identify the bias":

Pattern 5 — "Why can't we compute X?":

Pattern 6 — "Confounder vs effect modifier":

Pattern 7 — "Choose the best design for the question":

Pattern 8 — "Best next step in analysis":

One-Line Recap

A case-control study starts with the outcome, looks backward at exposure, and yields an odds ratio — efficient for rare diseases and long latencies, but vulnerable to recall and selection bias and unable to deliver incidence, relative risk, or absolute effect measures.

Board pearl: On Step 3, if the question gives you cases first, controls second, and asks about a measure of association — the answer is odds ratio, the threat is recall or selection bias, and the fix is usually a nested case-control with pre-collected exposure data and blinded ascertainment. Lock those three facts and most case-control questions resolve in under thirty seconds.

Design fingerprint: select on outcome → ascertain prior exposure → compute OR = ad/bc; OR ≈ RR only when disease is rare (<10%)

Bias trio to recognize on every stem: recall bias (cases remember more), selection bias (Berkson with hospital controls, Neyman from missing fatal cases), and confounding by indication (especially in drug-effect studies); mitigated by nested designs, blinded structured ascertainment, pre-collected records, and multivariable adjustment

Analysis match-up: unmatched → chi-square / logistic regression; matched → McNemar / conditional logistic regression; stratified → Mantel-Haenszel with test of homogeneity to distinguish confounding (adjust away) from effect modification (report separately)
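The stratified branch of the match-up above can be illustrated with the Mantel-Haenszel pooled OR. The sketch below uses two hypothetical strata of a made-up confounder, constructed so that each stratum has the same OR while the crude OR is inflated. It does not implement a formal homogeneity test (such as Breslow-Day); it only compares the stratum-specific estimates by eye.

```python
def odds_ratio(a, b, c, d):
    """OR = ad/bc (a, c = exposed/unexposed cases; b, d = exposed/unexposed controls)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled OR = sum(a*d/n) / sum(b*c/n) over strata of (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata: (exposed cases, exposed controls, unexposed cases, unexposed controls)
strata = [
    (10, 10, 90, 180),  # confounder absent
    (80, 40, 20, 20),   # confounder present
]

stratum_ors = [odds_ratio(*s) for s in strata]

# Crude table: collapse the strata by summing cell by cell
crude_cells = [sum(cells) for cells in zip(*strata)]
crude_or = odds_ratio(*crude_cells)

pooled_or = mantel_haenszel_or(strata)

print(f"stratum ORs = {[round(x, 2) for x in stratum_ors]}")
print(f"crude OR = {crude_or:.2f}, Mantel-Haenszel OR = {pooled_or:.2f}")
```

Here both stratum ORs equal 2.0 while the crude OR is about 3.27: similar stratum estimates that differ from the crude value are the signature of confounding, and the pooled OR of 2.0 is the adjusted answer. Had the stratum ORs differed sharply from each other, pooling would hide effect modification, and the strata should be reported separately.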

When to choose case-control vs alternatives: rare outcome or long latency → case-control; rare exposure → cohort; transient trigger of acute event → case-crossover; rare outcome in a pre-existing cohort with stored samples → nested case-control; closed-population outbreak with known denominator → retrospective cohort