
Eduovisual

Biostatistics & Population Health

Logistic regression and odds ratio interpretation

Clinical Overview and When to Suspect Logistic Regression

Logistic regression is the workhorse statistical model for Step 3 biostatistics questions in which the outcome is binary (yes/no, dead/alive, readmitted/not, MI/no MI) and one or more predictors (continuous or categorical) are entered.

Use it when:
— Outcome is dichotomous and you want the probability of the event as a function of predictors.
— You need to adjust for confounders (age, sex, comorbidity) in an observational cohort or case-control study.
— A linear regression would be inappropriate because predicted probabilities must lie between 0 and 1.

Model form (conceptual, not memorized algebra):
— log(odds of outcome) = β₀ + β₁X₁ + β₂X₂ + …
— Each β coefficient, when exponentiated (e^β), yields an adjusted odds ratio (aOR) for that predictor.

Why Step 3 cares:
— Vignettes describe a published study ("after adjustment for age and smoking, the OR for MI was 2.4 [95% CI 1.6–3.5]") and ask what the number means, whether it is significant, or whether causation can be inferred.
— Common in case-control studies, where logistic regression is the natural analytic tool because incidence cannot be directly calculated.

Suspect logistic regression as the correct method when the stem mentions:
— "Adjusted odds ratio," "multivariable model," "controlling for…," or a binary endpoint with multiple covariates.

Board pearl: If the outcome is binary and the question gives an odds ratio with 95% CI, the underlying analysis is almost always logistic regression. If the outcome is time-to-event (with censoring), think Cox proportional hazards and hazard ratios instead, a frequent distractor pair on Step 3.
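To make the model form concrete, the following is a minimal Python sketch of how log-odds map to a probability and how exponentiating a coefficient recovers an aOR. The intercept and coefficients are hypothetical values chosen for illustration (an aOR of 1.04 per year of age and 2.3 for smoking); they are not taken from any real study.

```python
import math

def predicted_probability(intercept, coefs, values):
    """Logistic model: log-odds = intercept + sum(coef * value); return P(outcome)."""
    log_odds = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-log_odds))

def odds(p):
    """Convert a probability to odds, p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical coefficients (illustration only):
# intercept, then log(aOR) per year of age and for current smoking.
INTERCEPT = -5.0
COEFS = [math.log(1.04), math.log(2.3)]

p_smoker = predicted_probability(INTERCEPT, COEFS, [60, 1])
p_nonsmoker = predicted_probability(INTERCEPT, COEFS, [60, 0])

# Exponentiating a coefficient recovers the adjusted odds ratio.
aor_smoking = math.exp(COEFS[1])
# The ratio of the two odds equals that same aOR at any fixed age.
ratio_of_odds = odds(p_smoker) / odds(p_nonsmoker)

print(f"P(outcome | smoker, age 60)     = {p_smoker:.3f}")
print(f"P(outcome | non-smoker, age 60) = {p_nonsmoker:.3f}")
print(f"aOR for smoking = {aor_smoking:.2f}; ratio of odds = {ratio_of_odds:.2f}")
```

Both predicted probabilities stay between 0 and 1 regardless of the covariate values, which is the property a linear regression on a binary outcome lacks.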
Presentation Patterns and Key History

Typical Step 3 stem framing for logistic regression questions:
— A case-control study of patients with pancreatic cancer vs. matched controls reports an aOR for heavy alcohol use of 1.8 (95% CI 1.2–2.7) after adjusting for smoking, BMI, and diabetes.
— A retrospective cohort of postoperative patients models 30-day readmission against age, ASA class, and discharge disposition.
— A cross-sectional survey examines factors associated with vaccine uptake.

Key history clues that the analysis is logistic regression:
— Outcome is described in binary terms ("developed delirium," "experienced readmission," "tested positive").
— Effect estimates reported as OR or adjusted OR, not RR, HR, or mean difference.
— Mention of multiple covariates being "controlled for" or "entered into the model."

Important contextual cues:
Case-control design → must use OR (cannot compute incidence) → logistic regression is the standard.
Cohort design with binary outcome and no time-to-event focus → logistic regression is appropriate, though RR is often preferable when the outcome is common.
Rare outcome (<10%) → OR ≈ RR, so logistic regression OR can be interpreted similarly to RR.
Common outcome (>10%) → OR overstates RR; this is a classic Step 3 trap.

Watch for stems that mix designs (e.g., "nested case-control within a cohort"); the analytic method is still logistic regression for the binary outcome.

Key distinction: Odds ratio ≠ relative risk. The OR approximates RR only when the outcome is uncommon. When a stem reports a 40% event rate and an OR of 3.0, the true RR is substantially smaller; recognizing this is a high-yield Step 3 testing point about interpretation literacy, not calculation.
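The gap between OR and RR can be illustrated with a short sketch. It uses a conversion commonly attributed to Zhang and Yu, RR = OR / ((1 − p₀) + p₀ × OR), where p₀ is the risk in the unexposed group. Two assumptions are made here: the stem's event rate is treated as that unexposed risk, and the formula is applied as an approximation, since it is exact only for a crude (unadjusted) OR.

```python
def or_to_rr(odds_ratio, baseline_risk):
    """Approximate RR from an OR, given the risk in the unexposed group (p0)."""
    return odds_ratio / ((1.0 - baseline_risk) + baseline_risk * odds_ratio)

# Same OR of 3.0 at increasingly common outcomes.
for p0 in (0.01, 0.05, 0.10, 0.40):
    print(f"baseline risk {p0:.0%}: OR 3.0 -> RR {or_to_rr(3.0, p0):.2f}")
# baseline risk 1%: OR 3.0 -> RR 2.94
# baseline risk 5%: OR 3.0 -> RR 2.73
# baseline risk 10%: OR 3.0 -> RR 2.50
# baseline risk 40%: OR 3.0 -> RR 1.67
```

At a 1% baseline risk the OR and RR are nearly interchangeable; at 40% the same OR of 3.0 corresponds to an RR of about 1.7.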
Physical Exam Findings (and Model Diagnostics Analogue)

For a biostatistics topic, "physical exam" maps to model diagnostic checks: the things you inspect to decide whether a logistic regression result is trustworthy.

Things to assess in a reported logistic regression:
Sample size adequacy: rule of thumb is ~10 outcome events per predictor variable. A model with 5 covariates needs ≥50 events.
Confidence interval width: very wide CIs (e.g., OR 2.5, 95% CI 0.4–18) signal sparse data or overfitting.
Goodness-of-fit: Hosmer–Lemeshow test (non-significant p-value = adequate fit) and C-statistic (AUC) for discrimination (>0.7 acceptable, >0.8 strong).
Calibration: predicted vs. observed event rates across risk deciles.
Multicollinearity: highly correlated predictors (e.g., BMI and weight) inflate standard errors and produce unstable ORs.

Red flags suggesting an OR should not be trusted:
— Extremely large OR (e.g., 25) with CI crossing or near 1 in a small subgroup.
— Missing data handling not described.
— No mention of how confounders were selected (cherry-picked covariates → residual confounding).

Confounding vs. effect modification:
Confounder: adjusting changes the OR meaningfully (>10%).
Effect modifier (interaction): the OR differs across strata of a third variable; reported as separate ORs or an interaction term.

Board pearl: A logistic regression OR with a 95% CI that includes 1.0 is not statistically significant at α=0.05, regardless of how large the point estimate looks. This is the single most testable interpretive fact about ORs on Step 3; examinees consistently miss it when distracted by a dramatic point estimate.
Diagnostic Workup — Interpreting the Odds Ratio

Odds = probability of event / probability of no event = p/(1−p).
Odds ratio (OR) = odds in exposed / odds in unexposed.

Interpretation rules:
OR = 1: no association between exposure and outcome.
OR > 1: exposure associated with increased odds of outcome.
OR < 1: exposure associated with decreased odds (protective).

Adjusted OR (aOR):
— Same interpretation, but controlling for the other variables in the model.
— "aOR 1.7 for smoking" means smokers have 1.7× the odds of the outcome compared to non-smokers, holding other covariates constant.

Continuous predictors:
— OR is per one-unit increase in the predictor.
— Example: aOR 1.04 per year of age means each additional year raises odds by 4%. Over 10 years, odds multiply by 1.04¹⁰ ≈ 1.48.
— Step 3 stems sometimes report per-SD or per-10-unit increases; read carefully.

Categorical predictors:
— OR is for that category vs. the reference group (often the lowest or "none" category).

Statistical significance:
— Determined by 95% CI not crossing 1, or equivalently p < 0.05.
— A "borderline" CI (e.g., 1.01–2.50) is technically significant but clinically fragile.

Clinical vs. statistical significance:
— A statistically significant OR of 1.05 in a huge dataset may be clinically trivial.
— Conversely, an OR of 3.0 with CI 0.9–10 in a small study is clinically suggestive but underpowered.

Step 3 management: When asked to interpret an aOR, always state (1) direction (risk vs. protective), (2) magnitude, (3) whether the CI excludes 1, and (4) what was adjusted for; examiners reward this structured reading over numerical manipulation.
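As a sketch of the definitions in this section, the snippet below computes a crude OR and an approximate 95% CI from a 2×2 table and rescales a per-unit OR. The CI uses the standard log-scale (Woolf) approximation, which is a choice made for this example; the cell counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude OR and Woolf 95% CI from a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(or_) - z * se_log_or)
    high = math.exp(math.log(or_) + z * se_log_or)
    return or_, low, high

def rescale_or(or_per_unit, units):
    """OR for a multi-unit change: ORs multiply, so raise to the power of units."""
    return or_per_unit ** units

# Hypothetical counts for illustration only.
or_, low, high = odds_ratio_ci(a=40, b=60, c=20, d=80)
significant = low > 1 or high < 1  # CI excludes 1
print(f"OR {or_:.2f} (95% CI {low:.2f}-{high:.2f}); significant: {significant}")

# aOR 1.04 per year of age, expressed per decade.
print(f"per 10 years: {rescale_or(1.04, 10):.2f}")
```

The rescaling step matters whenever a stem reports a per-SD or per-10-unit OR: the per-unit OR is exponentiated, not multiplied by the number of units.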
Diagnostic Workup — Advanced Concepts and Confirmatory Studies

Reference category matters:
— For race, BMI category, or insurance status, the OR depends entirely on what the reference group is. A stem reporting "OR 2.1 for obese" implicitly references normal-weight individuals.

Interaction (effect modification):
— Tested by adding a product term (X₁ × X₂) to the model.
— If the interaction term is significant, the effect of one variable depends on the level of another; report stratified ORs, not a single overall OR.
— Example: aspirin's effect on stroke differs by sex → stratify by sex.

Collinearity:
— When two predictors are highly correlated (e.g., systolic and diastolic BP), individual ORs become unstable. Drop one or combine them.

Overfitting:
— Too many predictors for too few events produces ORs that won't replicate. Validate in an independent dataset.

Conditional logistic regression:
— Used for matched case-control studies (e.g., 1:1 matched on age and sex). Accounts for the matched design.

Multinomial logistic regression:
— Outcome has >2 unordered categories (e.g., disease A vs. B vs. none).

Ordinal logistic regression:
— Outcome is ordered (mild/moderate/severe).

Penalized regression (ridge, lasso):
— Used when predictors outnumber events, to shrink coefficients and improve generalizability.

Key distinction: Logistic regression gives odds ratios; Poisson regression gives rate ratios (events per person-time); Cox regression gives hazard ratios (instantaneous risk over time). Step 3 distractor sets routinely swap these; match the analytic method to the outcome structure, not to which sounds most impressive.
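The product-term idea can be sketched numerically. In a model with log-odds = β₀ + β₁·exposure + β₂·modifier + β₃·(exposure × modifier), the exposure OR within a stratum of the modifier is exp(β₁ + β₃·modifier). The coefficients below are hypothetical and the modifier is assumed to be coded 0/1.

```python
import math

def stratum_or(beta_exposure, beta_interaction, modifier_value):
    """Exposure OR at a given modifier level when a product term is in the model."""
    return math.exp(beta_exposure + beta_interaction * modifier_value)

# Hypothetical coefficients for illustration only.
BETA_EXPOSURE = math.log(1.5)     # exposure effect when modifier = 0
BETA_INTERACTION = math.log(2.0)  # product-term coefficient

or_when_0 = stratum_or(BETA_EXPOSURE, BETA_INTERACTION, 0)
or_when_1 = stratum_or(BETA_EXPOSURE, BETA_INTERACTION, 1)
print(f"exposure OR, modifier = 0: {or_when_0:.2f}")
print(f"exposure OR, modifier = 1: {or_when_1:.2f}")
```

With a non-zero product term there is no single exposure OR, which is why stratified ORs are reported instead of one pooled estimate.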
Risk Stratification — Choosing the Right Regression Model

Decision logic for the Step 3 examinee:
Binary outcome, no time component → logistic regression → OR.
Binary outcome, time-to-event with censoring → Cox proportional hazards → HR.
Count outcome (number of falls, admissions) → Poisson or negative binomial → IRR.
Continuous outcome → linear regression → β coefficient (mean difference per unit).
Ordinal outcome → ordinal logistic → proportional odds.

When OR is the appropriate effect measure:
— Case-control studies (mandatory: incidence unknowable).
— Cross-sectional studies with binary outcomes.
— Cohort studies when the outcome is rare and OR ≈ RR.

When OR is misleading:
— Cohort studies with common outcomes, where readers may misinterpret OR as RR. Modified Poisson regression or log-binomial regression yields RR directly.

Confounder selection strategies:
A priori based on causal/DAG reasoning (preferred).
Change-in-estimate (>10% change in OR when variable is added).
— Avoid stepwise selection; it inflates type I error and is poorly regarded.

Propensity score methods:
— Used in observational studies to balance covariates between exposed and unexposed groups.
— Often paired with logistic regression to estimate the propensity itself.

Board pearl: If a Step 3 stem describes a case-control study, the only correct effect measure is the odds ratio: you literally cannot compute RR or incidence because participants were sampled on outcome status. Selecting "relative risk" as the answer is a guaranteed wrong choice in that setting.
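The decision logic in this section amounts to a lookup from outcome structure to model and effect measure. A minimal sketch follows; the function name, the category strings, and the `censored_follow_up` flag are illustrative choices, not an established API.

```python
# Lookup table mirroring the decision logic in this section:
# outcome type -> (model, effect measure).
MODEL_BY_OUTCOME = {
    "binary": ("logistic regression", "OR"),
    "time-to-event": ("Cox proportional hazards", "HR"),
    "count": ("Poisson or negative binomial regression", "IRR"),
    "continuous": ("linear regression", "beta (mean difference per unit)"),
    "ordinal": ("ordinal logistic regression", "proportional odds OR"),
}

def choose_model(outcome_type, censored_follow_up=False):
    """Return (model, effect measure) for an outcome type.

    A binary outcome analysed as time-to-event with censoring is routed
    to the Cox model rather than logistic regression.
    """
    if outcome_type == "binary" and censored_follow_up:
        outcome_type = "time-to-event"
    if outcome_type not in MODEL_BY_OUTCOME:
        raise ValueError(f"unknown outcome type: {outcome_type!r}")
    return MODEL_BY_OUTCOME[outcome_type]

print(choose_model("binary"))                           # 30-day readmission, yes/no
print(choose_model("binary", censored_follow_up=True))  # time to readmission
print(choose_model("count"))                            # number of falls
```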
Pharmacotherapy Analogue — Worked OR Interpretations

Example 1: A case-control study of MI patients reports aOR 2.3 (95% CI 1.5–3.5) for current smoking after adjusting for age, sex, BMI, diabetes, and hypertension.
— Interpretation: Smokers have 2.3× the odds of MI compared to non-smokers, holding other listed factors constant; statistically significant (CI excludes 1).
— This does not mean smokers are 2.3× more likely (that would be RR), nor does it prove causation.

Example 2: aOR 0.6 (95% CI 0.4–0.9) for statin use and incident dementia.
— Statins associated with 40% lower odds of dementia after adjustment.
— Protective association; significant. But observational → residual confounding (healthy-user bias) possible.

Example 3: aOR 1.04 (95% CI 1.02–1.06) per mg/dL increase in LDL for CAD events.
— Each 1 mg/dL rise increases odds by 4%; per 40 mg/dL (one SD-ish), odds multiply by 1.04⁴⁰ ≈ 4.8.
— Tiny per-unit OR can be clinically large over a realistic range.

Example 4: aOR 1.2 (95% CI 0.8–1.8) for moderate alcohol and breast cancer.
Not significant (CI crosses 1); cannot conclude an association exists.

Example 5 (interaction): aOR for hormone therapy = 1.5 in women <60 but 2.8 in women ≥60; interaction p=0.01.
— Effect modification by age; report stratified, not pooled, ORs.

Step 3 management: When the vignette gives you an aOR, translate it into plain English ("X has 2.3 times the odds of Y, adjusted for…") before evaluating answer choices. Most distractors are reworded errors (confusing OR with RR, missing CI inclusion of 1, ignoring adjustment).
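The plain-English translation can be sketched as a small helper that reports direction, magnitude, whether the CI excludes 1, and what was adjusted for. The function name and output wording are illustrative only; the inputs reuse Examples 1 and 4. The adjustment list passed for Example 4 is a placeholder assumption, since that example does not state its covariates.

```python
def read_aor(aor, ci_low, ci_high, adjusted_for):
    """Return a plain-English reading: direction, magnitude, CI check, adjustment."""
    if aor > 1:
        direction = "higher"
    elif aor < 1:
        direction = "lower"
    else:
        direction = "unchanged"
    significant = ci_low > 1 or ci_high < 1  # 95% CI excludes 1
    verdict = "statistically significant" if significant else "not statistically significant"
    return (
        f"{aor:g} times the odds ({direction} odds), 95% CI {ci_low:g}-{ci_high:g}, "
        f"{verdict}, adjusted for {', '.join(adjusted_for)}; association, not causation"
    )

example_1 = read_aor(2.3, 1.5, 3.5, ["age", "sex", "BMI", "diabetes", "hypertension"])
# Placeholder adjustment list: Example 4 does not list its covariates.
example_4 = read_aor(1.2, 0.8, 1.8, ["measured confounders"])
print(example_1)
print(example_4)

# Example 3: a small per-unit OR compounds over a realistic range.
print(f"per 40 mg/dL: {1.04 ** 40:.1f}")
```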
Advanced Pharmacology Analogue — Common OR Pitfalls

Pitfall 1 — OR-as-RR error:
— When the outcome is common (say a 30% risk in the unexposed group), an OR of 3.0 corresponds to an RR of roughly 1.9. Reporters and readers often quote the OR as if it were RR, exaggerating effect size.

Pitfall 2 — Reverse causation:
— In cross-sectional logistic regression, you cannot tell whether the exposure preceded the outcome.

Pitfall 3 — Selection bias:
— Hospital-based case-control studies select controls who may differ systematically (Berkson's bias), distorting ORs.

Pitfall 4 — Recall bias:
— Cases remember exposures (e.g., medications during pregnancy) more thoroughly than controls, inflating ORs.

Pitfall 5 — Overadjustment:
— Including a variable on the causal pathway (mediator) between exposure and outcome attenuates the estimated total effect. Example: adjusting for LDL when studying saturated fat and CAD.

Pitfall 6 — Sparse-data bias:
— Very few events in a covariate cell produce wildly inflated ORs with huge CIs. Look for OR > 10 with CI lower bound near 1.

Pitfall 7 — Multiple comparisons:
— Testing 20 predictors at α=0.05 yields ~1 false-positive by chance. Bonferroni or FDR correction tightens the threshold.

Pitfall 8 — Misinterpreting the intercept:
— β₀ gives baseline log-odds when all predictors = 0; rarely clinically meaningful unless predictors are centered.

Board pearl: A statistically significant OR in an observational study does not establish causation. Step 3 will offer "X causes Y" as a tempting answer; the correct response is typically "X is associated with Y, after adjustment for measured confounders," preserving epistemic humility about unmeasured confounding.
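The arithmetic behind the multiple-comparisons pitfall can be sketched as follows. The family-wise error calculation assumes the tests are independent and every null hypothesis is true, which is a simplification made for this example.

```python
def multiple_testing_summary(n_tests, alpha=0.05):
    """Expected false positives, chance of at least one, and Bonferroni threshold.

    Assumes independent tests with every null hypothesis true.
    """
    expected_false_positives = n_tests * alpha
    familywise_error = 1.0 - (1.0 - alpha) ** n_tests
    bonferroni_alpha = alpha / n_tests
    return expected_false_positives, familywise_error, bonferroni_alpha

expected, fwer, bonferroni = multiple_testing_summary(20)
print(f"expected false positives: {expected:.1f}")
print(f"P(at least one false positive): {fwer:.2f}")
print(f"Bonferroni per-test threshold: {bonferroni:.4f}")
```

With 20 predictors the expected count is one false positive, and under these assumptions the chance of at least one is roughly 64%.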
Special Populations — Small Samples and Rare Events

Logistic regression behaves poorly when:
Total sample size is small (<100).
Number of events is small relative to predictors (<10 events per variable).
Cells are empty (e.g., no exposed cases), which produces infinite or undefined ORs.

Solutions used in the literature:
Exact logistic regression: computes exact p-values for sparse data.
Firth's penalized likelihood: reduces bias from small samples and separation.
Bayesian logistic regression: incorporates prior information when data are scarce.
— Collapsing categories or dropping rare predictors.

Subgroup analyses:
— Subgroups inevitably have fewer events → wider CIs → less power. A "non-significant" subgroup finding may simply reflect insufficient power, not absence of effect.
— Pre-specified subgroup analyses are more credible than post-hoc.

Renal/hepatic impairment analogue — studies in specialized populations:
— Often have small Ns (e.g., dialysis patients, transplant recipients), so reported aORs are imprecise. Look for very wide CIs.

Reporting standards:
— STROBE for observational studies, TRIPOD for prediction models; both require disclosure of sample size, events, and model performance.

Key distinction: A non-significant OR in an underpowered study does not mean "no effect"; it means the data cannot distinguish the true effect from no effect. The correct interpretation is "inconclusive," not "no association." Step 3 frequently tests this nuance with subgroup-analysis vignettes.
Special Populations — Pediatric, Pregnancy, and Subgroup Modeling

Pediatric and pregnancy studies often rely on logistic regression because:
— Outcomes (e.g., preterm birth, congenital anomaly, NICU admission) are dichotomous.
— RCTs are limited for ethical reasons, so observational designs dominate.

Methodological cautions in these populations:
Matched case-control designs are common (e.g., matching on maternal age, parity) → use conditional logistic regression.
Cluster effects: siblings or twins share exposures; ignoring this yields falsely narrow CIs. Use generalized estimating equations (GEE) or mixed-effects logistic regression.
Time-varying exposures during pregnancy: simple logistic regression cannot handle this; consider trimester-specific models.

Health disparities research:
— ORs for race/ethnicity must be interpreted as associations, not biological causation; they reflect structural factors confounded with race.
— Step 3 increasingly tests recognition that race is a social construct in epidemiologic models.

Geriatric studies:
— Competing risks (death from another cause) may bias logistic regression of non-fatal outcomes. Cox models with competing-risk adjustment are preferred when applicable.

Board pearl: When a study reports an aOR for a maternal exposure (e.g., SSRI use) and a fetal outcome (e.g., cardiac defect), look for adjustment for the underlying indication (depression itself); confounding by indication is the classic source of inflated drug-harm signals on Step 3 obstetric biostatistics stems.
Complications — Misuse and Misreporting of ORs

Complication 1 — Causal language in media:
— A press release says "drinking coffee doubles risk of cancer" based on OR 2.0 in a case-control study. Two errors: (1) OR ≠ risk; (2) association ≠ causation.

Complication 2 — Cherry-picked covariates:
— Researchers may select adjustments that move the OR toward significance. Pre-registration mitigates this.

Complication 3 — Multiple-testing in genomics/biomarkers:
— Scanning hundreds of predictors yields false-positive ORs by chance. Requires correction (Bonferroni, FDR).

Complication 4 — Misreporting CI:
— Reporting only point estimates without CIs hides uncertainty.

Complication 5 — Extrapolation beyond data range:
— An aOR for age (per year) derived in a 40–70 cohort should not be applied to 20-year-olds.

Complication 6 — Ignoring model assumptions:
— Logistic regression assumes linearity of continuous predictors on the log-odds scale. Violations bias coefficients; remedies include splines or categorization.

Complication 7 — Immortal time bias:
— Misclassifying exposure status during a period when outcome was impossible inflates protective ORs (common in drug-effectiveness studies).

Complication 8 — Publication bias:
— Significant ORs are more likely to be published; meta-analyses may overstate true effects.

Step 3 management: When evaluating a published OR, ask: (1) Was the design appropriate? (2) Were confounders measured and adjusted? (3) Is the CI narrow enough to be informative? (4) Does the conclusion match the design (association vs. causation)? Apply this checklist on every biostats vignette.
When to Escalate — Choosing Beyond Standard Logistic Regression

Escalate to more advanced methods when:
Repeated measures per subject (longitudinal data) → mixed-effects logistic regression or GEE.
Clustered data (patients within hospitals) → multilevel/hierarchical models.
Time-to-event outcome with meaningful censoring → switch to Cox regression.
Strong confounding by indication → propensity score matching or instrumental variables.
High-dimensional data (genomics, claims data) → penalized regression (lasso, elastic net) or machine-learning alternatives.
Causal inference goals → targeted maximum likelihood estimation, g-methods, marginal structural models.

When to consult a statistician (Step 3 systems-thinking analogue):
— Before data collection (study design, sample size).
— When the planned analysis involves matched, clustered, or longitudinal data.
— When interactions or non-linearities are suspected.
— Before publishing, since peer reviewers will catch errors that authors miss.

Reporting guidelines to recognize:
STROBE: observational studies.
TRIPOD: prediction models.
CONSORT: RCTs (logistic regression often used for binary outcomes within RCTs).
PRISMA: systematic reviews and meta-analyses.

CCS pearl: On a CCS-style biostatistics-in-clinical-practice item, if your team is interpreting a quality-improvement dataset on readmissions, the appropriate "order" is a risk-adjusted logistic regression controlling for case mix; raw rates can unfairly penalize hospitals caring for sicker patients. Recognize risk adjustment as a core safety/quality concept.
Key Differentials — Same-Category Statistical Tools

Logistic regression vs. chi-square test:
— Chi-square: tests association between two categorical variables, no adjustment.
— Logistic regression: provides effect size (OR) and allows multivariable adjustment.

Logistic regression vs. Fisher's exact test:
— Fisher's: 2×2 tables with small expected counts; no covariate adjustment.
— Logistic: scalable to multiple predictors.

Logistic regression vs. log-binomial regression:
— Both handle binary outcomes; log-binomial yields RR directly, preferred when outcome is common in a cohort. Convergence issues sometimes force fallback to modified Poisson with robust SEs.

Logistic regression vs. probit regression:
— Both model binary outcomes; probit uses normal CDF instead of logistic. Coefficients aren't directly interpretable as ORs. Rarely tested.

Conditional vs. unconditional logistic regression:
— Conditional: matched designs.
— Unconditional: unmatched.

Multinomial vs. ordinal logistic:
— Multinomial: unordered ≥3 categories.
— Ordinal: ordered ≥3 categories; assumes proportional odds.

Linear probability model:
— Treats binary outcome with OLS; coefficients = risk differences. Simpler interpretation but can predict probabilities outside [0,1].

Key distinction: A 2×2 contingency table with no covariates → chi-square or Fisher's. The same data with adjustment for confounders → logistic regression. Step 3 stems differentiate by whether "after adjusting for…" appears in the methods description.
Key Differentials — Other-Category Effect Measures

Odds ratio (logistic regression):
— Binary outcome, case-control friendly, ≠ RR when outcome is common.

Relative risk (cohort, log-binomial, modified Poisson):
— Direct probability ratio; intuitive; requires cohort data.

Risk difference (absolute risk reduction):
— Clinically most useful for treatment decisions; NNT = 1/ARR.

Hazard ratio (Cox regression):
— Time-to-event with censoring; assumes proportional hazards.

Incidence rate ratio (Poisson regression):
— Counts over person-time.

Mean difference (linear regression):
— Continuous outcomes.

Standardized mean difference (Cohen's d):
— Effect size for continuous outcomes across different scales.

Likelihood ratios (LR+ / LR−):
— Diagnostic test performance; distinct from regression-derived measures.

Number needed to treat / harm:
— Derived from ARR; communicates clinical impact.

When a question asks "which measure best communicates clinical impact to a patient," the answer is usually ARR or NNT, not OR or RR. When the design is case-control, the answer is OR. When the outcome is time-to-event, the answer is HR.

Board pearl: An OR of 5.0 sounds impressive but may correspond to an absolute risk difference of only 2% (number needed to treat or harm of 50); clinical communication should emphasize absolute measures. Step 3 ethics/communication items often pair this with informed consent scenarios where patients deserve absolute risk numbers, not relative ones.
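To show how a large OR can coexist with a small absolute difference, the sketch below converts an OR plus a baseline risk into an absolute risk difference and a number needed to treat or harm. The 0.5% baseline risk is a hypothetical value chosen for illustration.

```python
def risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Risk in the exposed group implied by a baseline risk and an OR."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    exposed_odds = baseline_odds * odds_ratio
    return exposed_odds / (1.0 + exposed_odds)

# Hypothetical baseline risk of 0.5% with an OR of 5.0.
baseline = 0.005
exposed = risk_from_odds_ratio(baseline, 5.0)
risk_difference = exposed - baseline
number_needed = 1.0 / risk_difference  # NNH here, because the OR is above 1

print(f"risk in exposed:  {exposed:.2%}")
print(f"risk difference:  {risk_difference:.2%}")
print(f"number needed to harm: about {number_needed:.0f}")
```

Under these assumptions, an OR of 5.0 translates to an absolute difference of about two percentage points, which is the figure a patient can actually weigh.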
Secondary Prevention — Building Reliable OR-Based Evidence

Habits that make logistic regression results trustworthy and reproducible:
Pre-specify the model: outcome, predictors, interactions, sensitivity analyses.
Register the protocol (ClinicalTrials.gov, OSF) before analysis.
— Report all covariates considered, not only those retained.
— Provide full CI and exact p-values, not just "p<0.05."
— Include a sensitivity analysis (e.g., complete-case vs. multiple imputation for missing data).
— Validate prediction models in an independent cohort.

Long-term practices for clinicians consuming biostatistics literature:
— Build a habit of reading methods before results.
— Check whether the design supports the conclusion (cross-sectional cannot establish temporality).
— Look for competing risks in elderly cohorts.
— Note whether effect modifiers were tested.
— Confirm that the reference category is clinically meaningful.

Translating evidence into practice:
— An aOR from observational data should rarely change practice alone; integrate with RCT evidence and biological plausibility (Bradford Hill criteria).
— Meta-analyses pool ORs across studies; check for heterogeneity (I² statistic).

Step 3 management: When a clinic adopts a regression-based risk-prediction tool (e.g., ASCVD risk, Wells score, MELD), confirm it was externally validated in a population similar to yours. Unvalidated locally derived models often miscalibrate, leading to over- or under-treatment.
Follow-Up — Monitoring Model Performance Over Time

Prediction models drift; periodic recalibration is essential.

Key performance metrics:
Discrimination (C-statistic / AUC): ability to rank patients; 0.5 = chance, 1.0 = perfect. 0.7–0.8 acceptable.
Calibration: agreement between predicted and observed event rates across deciles; visualized as calibration plot.
Brier score: mean squared error of predicted probabilities; lower is better.
Net reclassification improvement (NRI) and integrated discrimination improvement (IDI): gauge whether a new predictor adds incremental value.

Why models drift:
— Patient population shifts (demographics, comorbidities).
— Treatment changes (improved care lowers event rates).
— Coding changes (ICD-9 → ICD-10).
— Care-setting changes.

Counseling clinicians on tool use:
— Use risk scores as decision aids, not substitutes for clinical judgment.
— Communicate absolute predicted risk to patients, with uncertainty.
— Document shared decision-making, especially when crossing treatment thresholds (e.g., ASCVD 7.5% for statin initiation).

Quality-improvement loop:
— Audit model outputs vs. observed outcomes annually.
— Retrain or recalibrate when calibration degrades.

Board pearl: A model with an excellent C-statistic but poor calibration overstates risk in some patients and understates it in others, and is therefore dangerous for individual decision-making. Discrimination and calibration are separate properties; both must be acceptable. Step 3 distractor sets often praise a "high AUC" while ignoring calibration failure.
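The separation between discrimination and calibration can be sketched with two of the metrics from this section, computed in plain Python on a tiny made-up dataset. The predicted probabilities and outcomes are hypothetical, and the C-statistic is calculated by brute-force pairwise comparison, which is fine for illustration but slow for large samples.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def c_statistic(probs, outcomes):
    """Share of (event, non-event) pairs in which the event got the higher
    predicted probability; ties count as half."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Hypothetical predictions and observed outcomes (illustration only).
probs = [0.1, 0.2, 0.3, 0.6, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 1, 1]

# Halving every prediction keeps the ranking but distorts the risk level.
halved = [p / 2 for p in probs]

print(f"original: C = {c_statistic(probs, outcomes):.3f}, Brier = {brier_score(probs, outcomes):.3f}")
print(f"halved:   C = {c_statistic(halved, outcomes):.3f}, Brier = {brier_score(halved, outcomes):.3f}")
```

Halving the predictions leaves the C-statistic unchanged because the ranking of patients is preserved, while the Brier score worsens because the predicted risks no longer match the observed event rate.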
Ethical, Legal, and Patient Safety Considerations

Algorithmic bias:
— Logistic regression models trained on biased data perpetuate disparities. A readmission model that includes ZIP code or insurance can systematically under-allocate resources to disadvantaged populations.
— Step 3 increasingly tests recognition of fairness audits and the need to disaggregate model performance by race, sex, and SES.

Race as a predictor:
— Many traditional models (eGFR, ASCVD) historically included race coefficients. Current guidance (e.g., 2021 NKF-ASN eGFR) removes race from eGFR; clinicians must recognize this transition and update order sets.

Informed consent for risk communication:
— When using a risk calculator (e.g., breast cancer Gail model, ASCVD) to guide decisions about chemoprevention or statins, present absolute risk and CI, not just relative measures.
— Document the discussion in the medical record.

Privacy and data governance:
— Logistic regression models built on EHR data require IRB approval and HIPAA-compliant data handling.
— Patients should be informed when their data fuel predictive analytics.

Mandatory reporting analogue:
— Reporting of clinical research follows ICMJE/FDA mandates; misrepresenting an OR or selectively reporting predictors constitutes research misconduct.

Transition-of-care safety:
— Discharge prediction tools (e.g., LACE for readmission) must be communicated to outpatient providers; failure to transmit risk scores is a documented care-transition safety gap.

Step 3 management: Before deploying any risk-prediction algorithm in your practice, ask (1) was it validated in patients like mine, (2) does it perform equitably across demographic subgroups, and (3) is the patient informed that an algorithm is contributing to their care plan? These three questions cover the bias, validation, and consent triad central to modern Step 3 ethics items.
High-Yield Associations and Rapid-Fire Facts

Board pearl: Memorize the rule "case-control → OR; cohort/RCT → RR or HR; cross-sectional → prevalence OR." Matching study design to effect measure is the single most testable biostatistics pattern on Step 3, appearing in ~80% of regression-related stems.

OR = 1 → no association; OR > 1 → ↑odds; OR < 1 → protective.
95% CI excluding 1 → statistically significant at α=0.05.
Case-control study → must use OR, cannot compute RR.
Rare outcome (<10%) → OR ≈ RR.
Common outcome → OR overestimates RR.
Each β coefficient: e^β = OR per unit increase in predictor.
Continuous predictor OR is per one unit — read units carefully.
Reference category drives interpretation of categorical OR.
Adjusted OR controls for measured confounders; unmeasured confounding persists.
10 events per predictor variable is a minimum sample-size rule.
C-statistic ≥0.7 = acceptable discrimination; ≥0.8 = good; ≥0.9 = excellent (but suspect overfit).
Hosmer–Lemeshow non-significant p → adequate calibration.
Matched case-control → conditional logistic regression.
Clustered data → mixed-effects or GEE.
Time-to-event → switch to Cox regression (HR, not OR).
Counts → Poisson regression (IRR).
Continuous outcomes → linear regression (β, mean difference).
Multiple comparisons → Bonferroni or FDR correction.
Confounder vs. mediator: never adjust for mediators when estimating total effect.
Confounding by indication is the #1 source of bias in observational drug studies.
Association ≠ causation; Bradford Hill criteria (strength, consistency, temporality, dose-response, plausibility) help judgment.
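Regression-based risk tools are not all logistic: the Wells score and Gail model draw on logistic regression, whereas the ASCVD pooled cohort equations and MELD were derived with Cox proportional hazards models, and CHA₂DS₂-VASc is a point-based score.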
Board Question Stem Patterns

Pattern 1 — Interpret an adjusted OR:
— "After adjustment for age, BMI, and smoking, aOR for outcome was 2.4 (95% CI 1.6–3.6). Which is the best interpretation?"
— Correct: associated with higher odds, statistically significant, after adjustment; does not prove causation; OR ≠ RR.

Pattern 2 — CI crosses 1:
— "aOR 1.8 (95% CI 0.9–3.6). Conclusion?"
— Correct: not statistically significant; cannot conclude an association.

Pattern 3 — Choose the right model:
— "Investigators want to study factors associated with 30-day readmission after MI in 5000 patients. Which analysis is most appropriate?"
— Correct: multivariable logistic regression (binary outcome). Distractors: linear regression, chi-square, Cox (would be correct only if time-to-readmission with censoring matters).

Pattern 4 — OR vs. RR:
— High-prevalence outcome with reported OR; asked about true RR.
— Correct: OR overstates RR; true RR is smaller.

Pattern 5 — Case-control effect measure:
— Asked for the appropriate measure of association.
— Correct: odds ratio (RR uncomputable).

Pattern 6 — Confounding by indication:
— Sicker patients more likely to receive drug; drug appears harmful in unadjusted analysis. Asked for the source of bias.
— Correct: confounding by indication; remedied by adjustment or propensity scores.

Pattern 7 — Subgroup analysis:
— Non-significant in a small subgroup despite overall significance. Asked for interpretation.
— Correct: inadequately powered, not "no effect."

Pattern 8 — Model performance:
— High C-statistic, poor calibration.
— Correct: discrimination good, calibration poor; unsuitable for individual risk prediction without recalibration.

Step 3 management: Before selecting an answer, restate the stem in your own words ("binary outcome, case-control, adjusted for X, CI excludes 1"); this disciplined translation eliminates 3 of 5 distractors on most biostatistics items.
One-Line Recap

Logistic regression models a binary outcome to yield odds ratios that are interpreted as the multiplicative change in odds per unit (or category) of a predictor, adjusted for the covariates in the model, statistically significant only when the 95% CI excludes 1, approximately equal to relative risk only when the outcome is rare, and capable of demonstrating association but not, on their own, causation.

Board pearl: When the Step 3 stem describes a binary outcome, adjusted analyses, and reports an OR with 95% CI — translate it as "X-fold odds, adjusted for these covariates, significant only if CI excludes 1, associated not causal" — that single sentence resolves the majority of logistic regression items you will encounter on test day.

Effect measure: OR = e^β; OR>1 risk, OR<1 protective, OR=1 null.
Significance: CI excluding 1 (equivalently p<0.05); statistical ≠ clinical importance.
Design fit: case-control mandates OR; cohort with rare outcome → OR ≈ RR; cohort with common outcome → OR exaggerates RR; time-to-event → switch to Cox HR; counts → Poisson IRR; continuous → linear β.
Pitfalls to recite under pressure: confounding by indication, OR-as-RR error, sparse-data inflation, overadjustment on mediators, immortal time bias, subgroup underpowering, multiple comparisons, race as a non-biological proxy, and poor calibration despite high discrimination.
Clinical translation: communicate absolute risk and NNT to patients, validate any prediction tool in your population, audit for demographic fairness, recalibrate periodically, and document shared decision-making whenever a model crosses a treatment threshold.