
Biostatistics & Epidemiology

t-tests (independent, paired), ANOVA

Core Principle of t-tests and ANOVA
🧷 t-tests and ANOVA are parametric statistical tests that compare means between groups to determine if observed differences are statistically significant or due to random chance.
🧷 Both tests assume that the outcome is approximately normally distributed within each group, that observations are independent, and that variances are roughly equal between groups (homoscedasticity).
🧷 The fundamental question: Is the variability between group means larger than expected from the variability within groups?
🧷 t-tests compare exactly 2 groups; ANOVA compares 3 or more groups simultaneously.
🧷 Board pearl: If comparing means of 2 groups → t-test. If comparing means of ≥3 groups → ANOVA.
The t-statistic: Signal-to-Noise Ratio
📍 t = (difference between group means) ÷ (standard error of the difference).
📍 Conceptually, t represents how many standard errors separate the two group means — larger t values indicate greater separation.
📍 The denominator incorporates both sample variability (pooled standard deviation) and sample size.
📍 Degrees of freedom (df) = n₁ + n₂ − 2 for independent samples, where n₁ and n₂ are the two sample sizes.
📍 Critical t-values depend on df and desired significance level (α, typically 0.05).
📍 Board pearl: For a given true difference, larger sample sizes → smaller standard error → larger t-statistic → more likely to detect the difference.
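As a rough illustration of the signal-to-noise idea, the sketch below computes the equal-variance (pooled) t-statistic and its degrees of freedom by hand. The function name `pooled_t` and the blood pressure values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math
from statistics import mean, variance

def pooled_t(group1, group2):
    """Return (t, df) for an equal-variance independent t-test."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: df-weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se_diff, df

# Hypothetical systolic pressures (mmHg), different patients in each arm.
drug = [128, 130, 132, 134, 136]
placebo = [138, 140, 142, 144, 146]

t_stat, df = pooled_t(drug, placebo)
print(f"t = {t_stat:.2f}, df = {df}")  # t = -5.00, df = 8
```

Here the means differ by 10 mmHg and the standard error of the difference is 2 mmHg, so the two means sit 5 standard errors apart.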
Independent (Unpaired) t-test
🔹 Compares means between two independent groups with no pairing or matching between observations.
🔹 Classic example: Comparing mean blood pressure between patients receiving drug A versus placebo, where different patients are in each group.
🔹 Assumes both groups are sampled from populations with equal variances (can be tested with Levene's test).
🔹 If variances are unequal, use Welch's t-test, which does not pool the variances and adjusts the degrees of freedom downward.
🔹 Board distinction: Independent = different subjects in each group. No subject appears in both groups.
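When the equal-variance assumption looks doubtful, Welch's version can be sketched as follows. This is a minimal hand calculation, assuming the Welch-Satterthwaite approximation for the degrees of freedom; the function name `welch_t` and the data are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(group1, group2):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n1, n2 = len(group1), len(group2)
    # Each group's variance contributes separately; nothing is pooled.
    a, b = variance(group1) / n1, variance(group2) / n2
    t = (mean(group1) - mean(group2)) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Hypothetical data: the second group is far more variable than the first.
tight = [10, 12, 14]
spread = [0, 10, 20, 30, 40]

t_stat, df = welch_t(tight, spread)
print(f"t = {t_stat:.2f}, df = {df:.2f}")
```

With these values the adjusted df comes out well below the pooled value of n₁ + n₂ − 2 = 6, which makes the test more cautious when one group is much noisier than the other.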
Paired (Dependent) t-test
Compares means between two related measurements on the same subjects or matched pairs.
Classic examples: Before-and-after treatment measurements on the same patients, left eye vs right eye measurements, twin studies.
Analyzes the mean of the differences between paired observations, not the difference between means.
Usually more powerful than the independent t-test because each subject serves as their own control, which removes between-subject variability from the comparison.
df = n − 1, where n is the number of pairs.
Board pearl: If the question mentions "same patients measured twice" or "matched pairs" → paired t-test.
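The "mean of the differences" idea can be sketched in a few lines: the paired test reduces each pair to one difference and then works only with those differences. The function name `paired_t` and the before/after values are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t, df) for a paired t-test on matched measurements."""
    # One difference per pair; the raw values are not used after this step.
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    # Standard error of the mean difference.
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1

# Hypothetical systolic pressures (mmHg) in the same 4 patients.
before = [150, 160, 170, 180]
after = [146, 158, 164, 176]

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.2f}, df = {df}")  # t = -4.90, df = 3
```

The patients differ from one another by up to 30 mmHg, but the differences within each pair vary by only a few mmHg, which is why the paired analysis gives a large t from just 4 pairs.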
One-Sample t-test
Compares a sample mean to a known population mean or hypothesized value.
Example: Testing if mean birth weight in a hospital (sample) differs from the national average of 3300g (population value).
t = (sample mean − hypothesized mean) ÷ (standard error of the mean).
Standard error = sample standard deviation ÷ √n.
df = n − 1.
Less commonly tested on boards but appears when comparing observed data to established norms or reference values.
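A minimal sketch of the one-sample formula, using the 3300 g reference value from the example above. The six birth weights and the function name `one_sample_t` are hypothetical.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, hypothesized_mean):
    """Return (t, df) comparing a sample mean to a fixed reference value."""
    n = len(sample)
    # Standard error of the mean = sample SD / sqrt(n).
    sem = stdev(sample) / math.sqrt(n)
    return (mean(sample) - hypothesized_mean) / sem, n - 1

# Hypothetical birth weights (g) from one hospital.
weights = [3100, 3200, 3300, 3400, 3500, 3600]

t_stat, df = one_sample_t(weights, 3300)
print(f"t = {t_stat:.2f}, df = {df}")  # t = 0.65, df = 5
```

The sample mean is 50 g above the reference, but that is less than one standard error, so this hypothetical sample gives little evidence of a difference.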
ANOVA: Analysis of Variance Fundamentals
🧠 ANOVA tests the null hypothesis that all group means are equal: H₀: μ₁ = μ₂ = μ₃ = ... = μₖ.
🧠 Despite its name focusing on "variance," ANOVA is fundamentally about comparing means.
🧠 The test partitions total variability into between-group variability (signal) and within-group variability (noise).
🧠 F-statistic = (between-group variance) ÷ (within-group variance).
🧠 Large F values suggest group differences; F ≈ 1 suggests no difference.
🧠 Board pearl: ANOVA tells you if at least one group differs but doesn't identify which specific groups differ.
One-Way ANOVA
Compares means across ≥3 groups for a single independent variable (factor).
Example: Comparing mean cholesterol levels among patients on diet alone vs statin vs diet + statin (3 groups, 1 factor).
Between-group df = k − 1 (k = number of groups).
Within-group df = N − k (N = total sample size).
Results in an F-statistic with its associated p-value.
If significant, requires post-hoc testing to determine which specific group pairs differ.
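The partition into between-group and within-group variability can be sketched by hand. The function name `one_way_anova` and the cholesterol values are hypothetical; the calculation uses the standard sums of squares with the df formulas given above.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, total_n = len(groups), len(values)
    # Signal: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Noise: how far each observation sits from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical total cholesterol (mg/dL) in three treatment groups.
diet = [200, 210, 220]
statin = [180, 190, 200]
diet_plus_statin = [160, 170, 180]

f_stat, df_b, df_w = one_way_anova([diet, statin, diet_plus_statin])
print(f"F = {f_stat:.2f}, df = ({df_b}, {df_w})")  # F = 12.00, df = (2, 6)
```

An F of 12 means the between-group mean square is 12 times the within-group mean square. It still does not say which of the three groups differ from each other.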
Post-Hoc Testing After ANOVA
📌 A significant ANOVA indicates at least one group differs, but doesn't specify which groups.
📌 Post-hoc tests perform pairwise comparisons while controlling for multiple comparison inflation of Type I error.
📌 Common methods: Tukey's HSD (honestly significant difference), Bonferroni correction, Scheffé test.
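📌 Bonferroni: divides α by the number of comparisons; simple but conservative, especially when there are many comparisons.
📌 Tukey's HSD: designed for all pairwise comparisons; controls the family-wise Type I error rate with less loss of power than Bonferroni; most commonly used.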
📌 Board pearl: Without post-hoc correction, performing multiple t-tests inflates the chance of finding false positives.
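To make the Bonferroni arithmetic concrete, the sketch below counts the pairwise comparisons among k groups and divides α accordingly. The function name `bonferroni` is hypothetical.

```python
from math import comb

def bonferroni(k_groups, alpha=0.05):
    """Return (number of pairwise comparisons, per-comparison alpha)."""
    # k groups give k choose 2 distinct pairs.
    n_comparisons = comb(k_groups, 2)
    return n_comparisons, alpha / n_comparisons

for k in (3, 4, 5):
    m, adjusted = bonferroni(k)
    print(f"{k} groups: {m} pairwise comparisons, per-comparison alpha = {adjusted:.4f}")
```

With 5 groups there are 10 pairs, so each pairwise test has to reach p < 0.005 to count as significant at an overall α of 0.05.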
Two-Way ANOVA
📣 Analyzes the effect of two independent variables (factors) on a continuous outcome.
📣 Example: Examining how both drug type (A vs B) and dosing frequency (daily vs BID) affect blood pressure.
📣 Tests three hypotheses: main effect of factor 1, main effect of factor 2, and interaction between factors.
📣 Interaction means the effect of one factor depends on the level of the other factor.
📣 Board clue: If a question mentions examining two different categorical variables' effects on an outcome → two-way ANOVA.
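Interaction in a 2 × 2 design can be pictured as a "difference of differences". The sketch below is descriptive only: it uses hypothetical cell means for the drug-by-frequency example and does not perform the F-test that a two-way ANOVA would use to judge significance.

```python
# Hypothetical mean systolic pressure (mmHg) in each drug/frequency cell.
cell_means = {
    ("A", "daily"): 140.0,
    ("A", "BID"): 130.0,
    ("B", "daily"): 138.0,
    ("B", "BID"): 136.0,
}

def frequency_effect(drug):
    """Change in mean pressure when moving from daily to BID dosing."""
    return cell_means[(drug, "BID")] - cell_means[(drug, "daily")]

effect_a = frequency_effect("A")  # -10.0
effect_b = frequency_effect("B")  # -2.0

# If the two effects were equal, this contrast would be 0 (no interaction).
interaction_contrast = effect_a - effect_b
print(effect_a, effect_b, interaction_contrast)  # -10.0 -2.0 -8.0
```

In this made-up table BID dosing lowers pressure by 10 mmHg on drug A but only 2 mmHg on drug B, so the effect of dosing frequency depends on the drug.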
Repeated Measures ANOVA
🔸 Extension of paired t-test to ≥3 time points or conditions on the same subjects.
🔸 Example: Measuring pain scores at baseline, 1 hour, 2 hours, and 4 hours after analgesia administration.
🔸 Accounts for correlation between repeated measurements on the same subject.
🔸 More powerful than independent groups ANOVA because it controls for between-subject variability.
🔸 Assumes sphericity (equal variances of differences between all pairs of conditions).
🔸 Board distinction: Same subjects measured multiple times → repeated measures ANOVA, not one-way ANOVA.
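The gain from a repeated measures design comes from pulling between-subject variability out of the error term. The sketch below shows that partition by hand for a small hypothetical data set with 3 subjects and 3 time points (fewer than in the example above, to keep it short); the function name is hypothetical, and sphericity is simply assumed rather than checked.

```python
def repeated_measures_f(scores):
    """scores[i][j] = value for subject i at condition j.

    Returns (F, df_conditions, df_error).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    condition_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_conditions = n * sum((m - grand) ** 2 for m in condition_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    # Between-subject variability is removed from the error term.
    ss_error = ss_total - ss_conditions - ss_subjects

    df_conditions, df_error = k - 1, (k - 1) * (n - 1)
    f_stat = (ss_conditions / df_conditions) / (ss_error / df_error)
    return f_stat, df_conditions, df_error

# Hypothetical pain scores: rows are patients, columns are time points.
pain = [
    [8, 6, 4],
    [6, 5, 1],
    [7, 4, 4],
]

f_stat, df_c, df_e = repeated_measures_f(pain)
print(f"F = {f_stat:.2f}, df = ({df_c}, {df_e})")  # F = 12.00, df = (2, 4)
```

If the same nine values were treated as three independent groups, the subject sum of squares would stay in the error term and the error mean square would rise from 1.0 to about 1.67, giving a smaller F.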
Assumptions and Violations
🧷 Normality: data within each group should be approximately normally distributed. Violated with skewed data or outliers.
🧷 Independence: observations must be independent. Violated with clustered data or repeated measures, unless a test designed for such data is used.
🧷 Homogeneity of variance: equal variances across groups. Violated when one group has much more variability.
🧷 For t-tests, the Central Limit Theorem relaxes the normality assumption when samples are large (rule of thumb: n ≥ 30 per group).
🧷 ANOVA is fairly robust to minor violations with equal sample sizes.
🧷 Board pearl: Severely skewed data or unequal variances → consider non-parametric alternatives.
Effect Size and Clinical Significance
📍 Statistical significance (p < 0.05) doesn't equal clinical importance — with large samples, tiny differences can be "significant."
📍 Cohen's d = (mean₁ − mean₂) ÷ pooled standard deviation.
📍 Small effect: d = 0.2; Medium: d = 0.5; Large: d = 0.8.
📍 For ANOVA, eta-squared (η²) = between-group sum of squares ÷ total sum of squares, i.e., the proportion of total variability explained by group membership.
📍 Confidence intervals provide more information than p-values alone.
📍 Board principle: A statistically significant result with a trivial effect size may have no clinical relevance.
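Both effect size measures can be computed directly from their definitions. The sketch below uses hypothetical data and hypothetical function names; the values are deliberately well separated, so the effect sizes are much larger than would be typical in practice.

```python
import math
from statistics import mean, variance

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)

def eta_squared(groups):
    """Proportion of total variability explained by group membership."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    return ss_between / ss_total

# Hypothetical two-group and three-group data sets.
d = cohens_d([128, 130, 132, 134, 136], [138, 140, 142, 144, 146])
eta2 = eta_squared([[200, 210, 220], [180, 190, 200], [160, 170, 180]])
print(f"d = {d:.2f}, eta squared = {eta2:.2f}")  # d = -3.16, eta squared = 0.80
```

Unlike the t-statistic, Cohen's d does not grow with sample size, which is why it is useful for judging whether a "significant" difference is large enough to matter.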
Sample Size and Power
🔹 Power = probability of detecting a true difference when it exists (1 − β, where β = Type II error rate).
🔹 Standard power target is 80%, meaning 20% chance of missing a true effect.
🔹 Factors increasing power: larger sample size, larger effect size, higher α level, lower variability, paired/repeated designs.
🔹 Sample size calculations require: expected effect size, desired power, significance level, and estimated variability.
🔹 Board pearl: Insufficient sample size → underpowered study → may fail to detect clinically important differences.
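A rough sample size sketch for comparing two independent means, assuming a two-sided test and the common normal-approximation formula n per group ≈ 2 × ((z for 1 − α/2 + z for power) ÷ d)², where d is the expected standardized effect size. The function name `n_per_group` is hypothetical, and an exact t-based calculation would give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate subjects needed per group (normal approximation)."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

Under these assumptions a medium effect (d = 0.5) needs about 63 per group at 80% power, while a small effect (d = 0.2) needs several hundred, which shows how quickly small effects drive up the required sample size.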
Type I and Type II Errors in Context
Type I error (α): rejecting a true null hypothesis — claiming a difference exists when it doesn't.
Type II error (β): failing to reject a false null hypothesis — missing a real difference.
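Multiple comparisons increase Type I error risk: with 20 independent tests at α = 0.05 and every null hypothesis true, 1 false positive is expected on average, and the chance of at least one is 1 − 0.95²⁰ ≈ 64%.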
Trade-off: lowering α (e.g., 0.01) reduces Type I error but increases Type II error.
Board scenario: A study with p = 0.06 and a small sample size may be underpowered; a Type II error is possible, and the result is not proof of no effect.
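The inflation of Type I error with multiple testing can be checked numerically. This sketch assumes the tests are independent and that every null hypothesis is true; the function name `family_wise_error` is hypothetical.

```python
def family_wise_error(n_tests, alpha=0.05):
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

for m in (1, 6, 10, 20):
    expected_false_positives = m * alpha if (alpha := 0.05) else 0
    print(f"{m:>2} tests: P(at least one false positive) = {family_wise_error(m):.3f}, "
          f"expected count = {expected_false_positives:.2f}")
```

The expected count and the probability of at least one false positive are different quantities: 20 tests give an expected count of 1, but the chance of seeing at least one is roughly 64%, not 100%.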
Non-Parametric Alternatives
When parametric assumptions are violated, non-parametric tests analyze ranks rather than raw values.
Mann-Whitney U test: non-parametric alternative to independent t-test.
Wilcoxon signed-rank test: non-parametric alternative to paired t-test.
Kruskal-Wallis test: non-parametric alternative to one-way ANOVA.
Friedman test: non-parametric alternative to repeated measures ANOVA.
Board pearl: Ordinal data (e.g., pain scales 1-10) or obviously skewed data → use non-parametric test.
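To show what "analyzing ranks rather than raw values" means in practice, the sketch below computes the Mann-Whitney U statistic by counting, for every cross-group pair, which observation is larger. The function name and data are hypothetical, and no p-value is computed.

```python
def mann_whitney_u(x, y):
    """U statistic for x: number of (x, y) pairs where x is larger, ties count half."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )

control = [1, 2, 6, 7]
with_outlier = [3, 4, 5, 100]     # one extreme value
without_outlier = [3, 4, 5, 8]    # same ordering, no extreme value

u_outlier = mann_whitney_u(with_outlier, control)
u_plain = mann_whitney_u(without_outlier, control)
print(u_outlier, u_plain)  # 10.0 10.0
```

Replacing 100 with 8 leaves U unchanged because only the ordering of the values matters, which is why rank-based tests are less affected by skew and outliers than tests based on means.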
Common Biostatistical Pitfalls
🧠 Multiple t-tests instead of ANOVA: inflates Type I error when comparing ≥3 groups.
🧠 Using independent t-test for paired data: ignores correlation, reduces power.
🧠 Ignoring assumptions: applying parametric tests to severely skewed or ordinal data.
🧠 Confusing statistical and clinical significance: tiny p-values don't guarantee meaningful effects.
🧠 Post-hoc data dredging: finding "significant" results by testing many hypotheses without correction.
🧠 Board warning: If comparing 4 groups with 6 separate t-tests → incorrect approach, use ANOVA.
Choosing the Correct Test
Two groups + continuous outcome + independent samples → independent t-test.
Two groups + continuous outcome + paired samples → paired t-test.
≥3 groups + continuous outcome + independent samples → one-way ANOVA.
≥3 time points + continuous outcome + same subjects → repeated measures ANOVA.
Two factors + continuous outcome → two-way ANOVA.
Violated assumptions or ordinal data → non-parametric alternative.
Board strategy: Identify number of groups, independence of observations, and data type to select the test.
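The decision rules above can be written as a small lookup function. This is only a study aid that mirrors the list; the function name `choose_test` and its parameters are hypothetical.

```python
def choose_test(n_groups, same_subjects, assumptions_met=True, n_factors=1):
    """Suggest a test for a continuous (or ordinal) outcome.

    n_groups: number of groups or time points being compared
    same_subjects: True for paired, matched, or repeated measurements
    assumptions_met: False for ordinal data or clearly violated assumptions
    n_factors: number of categorical independent variables
    """
    if n_factors == 2:
        return "two-way ANOVA"
    if n_groups < 2:
        raise ValueError("need at least 2 groups or time points")
    two_groups = n_groups == 2
    if assumptions_met:
        if same_subjects:
            return "paired t-test" if two_groups else "repeated measures ANOVA"
        return "independent t-test" if two_groups else "one-way ANOVA"
    # Non-parametric counterparts of the four tests above.
    if same_subjects:
        return "Wilcoxon signed-rank test" if two_groups else "Friedman test"
    return "Mann-Whitney U test" if two_groups else "Kruskal-Wallis test"

print(choose_test(2, same_subjects=False))                        # independent t-test
print(choose_test(4, same_subjects=True))                         # repeated measures ANOVA
print(choose_test(3, same_subjects=False, assumptions_met=False)) # Kruskal-Wallis test
```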
Interpreting Results in Research Papers
📌 Check the p-value against the predetermined α level (usually 0.05).
📌 Look for confidence intervals: they convey both statistical significance and the plausible range of the effect size.
📌 Verify appropriate test selection based on study design and data characteristics.
📌 Consider clinical relevance beyond statistical significance.
📌 Watch for multiple comparison corrections in studies with many outcomes.
📌 Board skill: Given a results table, identify whether the correct statistical test was used based on the study description.
Board Question Stem Patterns
📣 Comparing mean blood pressures between drug and placebo groups (different patients) → independent t-test.
📣 Comparing pre- and post-treatment weights in the same patients → paired t-test.
📣 Comparing mean healing times among 4 different wound dressings → one-way ANOVA.
📣 Study reports p = 0.15 with n = 10 per group → possibly underpowered; a Type II error cannot be ruled out.
📣 Comparing 5 groups with 10 separate t-tests → inappropriate, should use ANOVA with post-hoc tests.
📣 Severely right-skewed outcome data → non-parametric test needed.
📣 Both drug type and dosing schedule affecting outcome → two-way ANOVA.
One-Line Recap
🔸 t-tests compare means between 2 groups (independent for different subjects, paired for the same subjects), while ANOVA extends this to ≥3 groups by partitioning variance into between-group and within-group components; both assume approximately normal data and equal variances, a significant ANOVA is followed by post-hoc tests that correct for multiple comparisons, and non-parametric alternatives apply when assumptions are violated.