Biostatistics & Epidemiology

t-tests (independent, paired), ANOVA

Core Principle of t-tests and ANOVA

🧷 t-tests and ANOVA are parametric statistical tests that compare means between groups to determine if observed differences are statistically significant or due to random chance.

🧷 Both tests assume that the outcome is approximately normally distributed within each group, observations are independent, and variances are roughly equal between groups (homoscedasticity).

🧷 The fundamental question: Is the variability between group means larger than expected from the variability within groups?

🧷 t-tests compare exactly 2 groups; ANOVA compares 3 or more groups simultaneously.

🧷 Board pearl: If comparing means of 2 groups → t-test. If comparing means of ≥3 groups → ANOVA.

The t-statistic: Signal-to-Noise Ratio

📍 t = (difference between group means) ÷ (standard error of the difference).

📍 Conceptually, t represents how many standard errors separate the two group means — larger t values indicate greater separation.

📍 The denominator incorporates both sample variability (pooled standard deviation) and sample size.

📍 Degrees of freedom (df) = n₁ + n₂ − 2 for independent samples, where n₁ and n₂ are the two sample sizes.

📍 Critical t-values depend on df and desired significance level (α, typically 0.05).

📍 Board pearl: Larger sample sizes → smaller standard error → larger t-statistic → more likely to detect true differences.
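
The signal-to-noise idea above can be sketched in a few lines of Python. This is a minimal illustration, assuming the equal-variance (pooled) form of the test; the function name `pooled_t` and the blood pressure values are made up for the example.

```python
import math
from statistics import mean, variance

def pooled_t(group1, group2):
    """Return (t, df) for an equal-variance independent t-test."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / se
    return t, n1 + n2 - 2

# Hypothetical systolic blood pressures (mmHg), for illustration only
drug = [120, 125, 130, 135, 140]
placebo = [130, 135, 140, 145, 150]
t, df = pooled_t(drug, placebo)
print(f"t = {t:.2f}, df = {df}")
```

With these made-up numbers the means differ by 10 mmHg and the standard error is 5, so the means sit two standard errors apart (t = −2.0, df = 8). That falls short of the tabulated two-sided 5% critical value of about 2.306 for 8 df, so this small sample would not reach significance.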

Independent (Unpaired) t-test

🔹 Compares means between two independent groups with no pairing or matching between observations.

🔹 Classic example: Comparing mean blood pressure between patients receiving drug A versus placebo, where different patients are in each group.

🔹 Assumes both groups are sampled from populations with equal variances (can be tested with Levene's test).

🔹 If variances are unequal, use Welch's t-test, which does not pool the variances and adjusts the degrees of freedom (Welch-Satterthwaite approximation).

🔹 Board distinction: Independent = different subjects in each group. No subject appears in both groups.
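
A rough sketch of the Welch adjustment mentioned above, assuming the standard Welch-Satterthwaite formula for the degrees of freedom. The function name `welch_t` and the data are hypothetical; the second group is deliberately much more variable than the first.

```python
import math
from statistics import mean, variance

def welch_t(group1, group2):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n1, n2 = len(group1), len(group2)
    # Each group's variance of the mean is kept separate (no pooling)
    v1 = variance(group1) / n1
    v2 = variance(group2) / n2
    t = (mean(group1) - mean(group2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical data with clearly unequal spread, for illustration only
group_a = [10, 12, 14, 16, 18]
group_b = [5, 15, 25, 35, 45]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df:.2f}")
```

Here the adjusted df comes out near 4.3 rather than the 8 that the pooled formula n₁ + n₂ − 2 would give, which makes the test more cautious when one group is far noisier than the other.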

Paired (Dependent) t-test

⭐ Compares means between two related measurements on the same subjects or matched pairs.

⭐ Classic examples: Before-and-after treatment measurements on the same patients, left eye vs right eye measurements, twin studies.

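⭐ Analyzes the within-pair differences: it is a one-sample t-test on those differences, so the standard error comes from the standard deviation of the differences rather than from each group's spread.

⭐ Usually more powerful than an independent t-test on the same data because each subject serves as their own control, which removes between-subject variability.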

⭐ df = n − 1, where n is the number of pairs.

⭐ Board pearl: If the question mentions "same patients measured twice" or "matched pairs" → paired t-test.
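
To make the "test on the differences" idea concrete, here is a minimal sketch. The function name `paired_t` and the before/after values are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df) for a paired t-test on after - before differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Standard error uses the SD of the differences, not of the raw values
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1

# Hypothetical systolic blood pressures (mmHg) in the same 5 patients
before = [150, 160, 170, 180, 190]
after = [145, 156, 164, 176, 184]
t, df = paired_t(before, after)
print(f"t = {t:.2f}, df = {df}")
```

Every patient drops by 4 to 6 mmHg, so the differences are very consistent and t is about −11.2 with 4 df. Treating the same ten numbers as two independent groups would give a far smaller |t|, because the large spread between patients would swamp the 5 mmHg average change.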

One-Sample t-test

✅ Compares a sample mean to a known population mean or hypothesized value.

✅ Example: Testing if mean birth weight in a hospital (sample) differs from the national average of 3300g (population value).

✅ t = (sample mean − hypothesized mean) ÷ (standard error of the mean).

✅ Standard error = sample standard deviation ÷ √n.

✅ df = n − 1.

✅ Less commonly tested on boards but appears when comparing observed data to established norms or reference values.
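
The one-sample formulas above translate directly into code. A minimal sketch follows; the function name `one_sample_t` and the birth weights are invented for illustration, and 3300 g is the reference value from the example above.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, hypothesized_mean):
    """Return (t, df) comparing a sample mean to a fixed reference value."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (mean(sample) - hypothesized_mean) / se, n - 1

# Hypothetical birth weights (g) from one hospital, for illustration only
weights = [3100, 3250, 3400, 3550, 3700]
t, df = one_sample_t(weights, 3300)
print(f"t = {t:.2f}, df = {df}")
```

The sample mean is 3400 g, which is 100 g above the reference, but the standard error is about 106 g, so t is below 1 and the difference is well within what chance could produce in a sample this small.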

ANOVA: Analysis of Variance Fundamentals

🧠 ANOVA tests the null hypothesis that all group means are equal: H₀: μ₁ = μ₂ = μ₃ = ... = μₖ.

🧠 Despite its name focusing on "variance," ANOVA is fundamentally about comparing means.

🧠 The test partitions total variability into between-group variability (signal) and within-group variability (noise).

🧠 F-statistic = (between-group variance) ÷ (within-group variance), i.e., mean square between ÷ mean square within.

🧠 Large F values suggest group differences; F ≈ 1 suggests no difference.

🧠 Board pearl: ANOVA tells you if at least one group differs but doesn't identify which specific groups differ.

One-Way ANOVA

⚡ Compares means across ≥3 groups for a single independent variable (factor).

⚡ Example: Comparing mean cholesterol levels among patients on diet alone vs statin vs diet + statin (3 groups, 1 factor).

⚡ Between-group df = k − 1 (k = number of groups).

⚡ Within-group df = N − k (N = total sample size).

⚡ Results in an F-statistic with its associated p-value.

⚡ If significant, requires post-hoc testing to determine which specific group pairs differ.
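
The partition of variability and the degrees of freedom above can be worked through by hand. The sketch below computes the F-statistic from sums of squares; the function name `one_way_anova` and the cholesterol values are hypothetical.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical total cholesterol (mg/dL), for illustration only
diet = [200, 210, 220]
statin = [180, 190, 200]
diet_plus_statin = [160, 170, 180]
f, df_b, df_w = one_way_anova([diet, statin, diet_plus_statin])
print(f"F({df_b}, {df_w}) = {f:.1f}")
```

With these numbers the mean square between groups is 1200 and the mean square within groups is 100, giving F(2, 6) = 12.0, which is well above the tabulated 5% critical value of roughly 5.14. The F-test alone still does not say which of the three groups differ.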

Post-Hoc Testing After ANOVA

📌 A significant ANOVA indicates at least one group differs, but doesn't specify which groups.

📌 Post-hoc tests perform pairwise comparisons while controlling for multiple comparison inflation of Type I error.

📌 Common methods: Tukey's HSD (honestly significant difference), Bonferroni correction, Scheffé test.

📌 Bonferroni: divides α by the number of comparisons; simple but very conservative, especially when there are many comparisons.

📌 Tukey's HSD: balances Type I and II error; most commonly used.

📌 Board pearl: Without post-hoc correction, performing multiple t-tests inflates the chance of finding false positives.
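
As a small worked example of the Bonferroni rule above, the sketch below counts the pairwise comparisons among k groups and divides α accordingly. The function name `bonferroni_alpha` is an illustrative choice, and it assumes every pair of groups is compared.

```python
from math import comb

def bonferroni_alpha(k_groups, alpha=0.05):
    """Return (number of pairwise comparisons, per-comparison alpha)."""
    m = comb(k_groups, 2)  # k choose 2 possible pairs
    return m, alpha / m

for k in (3, 4, 5):
    m, adjusted = bonferroni_alpha(k)
    print(f"{k} groups: {m} comparisons, test each at alpha = {adjusted:.4f}")
```

Four groups give 6 pairs, so each pairwise test must reach p < 0.0083 instead of p < 0.05; five groups give 10 pairs and a threshold of 0.005.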

Two-Way ANOVA

📣 Analyzes the effect of two independent variables (factors) on a continuous outcome.

📣 Example: Examining how both drug type (A vs B) and dosing frequency (daily vs BID) affect blood pressure.

📣 Tests three hypotheses: main effect of factor 1, main effect of factor 2, and interaction between factors.

📣 Interaction means the effect of one factor depends on the level of the other factor.

📣 Board clue: If a question mentions examining two different categorical variables' effects on an outcome → two-way ANOVA.
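
Interaction is easiest to see as a "difference of differences" in the cell means. The sketch below uses invented cell means for the drug-by-frequency example. It only describes the pattern and does not perform the ANOVA significance test.

```python
# Hypothetical mean reductions in systolic BP (mmHg) per cell; illustration only
cell_means = {
    ("A", "daily"): 10.0, ("A", "BID"): 12.0,
    ("B", "daily"): 5.0, ("B", "BID"): 15.0,
}

def simple_effect(drug):
    """Effect of switching from daily to BID dosing within one drug."""
    return cell_means[(drug, "BID")] - cell_means[(drug, "daily")]

# If the frequency effect were the same for both drugs, this would be 0
interaction_contrast = simple_effect("B") - simple_effect("A")
print(simple_effect("A"), simple_effect("B"), interaction_contrast)
```

Here BID dosing adds 2 mmHg of benefit for drug A but 10 mmHg for drug B, so the effect of frequency depends on the drug. Two-way ANOVA tests whether a contrast like this is larger than within-cell noise would explain.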

Repeated Measures ANOVA

🔸 Extension of paired t-test to ≥3 time points or conditions on the same subjects.

🔸 Example: Measuring pain scores at baseline, 1 hour, 2 hours, and 4 hours after analgesia administration.

🔸 Accounts for correlation between repeated measurements on the same subject.

🔸 More powerful than independent groups ANOVA because it controls for between-subject variability.

🔸 Assumes sphericity (equal variances of differences between all pairs of conditions).

🔸 Board distinction: Same subjects measured multiple times → repeated measures ANOVA, not one-way ANOVA.

Assumptions and Violations

🧷 Normality: data within each group should be approximately normally distributed. Violated with skewed data or outliers.

🧷 Independence: observations must be independent. Violated with clustered data or repeated measures (unless using appropriate test).

🧷 Homogeneity of variance: equal variances across groups. Violated when one group has much more variability.

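🧷 For t-tests, the Central Limit Theorem relaxes the normality assumption when samples are large (rule of thumb: n ≥ 30 per group), because the sampling distribution of the mean becomes approximately normal.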

🧷 ANOVA is fairly robust to minor violations with equal sample sizes.

🧷 Board pearl: Severely skewed data or unequal variances → consider non-parametric alternatives.

Effect Size and Clinical Significance

📍 Statistical significance (p < 0.05) doesn't equal clinical importance — with large samples, tiny differences can be "significant."

📍 Cohen's d = (mean₁ − mean₂) ÷ pooled standard deviation.

📍 Small effect: d = 0.2; Medium: d = 0.5; Large: d = 0.8.

📍 For ANOVA, eta-squared (η²) = between-group sum of squares ÷ total sum of squares, the proportion of total variability explained by group membership.

📍 Confidence intervals provide more information than p-values alone.

📍 Board principle: A statistically significant result with a trivial effect size may have no clinical relevance.
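
Both effect size measures above are short calculations. The following is a minimal sketch, with hypothetical function names and made-up data, that computes Cohen's d with a pooled standard deviation and η² as a ratio of sums of squares.

```python
import math
from statistics import mean, variance

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)

def eta_squared(groups):
    """Share of total sum of squares that lies between groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    return ss_between / ss_total

# Hypothetical data, for illustration only
d = cohens_d([120, 125, 130, 135, 140], [130, 135, 140, 145, 150])
eta2 = eta_squared([[200, 210, 220], [180, 190, 200], [160, 170, 180]])
print(f"d = {d:.2f}, eta-squared = {eta2:.2f}")
```

The first comparison gives d of about −1.26, a large effect by the thresholds above, and the three-group example gives η² = 0.80, meaning 80% of the variability in these made-up values is between groups. Real clinical data rarely look this clean.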

Sample Size and Power

🔹 Power = probability of detecting a true difference when it exists (1 − β, where β = Type II error rate).

🔹 Standard power target is 80%, meaning 20% chance of missing a true effect.

🔹 Factors increasing power: larger sample size, larger effect size, higher α level, lower variability, paired/repeated designs.

🔹 Sample size calculations require: expected effect size, desired power, significance level, and estimated variability.

🔹 Board pearl: Insufficient sample size → underpowered study → may fail to detect clinically important differences.
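
The four inputs to a sample size calculation can be combined in a common textbook approximation for comparing two means: n per group ≈ 2(z₁₋α/₂ + z_power)² ÷ d², where d is the standardized effect size. The sketch below assumes a two-sided test, equal group sizes, and a normal approximation, so it slightly underestimates the exact t-based answer. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

Under these assumptions a large effect (d = 0.8) needs about 25 subjects per group, a medium effect (d = 0.5) about 63, and a small effect (d = 0.2) about 393, which shows how quickly the required sample grows as the expected effect shrinks.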

Type I and Type II Errors in Context

⭐ Type I error (α): rejecting a true null hypothesis — claiming a difference exists when it doesn't.

⭐ Type II error (β): failing to reject a false null hypothesis — missing a real difference.

⭐ Multiple comparisons increase Type I error risk: with 20 independent tests at α = 0.05 and every null hypothesis true, expect 1 false positive on average, and the chance of at least one is about 64%.

⭐ Trade-off: at a fixed sample size, lowering α (e.g., to 0.01) reduces Type I error but increases Type II error.

⭐ Board scenario: A study with p = 0.06 and a small sample size may be underpowered; a Type II error is possible, and the result is not proof of no effect.
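
The multiple-comparisons arithmetic can be checked directly. This sketch assumes the tests are independent and that every null hypothesis is true; the function name `familywise_error` is an illustrative choice.

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

n_tests, alpha = 20, 0.05
expected_false_positives = n_tests * alpha
print(f"expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one): {familywise_error(n_tests, alpha):.3f}")
```

For 20 tests at α = 0.05, the expected number of false positives is 1 and the probability of at least one is roughly 0.64, compared with 0.05 for a single test.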

Non-Parametric Alternatives

✅ When parametric assumptions are violated, non-parametric tests analyze ranks rather than raw values.

✅ Mann-Whitney U test: non-parametric alternative to independent t-test.

✅ Wilcoxon signed-rank test: non-parametric alternative to paired t-test.

✅ Kruskal-Wallis test: non-parametric alternative to one-way ANOVA.

✅ Friedman test: non-parametric alternative to repeated measures ANOVA.

✅ Board pearl: Ordinal data (e.g., pain scales 1-10) or obviously skewed data → use non-parametric test.
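
To show what "analyzing ranks rather than raw values" means, here is a minimal sketch of the Mann-Whitney U statistic. It pools the two groups, assigns ranks (ties share the average rank), and derives U from the rank sum of the first group. The function name and pain scores are hypothetical, and the sketch stops at the statistic without computing a p-value.

```python
def mann_whitney_u(group1, group2):
    """Return the Mann-Whitney U statistic (the smaller of U1 and U2)."""
    pooled = sorted(group1 + group2)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their average
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(group1), len(group2)
    rank_sum_1 = sum(rank_of[x] for x in group1)
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)

# Hypothetical pain scores (1-10 scale), for illustration only
treated = [2, 3, 3, 5]
control = [4, 6, 7, 8]
print(mann_whitney_u(treated, control))
```

Only one of the 16 treated-control pairs has the treated patient reporting more pain (5 versus 4), which is why U comes out as 1. The actual scores matter only through their ordering, so an extreme outlier would not change the result.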

Common Biostatistical Pitfalls

🧠 Multiple t-tests instead of ANOVA: inflates Type I error when comparing ≥3 groups.

🧠 Using independent t-test for paired data: ignores correlation, reduces power.

🧠 Ignoring assumptions: applying parametric tests to severely skewed or ordinal data.

🧠 Confusing statistical and clinical significance: tiny p-values don't guarantee meaningful effects.

🧠 Post-hoc data dredging: finding "significant" results by testing many hypotheses without correction.

🧠 Board warning: If comparing 4 groups with 6 separate t-tests → incorrect approach, use ANOVA.

Choosing the Correct Test

⚡ Two groups + continuous outcome + independent samples → independent t-test.

⚡ Two groups + continuous outcome + paired samples → paired t-test.

⚡ ≥3 groups + continuous outcome + independent samples → one-way ANOVA.

⚡ ≥3 time points + continuous outcome + same subjects → repeated measures ANOVA.

⚡ Two factors + continuous outcome → two-way ANOVA.

⚡ Violated assumptions or ordinal data → non-parametric alternative.

⚡ Board strategy: Identify number of groups, independence of observations, and data type to select the test.
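
The decision rules above can be written as a small lookup function. This is a simplified study aid, not a substitute for judgment: the function name `choose_test` and its parameters are hypothetical, and it covers only the tests named in these notes.

```python
def choose_test(n_groups, same_subjects, assumptions_met=True, n_factors=1):
    """Map number of groups, pairing, and data type to a test name."""
    if n_factors == 2:
        # These notes list no non-parametric alternative for two factors
        return "two-way ANOVA"
    if n_groups < 2:
        raise ValueError("need at least two groups or time points")
    if n_groups == 2:
        if assumptions_met:
            return "paired t-test" if same_subjects else "independent t-test"
        return "Wilcoxon signed-rank test" if same_subjects else "Mann-Whitney U test"
    if assumptions_met:
        return "repeated measures ANOVA" if same_subjects else "one-way ANOVA"
    return "Friedman test" if same_subjects else "Kruskal-Wallis test"

print(choose_test(2, same_subjects=False))                        # drug vs placebo
print(choose_test(2, same_subjects=True))                         # pre vs post
print(choose_test(4, same_subjects=False))                        # 4 wound dressings
print(choose_test(3, same_subjects=False, assumptions_met=False)) # skewed outcome
```

The `assumptions_met` flag stands in for "normality and equal variances are reasonable and the outcome is continuous"; setting it to False routes ordinal or severely skewed data to the rank-based alternative.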

Interpreting Results in Research Papers

📌 Check the p-value against the predetermined α level (usually 0.05).

📌 Look for confidence intervals — they provide both significance and effect magnitude.

📌 Verify appropriate test selection based on study design and data characteristics.

📌 Consider clinical relevance beyond statistical significance.

📌 Watch for multiple comparison corrections in studies with many outcomes.

📌 Board skill: Given a results table, identify whether the correct statistical test was used based on the study description.

Board Question Stem Patterns

📣 Comparing mean blood pressures between drug and placebo groups (different patients) → independent t-test.

📣 Comparing pre- and post-treatment weights in the same patients → paired t-test.

📣 Comparing mean healing times among 4 different wound dressings → one-way ANOVA.

📣 Study reports p = 0.15 with n = 10 per group → possibly underpowered; a Type II error cannot be ruled out.

📣 Comparing 5 groups with 10 separate t-tests → inappropriate, should use ANOVA with post-hoc tests.

📣 Severely right-skewed outcome data → non-parametric test needed.

📣 Both drug type and dosing schedule affecting outcome → two-way ANOVA.

One-Line Recap

🔸 t-tests compare means between 2 groups (independent for different subjects, paired for the same subjects); ANOVA extends this to ≥3 groups by partitioning variability into between-group and within-group components and needs post-hoc tests with multiple-comparison correction to locate the differences; both assume approximate normality and equal variances, and both have non-parametric alternatives when those assumptions are violated.
