Examples of Post-hoc Tests After ANOVA: Practical Examples
Before definitions, let’s anchor this in reality. Here are several practical examples of post-hoc tests after ANOVA that you’re likely to see in published research or applied work:
- A clinical trial comparing three blood-pressure drugs
- A school district comparing test scores across four teaching methods
- A marketing team testing five versions of an email subject line
- A sports scientist comparing recovery times under different training programs
- A nutrition researcher comparing weight loss across four diets
- A UX team comparing task completion times across three app designs
All of these start with a one-way ANOVA (or repeated-measures ANOVA). Once the ANOVA p-value says, “At least one group is different,” you need post-hoc tests to pinpoint which groups differ while controlling the familywise error rate.
The best examples of post-hoc tests after ANOVA usually involve:
- Multiple groups (3+)
- Interest in all pairwise comparisons, not just one or two
- A need to control for false positives when you run many tests
Let’s walk through specific, data-driven scenarios and see how different post-hoc tests behave.
Clinical trial: Tukey’s HSD as a classic example of post-hoc tests after ANOVA
Imagine a randomized clinical trial comparing three antihypertensive drugs and a placebo:
- Placebo
- Drug A
- Drug B
- Drug C
Outcome: change in systolic blood pressure (mmHg) after 8 weeks.
The researcher runs a one-way ANOVA and finds:
- F(3, 196) = 9.8, p < 0.001
So, there’s a statistically significant difference among the four groups. But which drugs outperform placebo? And is Drug C better than A or B?
Here’s where one of the most common post-hoc tests after ANOVA comes in: Tukey’s Honest Significant Difference (HSD).
In this scenario, Tukey’s HSD is a strong choice because:
- Group sizes are equal (n = 50 per group)
- Variances appear similar (Levene’s test p > 0.05)
- The researcher wants to compare all pairs (A vs B, A vs C, B vs C, each vs placebo)
Suppose Tukey’s HSD results show:
- Drug A vs Placebo: p = 0.040
- Drug B vs Placebo: p = 0.001
- Drug C vs Placebo: p = 0.120
- Drug B vs Drug A: p = 0.030
- Drug C vs Drug A: p = 0.900
- Drug C vs Drug B: p = 0.060
Interpretation in plain English:
- Drug B clearly outperforms placebo and is better than Drug A.
- Drug A is modestly better than placebo.
- Drug C is statistically indistinguishable from placebo and from the other drugs.
This is a textbook example of how Tukey’s HSD gives a balanced view of all pairwise differences while controlling the overall Type I error rate. If you read NIH-funded clinical studies, you’ll often see Tukey’s HSD or Bonferroni used in exactly this kind of multi-arm trial (see, for instance, methodological discussions at NIH).
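If you want to reproduce this kind of analysis, here is a minimal sketch in Python using statsmodels. The group means, spread, and random seed are made-up illustrative values; only the design (four groups, n = 50 each) mirrors the trial above.

```python
# Minimal Tukey's HSD sketch with simulated blood-pressure data.
# Means and SD below are illustrative assumptions, not real trial results.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(42)
groups = ["Placebo", "DrugA", "DrugB", "DrugC"]
means = [-2, -6, -10, -4]  # hypothetical mean change in systolic BP (mmHg)

values = np.concatenate([rng.normal(m, 8, size=50) for m in means])
labels = np.repeat(groups, 50)

result = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(result.summary())  # one row per pair: mean diff, adjusted p, 95% CI, reject
```

The summary table gives you every pairwise comparison at once, with the familywise error rate already controlled.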
Education study: Bonferroni as a conservative example of post-hoc tests after ANOVA
Now picture a large school district testing four reading interventions for 4th graders:
- Traditional textbook
- Guided reading
- Phonics-intensive program
- Digital reading app
Outcome: standardized reading score at the end of the year.
ANOVA output:
- F(3, 796) = 5.3, p = 0.0013
The district is risk‑averse; they don’t want to overclaim that a method is better when it’s just noise. They care most about comparisons vs the traditional method, not every possible pair.
Here, Bonferroni-corrected pairwise t‑tests are a good fit. Bonferroni is simple, transparent, and easy to justify to non‑statisticians, which is why it shows up so often as a post-hoc choice after ANOVA.
Let’s say there are three comparisons of interest:
- Guided vs Traditional
- Phonics vs Traditional
- Digital vs Traditional
Without adjustment, p-values might be:
- Guided vs Traditional: p = 0.020
- Phonics vs Traditional: p < 0.001
- Digital vs Traditional: p = 0.045
With Bonferroni (α = 0.05 / 3 ≈ 0.0167), only p-values below 0.0167 are considered significant:
- Guided vs Traditional: 0.020 → not significant after correction
- Phonics vs Traditional: < 0.001 → still significant
- Digital vs Traditional: 0.045 → not significant
So the district concludes that only the phonics-intensive program clearly outperforms the traditional method. This conservative stance is often recommended in educational and social science research, and you’ll see it in many university statistics tutorials, such as those hosted by UCLA’s IDRE.
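Here is a minimal sketch of the same Bonferroni logic in Python, using multipletests from statsmodels on the illustrative p-values above.

```python
# Bonferroni adjustment of three planned comparisons (illustrative p-values).
from statsmodels.stats.multitest import multipletests

labels = ["Guided vs Traditional", "Phonics vs Traditional", "Digital vs Traditional"]
raw_p = [0.020, 0.0005, 0.045]  # 0.0005 stands in for the reported "p < 0.001"

reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")
for name, p, sig in zip(labels, adj_p, reject):
    print(f"{name}: adjusted p = {p:.4f}, significant = {sig}")
```

Note that multipletests reports adjusted p-values (raw p times the number of tests, capped at 1), which is equivalent to comparing raw p-values against α / 3.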
Marketing A/B/n test: Games–Howell as a modern example when variances differ
In 2024–2025, marketing teams routinely run A/B/n tests with unequal group sizes and messy data. Suppose a company tests five versions of an email subject line:
- Control
- Version 1
- Version 2
- Version 3
- Version 4
Outcome: click‑through rate (CTR), a proportion that can have different variances across groups.
The ANOVA (or a generalized linear model serving the same purpose) suggests a difference:
- F(4, 19995) = 15.2, p < 0.001
However, Levene’s test for homogeneity of variance is significant (p < 0.001), and group sizes are very unbalanced (Control: 8,000; Version 1: 3,000; Version 2: 2,000; Version 3: 4,000; Version 4: 3,000).
This is where Games–Howell shines. As one of the best examples of post-hoc tests after ANOVA under unequal variances and unequal sample sizes, Games–Howell avoids assuming equal variances and uses adjusted degrees of freedom.
Suppose Games–Howell finds that:
- Version 2 and Version 3 both significantly outperform Control
- Version 1 and Version 4 are not significantly different from Control
- Version 3 outperforms Version 2 by a small but statistically significant margin
In practice, the marketing team might:
- Roll out Version 3 broadly
- Keep Version 2 as a backup
- Drop Version 1 and Version 4 from future campaigns
This example of Games–Howell is increasingly common as companies lean on R and Python libraries that implement it directly. It’s a nice reminder that not all post-hoc tests assume equal variances, which matters with real‑world click data, medical costs, or any skewed metric.
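In Python, for example, the pingouin library implements Games–Howell directly. Below is a minimal sketch on simulated click data; the column names, list sizes, and click rates are assumptions for illustration, not real campaign numbers.

```python
# Games-Howell on simulated per-recipient click data (0/1 outcomes).
# Group sizes and click rates are illustrative assumptions.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(7)
sizes = {"Control": 8000, "V1": 3000, "V2": 2000, "V3": 4000, "V4": 3000}
rates = {"Control": 0.030, "V1": 0.031, "V2": 0.036, "V3": 0.040, "V4": 0.029}

df = pd.concat(
    [pd.DataFrame({"version": v, "ctr": rng.binomial(1, rates[v], n)})
     for v, n in sizes.items()],
    ignore_index=True,
)

# Games-Howell does not assume equal variances or equal group sizes
gh = pg.pairwise_gameshowell(data=df, dv="ctr", between="version")
print(gh[["A", "B", "diff", "pval"]])
```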
Sports science: Scheffé test for flexible contrasts
Consider a sports science lab comparing four training programs on 5K race time:
- High‑intensity interval training (HIIT)
- Tempo runs
- Long slow distance (LSD)
- Mixed program (combination)
Outcome: 5K time in minutes after 12 weeks.
ANOVA result:
- F(3, 116) = 4.9, p = 0.003
The researchers are not just interested in pairwise differences; they have theory‑driven hypotheses, such as:
- The average of HIIT and Tempo vs the average of LSD and Mixed
- HIIT vs the average of all others
Here, Scheffé’s test is a workhorse. It’s more conservative for pairwise comparisons but very flexible for any linear contrast of group means. This flexibility makes it a classic post-hoc choice in fields where custom contrasts matter.
A typical interpretation might be:
- No single program dominates all others pairwise after Scheffé correction
- But the contrast “(HIIT + Tempo) / 2 vs (LSD + Mixed) / 2” is significant, suggesting that more intense training tends to outperform lower‑intensity approaches
This kind of result often shows up in exercise physiology research and is discussed in many graduate‑level statistics courses (see, for instance, course notes from Harvard University).
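Because Scheffé protects every possible contrast, the test itself is easy to compute from the ANOVA table. The sketch below checks the “(HIIT + Tempo) / 2 vs (LSD + Mixed) / 2” contrast; the group means and MSE are hypothetical, while the design (k = 4 groups, n = 30 each) matches the F(3, 116) above.

```python
# Scheffe-protected custom contrast: (HIIT + Tempo)/2 vs (LSD + Mixed)/2.
# Group means and MSE are hypothetical values for illustration.
import numpy as np
from scipy.stats import f

k, n = 4, 30                                  # 4 programs, 30 runners each
N = k * n
means = np.array([22.1, 22.4, 23.5, 23.2])    # 5K times (min): HIIT, Tempo, LSD, Mixed
mse = 4.0                                     # mean square error from the ANOVA
c = np.array([0.5, 0.5, -0.5, -0.5])          # contrast coefficients (sum to 0)

L = c @ means                                 # estimated contrast value
se = np.sqrt(mse * np.sum(c**2 / n))          # standard error of the contrast
f_crit = f.ppf(0.95, dfn=k - 1, dfd=N - k)    # F critical value at alpha = 0.05
scheffe_crit = np.sqrt((k - 1) * f_crit)      # Scheffe threshold for |L| / SE

print(f"|L|/SE = {abs(L) / se:.2f} vs Scheffe critical value = {scheffe_crit:.2f}")
# The contrast is significant when |L| / SE exceeds the Scheffe critical value.
```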
Nutrition and weight loss: Dunnett’s test when one group is the reference
Now let’s move to a nutrition example: a weight‑loss study with one control group and several diet programs:
- Control (standard diet advice)
- Low‑carb diet
- Mediterranean diet
- Low‑fat diet
Outcome: weight change (lbs) after 6 months.
The primary scientific question: which diets outperform the control? The researchers are not especially interested in comparing low‑carb vs Mediterranean vs low‑fat among themselves.
After ANOVA shows a significant overall effect, this is a strong example of using Dunnett’s test, which is designed specifically for comparing multiple treatments to a single control while controlling the familywise error rate.
Dunnett’s test might show:
- Low‑carb vs Control: p = 0.010
- Mediterranean vs Control: p = 0.002
- Low‑fat vs Control: p = 0.090
Conclusion:
- Both low‑carb and Mediterranean diets lead to more weight loss than standard advice
- The low‑fat diet is not reliably better than control in this sample
You’ll see Dunnett’s test in pharmacology, toxicology, and nutrition, especially when there’s a single reference condition (often a placebo or standard of care). Methodological discussions of such designs frequently appear in NIH and FDA‑related materials.
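SciPy (1.11 and later) ships Dunnett’s test directly. Here is a minimal sketch on simulated weight-change data; the means, SD, and group sizes are illustrative assumptions.

```python
# Dunnett's test: each diet vs the single control group.
# All means, SDs, and sample sizes below are illustrative assumptions.
import numpy as np
from scipy.stats import dunnett

rng = np.random.default_rng(1)
control = rng.normal(-2, 6, size=40)        # standard diet advice
low_carb = rng.normal(-7, 6, size=40)
mediterranean = rng.normal(-8, 6, size=40)
low_fat = rng.normal(-4, 6, size=40)

# Each treatment is compared only to the control, not to the other diets
res = dunnett(low_carb, mediterranean, low_fat, control=control)
for name, p in zip(["Low-carb", "Mediterranean", "Low-fat"], res.pvalue):
    print(f"{name} vs Control: p = {p:.3f}")
```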
UX and product design: repeated-measures ANOVA with post-hoc tests
Post-hoc testing is not limited to independent groups. Consider a UX study where the same participants use three versions of an app interface:
- Version A (baseline)
- Version B (new navigation)
- Version C (new navigation + color scheme)
Outcome: task completion time (seconds) for a standardized set of tasks.
Because each participant uses all three versions, you run a repeated-measures ANOVA. The result:
- F(2, 98) = 14.1, p < 0.001
To find out which versions differ, you can use pairwise comparisons with Bonferroni or Holm correction on the repeated-measures means. This is another modern example of post-hoc testing after ANOVA, especially common in human‑computer interaction and psychology.
Suppose Holm‑corrected results show:
- B vs A: p = 0.004 (faster)
- C vs A: p < 0.001 (faster)
- C vs B: p = 0.030 (C still faster than B)
The product team can then justify moving to Version C, backed by statistically supported performance gains.
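A minimal sketch of that post-hoc step in Python: paired t-tests on the three version pairs, then a Holm correction via statsmodels. All task times are simulated for illustration.

```python
# Holm-corrected paired comparisons after a repeated-measures design.
# Task times below are simulated; effect sizes are illustrative assumptions.
import numpy as np
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
n = 50
a = rng.normal(60, 10, n)      # Version A: baseline task time (seconds)
b = a - rng.normal(4, 5, n)    # Version B: somewhat faster than A
c = a - rng.normal(7, 5, n)    # Version C: fastest on average

pairs = [("B vs A", b, a), ("C vs A", c, a), ("C vs B", c, b)]
raw_p = [ttest_rel(x, y).pvalue for _, x, y in pairs]

# Holm's step-down procedure controls the familywise error rate
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
for (name, _, _), p, sig in zip(pairs, adj_p, reject):
    print(f"{name}: Holm-adjusted p = {p:.4f}, significant = {sig}")
```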
2024–2025 trends: how people are actually running these post-hoc tests
In 2024–2025, the way people run these post-hoc tests has shifted in a few noticeable ways:
- R and Python dominate: Packages like emmeans and multcomp in R, and statsmodels in Python, make it easy to specify complex models and then request Tukey, Bonferroni, Holm, or Scheffé post-hoc tests in one line of code.
- Effect sizes and confidence intervals: Researchers increasingly report Cohen’s d or Hedges’ g for pairwise differences, plus 95% CIs, not just p-values. Many journals now expect this.
- Multiple-testing awareness: There is more attention to methods like Holm, Hochberg, and Benjamini–Hochberg (FDR control), especially when the number of comparisons is large.
- Assumption checks are standard: Checking normality, variance homogeneity, and outliers before choosing a post-hoc test is now standard advice in university and government training materials (for example, statistical guidance linked from CDC or NIH).
These trends don’t change the basic logic (ANOVA first, then post-hoc), but they do shape how these tests are implemented and reported today.
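To see the difference between familywise control (Holm) and false-discovery-rate control (Benjamini–Hochberg) in practice, here is a short sketch applying both to the same hypothetical p-values.

```python
# Holm vs Benjamini-Hochberg on the same hypothetical p-values.
from statsmodels.stats.multitest import multipletests

raw_p = [0.001, 0.008, 0.020, 0.040, 0.045, 0.300]

_, holm_p, _, _ = multipletests(raw_p, method="holm")
_, bh_p, _, _ = multipletests(raw_p, method="fdr_bh")

for raw, h, b in zip(raw_p, holm_p, bh_p):
    print(f"raw = {raw:.3f} -> Holm = {h:.3f}, BH = {b:.3f}")
```

With many comparisons, BH typically flags more results as significant than Holm, at the cost of controlling the false discovery rate rather than the familywise error rate.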
Choosing among the main post-hoc tests: practical guidance
Given all these practical examples, a natural question is: how do you decide which post-hoc test to use in your own study?
A practical, data‑driven way to think about it:
Tukey’s HSD
- Great for: Equal or nearly equal group sizes, interest in all pairwise comparisons
- Example: The blood-pressure drug trial
Bonferroni / Holm
- Great for: A small number of preplanned comparisons, conservative control of Type I error
- Example: Education study comparing each method to traditional teaching
Games–Howell
- Great for: Unequal variances and unequal group sizes
- Example: Marketing A/B/n test with different email list sizes
Scheffé
- Great for: Complex contrasts, not just pairwise comparisons
- Example: Sports science comparing “intense” vs “less intense” programs
Dunnett
- Great for: Multiple treatments vs a single control
- Example: Diet study with one standard‑advice control group
The more your situation looks like one of the real examples above, the more confident you can be that you’re picking a reasonable post-hoc strategy.
FAQ: short answers with real examples
Q1. Can you give simple examples of post-hoc tests after ANOVA in medical research?
Yes. A standard example of a post-hoc test in medical research is Tukey’s HSD or Bonferroni comparisons after an ANOVA in a multi‑arm clinical trial. For instance, comparing three antidepressants and a placebo on depression scores: ANOVA tells you that at least one drug differs; Tukey or Bonferroni tells you exactly which drugs outperform placebo and whether any drug is better than the others.
Q2. What are some of the best examples of post-hoc tests after ANOVA for unequal variances?
Games–Howell is often highlighted as one of the best examples for unequal variances and unequal sample sizes. In a marketing context, if different ad campaigns have very different numbers of impressions and highly variable click‑through rates, Games–Howell will usually be safer than Tukey’s HSD.
Q3. When should I use Dunnett’s test instead of Tukey’s HSD?
Use Dunnett when you have one control group and several treatments, and your main interest is comparing each treatment to that control. This is common in toxicology, pharmacology, and nutrition. If you care about all pairwise comparisons (treatment vs treatment), then Tukey’s HSD or another all‑pairs method is more appropriate.
Q4. Are post-hoc tests always required after ANOVA?
No. If you only had two groups, ANOVA and a t‑test are equivalent, so no post-hoc test is needed. Also, if you had a small number of preplanned contrasts (for example, Treatment vs Control only), you might skip generic post-hoc tests and instead run those planned comparisons with an appropriate correction.
Q5. Where can I learn more about these tests with worked examples?
University statistics centers and government resources are good starting points. For example, UCLA’s statistical consulting pages, Harvard’s statistics course materials, and federal research guidance from NIH or CDC often include ANOVA and post-hoc examples with real data.
Post-hoc tests are not just a technical afterthought: they’re how you turn a vague “something is different” from ANOVA into specific, defensible statements about your groups. By grounding your choice of test in practical examples like the ones above, you’ll be much better equipped to explain and defend your results to reviewers, managers, or anyone else who cares about what your data actually say.