Examples of Post-hoc Tests After ANOVA: Practical Examples

If you’ve ever run a one-way ANOVA and stared at the output thinking, “Okay, I know there’s a difference somewhere… but where exactly?”, you’re in the right place. This guide focuses on **practical examples of post-hoc tests after ANOVA** that you can actually recognize from real research and day‑to‑day data work. Instead of abstract definitions, we’ll walk through realistic scenarios—clinical trials, education studies, marketing experiments, and more—so you can see how post-hoc tests are chosen and interpreted in context. You’ll see how Tukey’s HSD differs from a Bonferroni comparison, when Games–Howell is safer than standard methods, and how modern workflows in R, Python, and SPSS handle these steps. Along the way, we’ll connect to authoritative sources such as the National Institutes of Health (NIH) and major universities, and highlight trends researchers are using in 2024–2025. By the end, you’ll not only recognize the most useful post-hoc tests after ANOVA, you’ll also know which one fits your own data story.
Written by Jamie

Before definitions, let’s anchor this in reality. Here are several practical examples of post-hoc tests after ANOVA that you’re likely to see in published research or applied work:

  • A clinical trial comparing three blood-pressure drugs
  • A school district comparing test scores across four teaching methods
  • A marketing team testing five versions of an email subject line
  • A sports scientist comparing recovery times under different training programs
  • A nutrition researcher comparing weight loss across four diets
  • A UX team comparing task completion times across three app designs

All of these start with a one-way ANOVA (or repeated-measures ANOVA). Once the ANOVA p-value says, “At least one group is different,” you need post-hoc tests to pinpoint which groups differ while controlling the familywise error rate.
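
To make that two-step logic concrete, here is a minimal Python sketch using SciPy’s `f_oneway`; the group names, means, and spreads are simulated stand-ins, not data from any of the studies above:

```python
# Minimal sketch: the omnibus one-way ANOVA only answers "is at least one
# group different?" All numbers below are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
placebo = rng.normal(loc=-2.0, scale=8.0, size=50)  # simulated outcomes
drug_a = rng.normal(loc=-6.0, scale=8.0, size=50)
drug_b = rng.normal(loc=-9.0, scale=8.0, size=50)

f_stat, p_value = stats.f_oneway(placebo, drug_a, drug_b)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value says "some group differs" but not WHICH pairs differ --
# pinpointing the pairs is the job of the post-hoc tests below.
```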

The best examples of post-hoc tests after ANOVA usually involve:

  • Multiple groups (3+)
  • Interest in all pairwise comparisons, not just one or two
  • A need to control for false positives when you run many tests

Let’s walk through specific, data-driven scenarios and see how different post-hoc tests behave.


Clinical trial: Tukey’s HSD as a classic example of post-hoc tests after ANOVA

Imagine a randomized clinical trial comparing three antihypertensive drugs and a placebo:

  • Placebo
  • Drug A
  • Drug B
  • Drug C

Outcome: change in systolic blood pressure (mmHg) after 8 weeks.

The researcher runs a one-way ANOVA and finds:

  • F(3, 196) = 9.8, p < 0.001

So, there’s a statistically significant difference among the four groups. But which drugs outperform placebo? And is Drug C better than A or B?

Here’s where one of the most common post-hoc tests after ANOVA comes in: Tukey’s Honest Significant Difference (HSD).

In this scenario, Tukey’s HSD is a strong choice because:

  • Group sizes are equal (n = 50 per group)
  • Variances appear similar (Levene’s test p > 0.05)
  • The researcher wants to compare all pairs (A vs B, A vs C, B vs C, each vs placebo)

Suppose Tukey’s HSD results show:

  • Drug A vs Placebo: p = 0.040
  • Drug B vs Placebo: p = 0.001
  • Drug C vs Placebo: p = 0.120
  • Drug B vs Drug A: p = 0.030
  • Drug C vs Drug A: p = 0.900
  • Drug C vs Drug B: p = 0.060

Interpretation in plain English:

  • Drug B clearly outperforms placebo and is better than Drug A.
  • Drug A is modestly better than placebo.
  • Drug C is statistically indistinguishable from placebo and from the other drugs.

This is a textbook example of how Tukey’s HSD gives a balanced view of all pairwise differences while controlling the overall Type I error rate. If you read NIH-funded clinical studies, you’ll often see Tukey’s HSD or Bonferroni used in exactly this kind of multi-arm trial (see, for instance, methodological discussions at NIH).
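
If you want to reproduce this kind of analysis yourself, here is a hedged sketch of Tukey’s HSD in Python using statsmodels’ `pairwise_tukeyhsd`; the simulated blood-pressure changes below are illustrative stand-ins for the trial’s data:

```python
# Hedged sketch of Tukey's HSD with statsmodels; group means and spreads
# are invented to roughly echo the scenario (n = 50 per group).
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
labels = ["Placebo", "DrugA", "DrugB", "DrugC"]
means = {"Placebo": -1.0, "DrugA": -5.0, "DrugB": -9.0, "DrugC": -3.0}
groups = np.repeat(labels, 50)
change = np.concatenate([rng.normal(means[g], 8.0, 50) for g in labels])

result = pairwise_tukeyhsd(endog=change, groups=groups, alpha=0.05)
print(result.summary())  # one row per pair: mean diff, adjusted p, 95% CI
```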


Education study: Bonferroni as a conservative example of post-hoc tests after ANOVA

Now picture a large school district testing four reading interventions for 4th graders:

  • Traditional textbook
  • Guided reading
  • Phonics-intensive program
  • Digital reading app

Outcome: standardized reading score at the end of the year.

ANOVA output:

  • F(3, 796) = 5.3, p = 0.0013

The district is risk‑averse; they don’t want to overclaim that a method is better when it’s just noise. They care most about comparisons vs the traditional method, not every possible pair.

Here, Bonferroni-corrected pairwise t‑tests are a good fit. This is another widely used example of a post-hoc test after ANOVA because Bonferroni is simple, transparent, and easy to justify to non‑statisticians.

Let’s say there are three comparisons of interest:

  • Guided vs Traditional
  • Phonics vs Traditional
  • Digital vs Traditional

Without adjustment, p-values might be:

  • Guided vs Traditional: p = 0.020
  • Phonics vs Traditional: p < 0.001
  • Digital vs Traditional: p = 0.045

With Bonferroni (α = 0.05 / 3 ≈ 0.0167), only p-values below 0.0167 are considered significant:

  • Guided vs Traditional: 0.020 → not significant after correction
  • Phonics vs Traditional: < 0.001 → still significant
  • Digital vs Traditional: 0.045 → not significant

So the district concludes that only the phonics-intensive program clearly outperforms the traditional method. This conservative stance is often recommended in educational and social science research, and you’ll see it in many university statistics tutorials, such as those hosted by UCLA’s IDRE.
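
The arithmetic above is easy to script. Here is a minimal sketch of the Bonferroni correction applied to the three raw p-values from the text, using statsmodels (the “< 0.001” value is represented as 0.001 for illustration); switching to `method="holm"` gives the slightly less conservative Holm procedure:

```python
# Minimal sketch of a Bonferroni correction on the three raw p-values.
from statsmodels.stats.multitest import multipletests

raw_p = [0.020, 0.001, 0.045]  # Guided, Phonics, Digital vs Traditional
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

for name, p, r in zip(["Guided", "Phonics", "Digital"], adj_p, reject):
    print(f"{name} vs Traditional: adjusted p = {p:.3f}, significant = {r}")
# Bonferroni multiplies each p by 3 (capped at 1), which is equivalent to
# testing each raw p against alpha/3 ~= 0.0167, as described above.
```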


Marketing A/B/n test: Games–Howell as a modern example when variances differ

In 2024–2025, marketing teams routinely run A/B/n tests with unequal group sizes and messy data. Suppose a company tests five versions of an email subject line:

  • Control
  • Version 1
  • Version 2
  • Version 3
  • Version 4

Outcome: click‑through rate (CTR), a proportion that can have different variances across groups.

The ANOVA (or a generalized linear model with a similar idea) suggests a difference:

  • F(4, 19,995) = 15.2, p < 0.001

However, Levene’s test for homogeneity of variance is significant (p < 0.001), and group sizes are very unbalanced (Control: 8,000; Version 1: 3,000; Version 2: 2,000; Version 3: 4,000; Version 4: 3,000).

This is where Games–Howell shines. As one of the best examples of post-hoc tests after ANOVA under unequal variances and unequal sample sizes, Games–Howell avoids assuming equal variances and uses adjusted degrees of freedom.

Suppose Games–Howell finds that:

  • Version 2 and Version 3 both significantly outperform Control
  • Version 1 and Version 4 are not significantly different from Control
  • Version 3 outperforms Version 2 by a small but statistically significant margin

In practice, the marketing team might:

  • Roll out Version 3 broadly
  • Keep Version 2 as a backup
  • Drop Version 1 and Version 4 from future campaigns

This example of Games–Howell is increasingly common as companies lean on R and Python libraries that implement it directly. It’s a nice reminder that not all post-hoc tests assume equal variances, which matters with real‑world click data, medical costs, or any skewed metric.
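
As a sketch of how that might look in code, the pingouin library exposes Games–Howell directly via `pairwise_gameshowell`; the group sizes below mirror the scenario, while the CTR values themselves are simulated stand-ins:

```python
# Hedged sketch of Games-Howell via pingouin; CTR values are simulated
# with deliberately unequal variances across variants.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(1)
sizes = {"Control": 8000, "V1": 3000, "V2": 2000, "V3": 4000, "V4": 3000}
frames = [
    pd.DataFrame({"variant": name,
                  "ctr": rng.normal(0.05 + 0.002 * i, 0.01 + 0.003 * i, n)})
    for i, (name, n) in enumerate(sizes.items())
]
df = pd.concat(frames, ignore_index=True)

# One row per pair, with Welch-style degrees of freedom and adjusted p-values
print(pg.pairwise_gameshowell(data=df, dv="ctr", between="variant"))
```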


Sports science: Scheffé test for flexible contrasts

Consider a sports science lab comparing four training programs on 5K race time:

  • High‑intensity interval training (HIIT)
  • Tempo runs
  • Long slow distance (LSD)
  • Mixed program (combination)

Outcome: 5K time in minutes after 12 weeks.

ANOVA result:

  • F(3, 116) = 4.9, p = 0.003

The researchers are not just interested in pairwise differences; they have theory‑driven hypotheses, such as:

  • The average of HIIT and Tempo vs the average of LSD and Mixed
  • HIIT vs the average of all others

Here, Scheffé’s test is a workhorse. It’s more conservative for pairwise comparisons but very flexible for any linear contrast of group means, which makes it a classic post-hoc choice after ANOVA in fields where custom contrasts matter.

A typical interpretation might be:

  • No single program dominates all others pairwise after Scheffé correction
  • But the contrast “(HIIT + Tempo) / 2 vs (LSD + Mixed) / 2” is significant, suggesting that more intense training tends to outperform lower‑intensity approaches

This kind of result often shows up in exercise physiology research and is discussed in many graduate‑level statistics courses (see, for instance, course notes from Harvard University).
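
Because Scheffé’s criterion is simply “compare the contrast’s F statistic to (k − 1) times the usual F critical value,” you can compute it by hand. Here is a minimal sketch for the “(HIIT + Tempo)/2 vs (LSD + Mixed)/2” contrast; the group means, within-group mean square, and n = 30 per group are invented numbers chosen to be consistent with F(3, 116):

```python
# Minimal sketch of a Scheffe test for a custom contrast, by hand.
import numpy as np
from scipy import stats

means = np.array([21.5, 22.0, 23.4, 23.1])  # HIIT, Tempo, LSD, Mixed (min)
c = np.array([0.5, 0.5, -0.5, -0.5])        # contrast coefficients
n = np.array([30, 30, 30, 30])
ms_within, k, N = 4.0, 4, 120               # assumed ANOVA-table values

psi = c @ means                      # estimated contrast
se2 = ms_within * np.sum(c**2 / n)   # variance of the contrast estimate
f_contrast = psi**2 / se2

# Scheffe criterion: significant if F_contrast > (k-1) * F_crit(k-1, N-k)
cutoff = (k - 1) * stats.f.ppf(0.95, k - 1, N - k)
print(f"F_contrast = {f_contrast:.2f}, Scheffe cutoff = {cutoff:.2f}")
print("significant" if f_contrast > cutoff else "not significant")
```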


Nutrition and weight loss: Dunnett’s test when one group is the reference

Now let’s move to a nutrition example: a weight‑loss study with one control group and several diet programs:

  • Control (standard diet advice)
  • Low‑carb diet
  • Mediterranean diet
  • Low‑fat diet

Outcome: weight change (lbs) after 6 months.

The primary scientific question: which diets outperform the control? The researchers are not especially interested in comparing low‑carb vs Mediterranean vs low‑fat among themselves.

After ANOVA shows a significant overall effect, this is a strong example of using Dunnett’s test, which is designed specifically for comparing multiple treatments to a single control while controlling the familywise error rate.

Dunnett’s test might show:

  • Low‑carb vs Control: p = 0.010
  • Mediterranean vs Control: p = 0.002
  • Low‑fat vs Control: p = 0.090

Conclusion:

  • Both low‑carb and Mediterranean diets lead to more weight loss than standard advice
  • The low‑fat diet is not reliably better than control in this sample

You’ll see Dunnett’s test in pharmacology, toxicology, and nutrition, especially when there’s a single reference condition (often a placebo or standard of care). Methodological discussions of such designs frequently appear in NIH and FDA‑related materials.
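
For reference, recent versions of SciPy (1.11 and later) ship Dunnett’s test directly as `scipy.stats.dunnett`. Here is a hedged sketch, with simulated weight changes standing in for the study’s data:

```python
# Hedged sketch of Dunnett's test (several treatments vs one control).
# Requires SciPy >= 1.11; all values are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control = rng.normal(-2.0, 6.0, 60)        # standard diet advice
low_carb = rng.normal(-6.0, 6.0, 60)
mediterranean = rng.normal(-7.0, 6.0, 60)
low_fat = rng.normal(-4.0, 6.0, 60)

res = stats.dunnett(low_carb, mediterranean, low_fat, control=control)
for name, p in zip(["Low-carb", "Mediterranean", "Low-fat"], res.pvalue):
    print(f"{name} vs Control: adjusted p = {p:.3f}")
```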


UX and product design: repeated-measures ANOVA with post-hoc tests

Post-hoc testing is not limited to independent groups. Consider a UX study where the same participants use three versions of an app interface:

  • Version A (baseline)
  • Version B (new navigation)
  • Version C (new navigation + color scheme)

Outcome: task completion time (seconds) for a standardized set of tasks.

Because each participant uses all three versions, you run a repeated-measures ANOVA. The result:

  • F(2, 98) = 14.1, p < 0.001

To find out which versions differ, you can use pairwise comparisons with a Bonferroni or Holm correction on the repeated-measures means. This is another modern example of post-hoc testing after ANOVA, especially common in human‑computer interaction and psychology.

Suppose Holm‑corrected results show:

  • B vs A: p = 0.004 (faster)
  • C vs A: p < 0.001 (faster)
  • C vs B: p = 0.030 (C still faster than B)

The product team can then justify moving to Version C, backed by statistically supported performance gains.
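
One straightforward way to run those Holm-corrected comparisons in Python is paired t-tests (the same participants appear in every condition) followed by a Holm adjustment; the completion times below are simulated for illustration:

```python
# Minimal sketch: paired t-tests plus a Holm correction for a
# repeated-measures design. All completion times are simulated.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
n = 50
a = rng.normal(60, 10, n)    # Version A (baseline), seconds
b = a - rng.normal(5, 4, n)  # Version B: same users, somewhat faster
c = a - rng.normal(8, 4, n)  # Version C: faster still

pairs = {"B vs A": (b, a), "C vs A": (c, a), "C vs B": (c, b)}
raw_p = [stats.ttest_rel(x, y).pvalue for x, y in pairs.values()]
reject, adj_p, _, _ = multipletests(raw_p, method="holm")

for name, p, r in zip(pairs, adj_p, reject):
    print(f"{name}: Holm-adjusted p = {p:.4f}, significant = {r}")
```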


2024–2025 trends: how people are actually running these post-hoc tests

In 2024–2025, the way people run post-hoc tests after ANOVA has shifted in a few noticeable ways:

  • R and Python dominate: Packages like emmeans and multcomp in R, and statsmodels in Python, make it easy to specify complex models and then request Tukey, Bonferroni, Holm, or Scheffé post-hoc tests in one line of code.
  • Effect sizes and confidence intervals: Researchers increasingly report Cohen’s d or Hedges’ g for pairwise differences, plus 95% CIs, not just p-values. Many journals now expect this.
  • Multiple-testing awareness: There is more attention to methods like Holm, Hochberg, and Benjamini–Hochberg (FDR control), especially when the number of comparisons is large; see the sketch after this list.
  • Assumption checks are standard: Checking normality, variance homogeneity, and outliers before choosing a post-hoc test is now standard advice in university and government training materials (for example, statistical guidance linked from CDC or NIH).
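
As a quick illustration of the multiple-testing point above, here is a minimal sketch contrasting Holm (familywise error control) with Benjamini–Hochberg (FDR control) on the same set of made-up p-values; both are one-line method switches in statsmodels:

```python
# Minimal sketch: Holm vs Benjamini-Hochberg on identical (made-up) p-values.
from statsmodels.stats.multitest import multipletests

raw_p = [0.001, 0.008, 0.020, 0.035, 0.049, 0.210]  # illustrative only

for method in ("holm", "fdr_bh"):
    reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method=method)
    print(method, [f"{p:.3f}" for p in adj_p],
          "->", int(reject.sum()), "significant")
# Holm controls the chance of ANY false positive; Benjamini-Hochberg
# controls the expected PROPORTION of false positives, so it typically
# flags more comparisons when many tests are run.
```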

These trends don’t change the basic logic—ANOVA first, then post-hoc—but they do shape how the best examples of post-hoc tests after ANOVA are implemented and reported today.


Choosing among the main post-hoc tests: practical guidance

Given all these practical examples, a natural question remains: how do you decide which test to use in your own study?

A practical, data‑driven way to think about it:

  • Tukey’s HSD
    • Great for: Equal or nearly equal group sizes, interest in all pairwise comparisons
    • Example: The blood-pressure drug trial
  • Bonferroni / Holm
    • Great for: A small number of preplanned comparisons, conservative control of Type I error
    • Example: Education study comparing each method to traditional teaching
  • Games–Howell
    • Great for: Unequal variances and unequal group sizes
    • Example: Marketing A/B/n test with different email list sizes
  • Scheffé
    • Great for: Complex contrasts, not just pairwise comparisons
    • Example: Sports science comparing “intense” vs “less intense” programs
  • Dunnett
    • Great for: Multiple treatments vs a single control
    • Example: Diet study with one standard‑advice control group

The more your situation looks like one of the real examples above, the more confident you can be that you’re picking a reasonable post-hoc strategy.


FAQ: short answers with real examples

Q1. Can you give simple examples of post-hoc tests after ANOVA in medical research?
Yes. A standard example of a post-hoc test in medical research is Tukey’s HSD or Bonferroni comparisons after an ANOVA in a multi‑arm clinical trial. For instance, comparing three antidepressants and a placebo on depression scores: ANOVA tells you that at least one drug differs; Tukey or Bonferroni tells you exactly which drugs outperform placebo and whether any drug is better than the others.

Q2. What are some of the best examples of post-hoc tests after ANOVA for unequal variances?
Games–Howell is often highlighted as one of the best examples for unequal variances and unequal sample sizes. In a marketing context, if different ad campaigns have very different numbers of impressions and highly variable click‑through rates, Games–Howell will usually be safer than Tukey’s HSD.

Q3. When should I use Dunnett’s test instead of Tukey’s HSD?
Use Dunnett when you have one control group and several treatments, and your main interest is comparing each treatment to that control. This is common in toxicology, pharmacology, and nutrition. If you care about all pairwise comparisons (treatment vs treatment), then Tukey’s HSD or another all‑pairs method is more appropriate.

Q4. Are post-hoc tests always required after ANOVA?
No. If you only had two groups, ANOVA and a t‑test are equivalent, so no post-hoc test is needed. Also, if you had a small number of preplanned contrasts (for example, Treatment vs Control only), you might skip generic post-hoc tests and instead run those planned comparisons with an appropriate correction.

Q5. Where can I learn more about these tests with worked examples?
University statistics centers and government resources are good starting points. For example, UCLA’s statistical consulting pages, Harvard’s statistics course materials, and federal research guidance from NIH or CDC often include ANOVA and post-hoc examples with real data.


Post-hoc tests are not just a technical afterthought—they’re how you turn a vague “something is different” from ANOVA into specific, defensible statements about your groups. By grounding your choice of test in practical examples like the ones above, you’ll be much better equipped to explain and defend your results to reviewers, managers, or anyone else who cares about what your data actually say.
