Real‑world examples of power analysis for mixed-design ANOVA

If you run experiments with both between‑subjects and within‑subjects factors, you eventually hit the same wall: how many participants do I actually need? That’s where worked examples of power analysis for mixed-design ANOVA become very handy. Instead of staring at G*Power wondering what to click, it helps to see real examples, with real numbers and real decisions. In this guide, I walk through examples of power analysis for mixed-design ANOVA drawn from psychology, education, medicine, and user‑experience research. You’ll see how researchers specify effect sizes, choose alpha levels, deal with correlations among repeated measures, and translate all of that into sample size targets. Along the way, I’ll point to current practices and recommendations from 2024–2025 papers and methods notes, so the advice isn’t frozen in a 2010 G*Power screenshot. If you’ve ever thought, “I just want a concrete example of how to plan a mixed-design ANOVA study,” this is written for you.

Starting with concrete examples, not theory

Before getting lost in formulas, it helps to see what real examples of power analysis for mixed-design ANOVA actually look like in practice. A mixed design (also called a split‑plot design, or repeated measures with a between‑subjects factor) shows up whenever you:

  • Compare different groups and
  • Measure each person more than once (time, condition, stimulus type, etc.)

The core idea of power analysis is always the same: given your design, effect size, alpha, and desired power, what sample size do you need to reliably detect an effect? For mixed designs, you add the twist of within‑subject correlations and sometimes sphericity corrections.

Below, I walk through several examples of power analysis for mixed-design ANOVA from different fields, then circle back to patterns and practical tips.


Psychology study: CBT vs. control over three time points

Consider a clinical psychology trial comparing cognitive behavioral therapy (CBT) to a wait‑list control on depression scores measured at baseline, 8 weeks, and 16 weeks.

  • Between‑subjects factor: Group (CBT vs. control)
  • Within‑subjects factor: Time (baseline, 8 weeks, 16 weeks)
  • Key effect of interest: Group × Time interaction (does CBT improve faster?)

Researchers want 80% power to detect a medium effect for the interaction. They look at prior meta‑analyses from the National Institute of Mental Health (NIMH) and similar trials, whose reported differences in change over time translate to an ANOVA effect size of roughly f = 0.25.

How this power analysis might be set up

In a tool like G*Power (or R packages such as pwr2 or longpower):

  • Test family: F tests
  • Statistical test: Repeated measures ANOVA, within–between interaction
  • Effect size f: 0.25 (medium)
  • α: 0.05
  • Power (1 − β): 0.80
  • Groups: 2
  • Measurements: 3
  • Correlation among repeated measures: estimated at 0.6 from pilot data
  • Nonsphericity correction ε: 1 (assuming sphericity; often optimistic)

This setup might suggest about 34 participants per group (68 total) to achieve the desired power for the interaction. Because dropout is expected in clinical trials (often 15–20% by week 16), the team inflates the target to around 80–84 participants.
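
If you want to sanity‑check numbers like these outside a point‑and‑click tool, the same calculation can be sketched in a few lines of R using the noncentral F distribution. The helper below is a minimal sketch under a compound‑symmetry assumption; the function names (power_wb_int, n_for_power) are mine, and the noncentrality formula is one common approximation, not necessarily the exact one your tool uses. In particular, tools differ in how they define f for repeated measures (G*Power’s default convention folds the correlation into the calculation, while its “as in SPSS” option does not), so the N returned here will not necessarily match the 68 quoted above.

```r
# Power for the within-between interaction in a mixed-design ANOVA,
# via the noncentral F distribution. Assumes compound symmetry
# (one common correlation rho) and equal group sizes.
power_wb_int <- function(N, k, m, f, rho, eps = 1, alpha = 0.05) {
  df1    <- (k - 1) * (m - 1) * eps         # numerator df, epsilon-corrected
  df2    <- (N - k) * (m - 1) * eps         # denominator df
  lambda <- f^2 * N * m * eps / (1 - rho)   # noncentrality (one common convention)
  fcrit  <- qf(1 - alpha, df1, df2)         # critical F under the null
  pf(fcrit, df1, df2, ncp = lambda, lower.tail = FALSE)
}

# Smallest total N (a multiple of k) that reaches the target power
n_for_power <- function(k, m, f, rho, eps = 1, alpha = 0.05, target = 0.80) {
  N <- 2 * k
  while (power_wb_int(N, k, m, f, rho, eps, alpha) < target) N <- N + k
  N
}

# CBT trial inputs: 2 groups, 3 time points, f = 0.25, rho = 0.6
n_for_power(k = 2, m = 3, f = 0.25, rho = 0.6)
```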

This is a classic example of power analysis for mixed-design ANOVA: the interaction is the focus, within‑subject correlation matters, and attrition is explicitly folded into the final sample size.


Education research: Three teaching methods over multiple quizzes

Now shift to an education study comparing three teaching methods (lecture, flipped classroom, and online) on statistics quiz performance across four weeks.

  • Between‑subjects factor: Teaching method (3 levels)
  • Within‑subjects factor: Time (4 weekly quizzes)
  • Outcome: Quiz score (0–100)

The researcher is mostly interested in whether the pattern of improvement over time differs across methods (the Method × Time interaction).

Power analysis decisions

The researcher reads recent teaching‑method meta‑analyses from sources like Harvard’s Bok Center and finds that typical gains fall in the small‑to‑medium range. They decide to power the study for an effect size of f = 0.20 on the interaction.

Inputs for power analysis:

  • Test: Repeated measures ANOVA, within–between interaction
  • Effect size f: 0.20
  • α: 0.05
  • Desired power: 0.90 (they want higher power because data collection is relatively cheap)
  • Groups: 3
  • Measurements: 4
  • Correlation among repeated measures: 0.5 (based on pilot course data)
  • ε: 0.9 (anticipating some sphericity violation)

With these settings, the power analysis suggests around 45–50 students per teaching method, or roughly 135–150 students total, to hit 90% power for the interaction.
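
For readers following along in R, the same hypothetical n_for_power helper from the CBT example takes these inputs directly; as before, the exact N depends on your tool’s effect‑size convention.

```r
# Education example: 3 methods, 4 quizzes, interaction f = 0.20,
# rho = 0.5, eps = 0.9, targeting 90% power
n_for_power(k = 3, m = 4, f = 0.20, rho = 0.5, eps = 0.9, target = 0.90)
```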

This is one of the best examples of power analysis for mixed-design ANOVA in an applied education context: the researcher explicitly uses realistic correlations, a modest effect size, and a conservative epsilon.


Medical trial: Drug vs. placebo with repeated blood pressure measures

Medical research frequently uses mixed designs: patients are randomized to treatment groups, then followed over time. Suppose a cardiology team is evaluating a new antihypertensive drug vs. placebo.

  • Between‑subjects factor: Treatment (drug vs. placebo)
  • Within‑subjects factor: Time (baseline, 1 month, 3 months, 6 months)
  • Outcome: Systolic blood pressure (mm Hg)

They care about the Group × Time interaction and want to detect a difference in the trajectory of at least 5 mm Hg by 6 months.

Translating a clinical effect into a power analysis

From prior trials and resources like the National Heart, Lung, and Blood Institute (NHLBI), they know the standard deviation of systolic blood pressure is around 12–15 mm Hg in similar populations.

A 5‑mm Hg difference against an SD of 12–15 mm Hg corresponds to a standardized difference of roughly d ≈ 0.33–0.42, which might translate to an ANOVA interaction effect size of f ≈ 0.18–0.22, depending on the conversion used.
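
Written out, with the common two‑group shortcut f ≈ d/2 (the exact value depends on which conversion you adopt):

\[
d = \frac{\Delta}{SD} = \frac{5}{15} \text{ to } \frac{5}{12} \approx 0.33 \text{ to } 0.42,
\qquad
f \approx \frac{d}{2} \approx 0.17 \text{ to } 0.21
\]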

They set up the power analysis with:

  • Effect size f: 0.20
  • α: 0.05
  • Power: 0.80 (typical for clinical work, though some go to 0.90)
  • Groups: 2
  • Measurements: 4
  • Correlation among repeated measures: 0.7 (blood pressure is fairly stable within person)
  • ε: 0.9

The calculation suggests about 60–65 patients per group (120–130 total) for 80% power. Because clinical trials often experience attrition and missing visits, the team plans for 150–160 participants overall.

This is a realistic example of power analysis for mixed-design ANOVA that ties a clinically meaningful difference (5 mm Hg) directly to the effect size used in planning.


UX research: Interface A vs. B with repeated usability tasks

User‑experience and HCI studies increasingly rely on mixed designs. Imagine a tech company comparing two versions of a mobile app interface on task completion time across different tasks.

  • Between‑subjects factor: Interface (A vs. B)
  • Within‑subjects factor: Task (3 core tasks each user completes)
  • Outcome: Task completion time (seconds)

The UX team wants to detect whether one interface scales worse across more complex tasks, again focusing on the Interface × Task interaction.

Planning with limited prior data

They have a small pilot study (n = 12 per interface) and find a partial eta‑squared for the interaction of about 0.10. Converting that to f:

\[
f = \sqrt{\frac{\eta^2}{1-\eta^2}} \approx \sqrt{\frac{0.10}{0.90}} \approx 0.33
\]

Given that pilot estimates are noisy, they downscale and plan for f = 0.25.
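
In R, the pilot conversion is a one‑liner (a sketch; eta2 here stands for the pilot’s partial eta‑squared):

```r
eta2    <- 0.10                       # partial eta-squared from the pilot
f_pilot <- sqrt(eta2 / (1 - eta2))    # ~0.333
f_plan  <- 0.25                       # deliberately downscaled for planning
```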

Power analysis inputs:

  • Effect size f: 0.25
  • α: 0.05
  • Power: 0.80
  • Groups: 2
  • Measurements: 3
  • Correlation among repeated measures: 0.5 (tasks are related but not identical)
  • ε: 1

This yields a required sample of roughly 30–35 users per interface (60–70 total). For a UX lab with a moderate recruitment budget, that’s realistic.

This UX case is one of those real examples of power analysis for mixed-design ANOVA that show up outside the usual academic psychology or medical worlds.


Sports science: Training program by time with repeated performance tests

Sports scientists routinely use mixed designs to compare training programs over multiple testing sessions.

Imagine a strength‑training study comparing high‑volume vs. low‑volume programs on squat 1‑RM measured at baseline, 6 weeks, and 12 weeks.

  • Between‑subjects factor: Program (high vs. low volume)
  • Within‑subjects factor: Time (3 testing sessions)
  • Outcome: 1‑RM load (kg)

A recent meta‑analysis suggests that differences in strength gains between these types of programs are often small to medium. The researcher decides to plan for f = 0.20 on the Group × Time interaction.

Inputs:

  • Effect size f: 0.20
  • α: 0.05
  • Power: 0.85 (they want a bit more confidence)
  • Groups: 2
  • Measurements: 3
  • Correlation among repeated measures: 0.8 (1‑RM is quite stable within person)
  • ε: 0.95

The power analysis suggests they need about 40–45 athletes per group (80–90 total). Given that athlete recruitment is hard, they might either:

  • Accept slightly lower power, or
  • Extend the data collection across multiple teams or seasons.

This sports science case is another example of power analysis for mixed-design ANOVA where practical constraints push back against the ideal sample size.


Public health survey: Policy exposure by time with repeated outcomes

Mixed designs are not limited to lab experiments. Consider a public health survey tracking attitudes toward a new vaccination policy over three waves of data collection.

  • Between‑subjects factor: Policy exposure (high vs. low exposure regions)
  • Within‑subjects factor: Time (3 survey waves across a year)
  • Outcome: Vaccination intention score (0–10)

Researchers want to detect whether intention increases more over time in high‑exposure regions.

They consult methodological notes from the CDC and prior attitude‑change surveys, which suggest moderate within‑person correlations and small‑to‑moderate policy effects.

Power analysis setup:

  • Effect size f: 0.18 (a bit smaller than typical medium)
  • α: 0.05
  • Power: 0.80
  • Groups: 2
  • Measurements: 3
  • Correlation among repeated measures: 0.6
  • ε: 0.9

Because surveys are relatively cheap to administer online, they can oversample. The power analysis might say about 90–100 respondents per exposure group are enough, but the team intentionally recruits 300+ per group to allow subgroup analyses (age, gender, etc.) and to hedge against attrition.

This is a nice example of power analysis for mixed-design ANOVA where the calculated minimum is treated as a floor, not a target.


What’s changed in 2024–2025

By 2024–2025, several trends in power analysis for mixed-design ANOVA have become hard to ignore:

  • Shift toward mixed‑effects models: Many fields now favor linear mixed‑effects models (LMMs) over classical mixed‑design ANOVA, especially with missing data or unbalanced designs. Still, the ANOVA‑style power analysis is often used as a planning approximation.
  • Greater transparency: Journals increasingly expect pre‑registered power analyses, especially in psychology and clinical trials. For instance, the NIH explicitly asks for justified sample size calculations in grant applications.
  • Use of simulation: When designs are complex (e.g., nested data, unequal time intervals), researchers simulate data under assumed parameters instead of relying on closed‑form ANOVA power formulas.

Even with these changes, many authors continue to report results in ANOVA terms (F‑tests for main effects and interactions), so having strong examples of power analysis for mixed-design ANOVA is still very relevant.


Practical tips drawn from the examples

Looking across these real examples of power analysis for mixed-design ANOVA, a few patterns stand out.

Focus on the interaction you actually care about

In almost every example above, the key target for power is the Group × Time (or equivalent) interaction, not just the main effects. If your scientific question is about differential change over time, that interaction is what you should power the study to detect.

Use realistic correlations among repeated measures

Power for within–between interactions is highly sensitive to the assumed correlation among repeated measures:

  • Higher correlations usually increase power (less within‑subject noise)
  • Lower correlations reduce power

If you have pilot data, use it. If not, borrow estimates from similar studies (e.g., from NIH‑funded trials or published psychometric work). Over‑optimistic assumptions here are a common reason that studies end up underpowered.
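
You can make this sensitivity concrete with the hypothetical power_wb_int helper sketched earlier, holding the design fixed and varying only the assumed correlation:

```r
# Power at a fixed total N across three assumed correlations
# (2 groups, 3 measurements, f = 0.25, N = 68)
sapply(c(0.3, 0.5, 0.7), function(rho)
  power_wb_int(N = 68, k = 2, m = 3, f = 0.25, rho = rho))
```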

Convert real‑world differences into effect sizes

The best examples of power analysis for mixed-design ANOVA do not start with a magical “medium effect size.” They start from a meaningful difference:

  • 5 mm Hg in blood pressure
  • A 5‑point increase in quiz scores
  • A 10‑second reduction in task time

Then they translate this into a standardized effect size (f, d, or partial η²) using prior standard deviations.

Account for attrition and missing data

Mixed‑design ANOVA traditionally expects complete data, but in the real world people drop out, miss sessions, or skip survey waves. Every example of power analysis for mixed-design ANOVA should end with: “Now inflate for attrition.” Clinical and longitudinal studies often add 15–30% to the required sample size.

When in doubt, consider simulation

For more complicated setups—unequal group sizes, varying time intervals, or non‑normal outcomes—simulation in R, Python, or specialized software can provide more realistic power estimates than a simple ANOVA calculator. But the logic is the same as in these examples: specify effect sizes, correlations, and variance components, then estimate power under those assumptions.
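
As a concrete sketch, here is a bare‑bones base‑R simulation for the CBT‑style design. The per‑visit gain delta, the sample size, and the compound‑symmetry error structure are all illustrative assumptions; swap in your own values, error structure, and model.

```r
# Empirical power for the Group x Time interaction by simulation:
# generate data under assumed parameters, fit the ANOVA, and count
# how often the interaction p-value falls below alpha.
sim_power <- function(n_per_group = 34, delta = 0.4, rho = 0.6,
                      nsim = 1000, alpha = 0.05) {
  m <- 3                                                           # time points
  reject <- replicate(nsim, {
    id    <- factor(rep(1:(2 * n_per_group), each = m))
    group <- factor(rep(c("ctl", "cbt"), each = n_per_group * m))
    time  <- factor(rep(1:m, times = 2 * n_per_group))
    u     <- rep(rnorm(2 * n_per_group, sd = sqrt(rho)), each = m) # subject effects
    e     <- rnorm(2 * n_per_group * m, sd = sqrt(1 - rho))        # residual noise
    gain  <- ifelse(group == "cbt", delta * (as.numeric(time) - 1), 0)
    d     <- data.frame(id, group, time, y = gain + u + e)
    fit   <- aov(y ~ group * time + Error(id / time), data = d)
    tab   <- summary(fit)[["Error: id:time"]][[1]]
    tab[grep("group:time", rownames(tab)), "Pr(>F)"] < alpha
  })
  mean(reject)                                                     # empirical power
}

set.seed(42)
sim_power()
```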


FAQ: examples and practical questions

Q1. Can you give a simple example of power analysis for mixed-design ANOVA with two groups and two time points?
Imagine a pretest–posttest study with treatment vs. control. You expect a standardized difference in change scores of d = 0.50 (medium). Converting to an ANOVA interaction effect size might give f ≈ 0.25. With α = 0.05, power = 0.80, correlation between pre and post of 0.6, and two groups, many power tools will point you toward about 34 participants per group (~68 total). This is a classic textbook example of power analysis for mixed-design ANOVA.
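
Plugging these Q1 inputs into the hypothetical power_wb_int helper from the CBT example illustrates the convention issue: under a noncentrality formula that folds in the correlation, N = 68 gives power well above 0.80, which is exactly why you should record which effect‑size convention your numbers assume.

```r
power_wb_int(N = 68, k = 2, m = 2, f = 0.25, rho = 0.6)
```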

Q2. What are typical inputs for real examples of power analysis for mixed-design ANOVA?
Most real examples include: the number of groups, number of repeated measures, the effect size for the interaction (often f between 0.15 and 0.25), alpha (usually 0.05), desired power (0.80–0.90), estimated correlation among repeated measures (0.4–0.8), and a nonsphericity correction ε (0.7–1.0). They then adjust the resulting sample size upward for expected dropout.

Q3. Do I always need a mixed-design ANOVA power analysis if I plan to use mixed‑effects models?
Not always, but it’s a reasonable approximation. Many researchers use ANOVA‑style power analysis as a starting point, then refine with simulation for mixed‑effects models. Agencies like the NIH are generally fine with this approach as long as your assumptions and reasoning are clearly documented.

Q4. Where can I find more examples of power analysis for mixed-design ANOVA in the literature?
Look at methods sections in psychology and clinical trial papers that use repeated measures designs. Journals associated with the American Psychological Association often include detailed power analyses. Public repositories like the Open Science Framework also host pre‑registrations with worked examples.

Q5. Is it better to over‑estimate or under‑estimate the effect size?
If you over‑estimate the effect size, the calculation will return a smaller sample than you actually need, and the study will end up underpowered. Many methodologists now recommend planning for slightly smaller effects than you hope to see, especially in fields where publication bias has inflated past estimates.


The bottom line: the best way to learn power analysis for mixed-design ANOVA is to study real examples like these, then adapt them to your own design, outcomes, and constraints. Once you can explain your own example of power analysis for mixed-design ANOVA clearly on paper, you’re in good shape to justify your sample size to reviewers, funders, and your future self.
