Practical examples of statistical power analysis for t-tests
Let’s skip abstract theory and start with scenarios researchers actually face. These examples of statistical power analysis for t-tests all follow the same basic ingredients:
- A research question framed as a t-test (independent, paired, or one-sample)
- A target effect size (what difference matters in practice?)
- A significance level (α, usually 0.05)
- A target power (often 0.80 or 0.90)
- A resulting sample size per group (or total)
Throughout, I’ll note how you’d set these up in common tools like G*Power or R, and where current 2024–2025 practice is shifting (spoiler: more focus on realistic effect sizes and higher power for high-stakes work).
Example 1: Two-sample t-test in a clinical trial
Scenario: A team at a medical center is testing whether a new pain medication reduces post-surgery pain scores compared with standard care. Pain is measured on a 0–10 numeric scale 24 hours after surgery.
- Planned test: Two-sample (independent) t-test
- Outcome: Mean pain score
- Groups: New drug vs control
- Hypothesis: New drug reduces mean pain score
Choosing the effect size
They decide that a 1-point reduction on the 0–10 scale is clinically meaningful. Past data show a standard deviation of about 2.5 points.
Standardized effect size (Cohen’s d):
\[ d = \frac{\Delta_{\text{mean}}}{SD} = \frac{1.0}{2.5} = 0.40 \]
So they plan for d = 0.40, a small-to-moderate effect.
Power analysis setup
- Test: Two-sided independent t-test
- α = 0.05
- Target power = 0.90 (they want high confidence because this affects treatment guidelines)
- Effect size d = 0.40
Using G*Power (Test family: t tests → Means: Difference between two independent means), or R's pwr.t.test() from the pwr package, the power analysis yields:
- Required sample size ≈ 133 participants per group (266 total)
Interpretation: If the true effect is a 1-point reduction (d = 0.40), a trial with 133 people per group has about 90% power to detect it at α = 0.05.
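In R, the same setup is a one-liner with the pwr package (a sketch; round the returned n up to the next whole participant):

```r
library(pwr)

# Two-sided independent t-test: d = 0.40, alpha = 0.05, power = 0.90
pwr.t.test(d = 0.40, sig.level = 0.05, power = 0.90,
           type = "two.sample", alternative = "two.sided")
# n = 132.3 per group -> round up to 133 per group (266 total)
```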
This is one of the best examples of statistical power analysis for t-tests in a health setting, because it ties the effect size directly to a clinically meaningful change. For background on pain scales and clinical importance, see NIH guidance: https://www.nih.gov/
Example 2: Paired t-test in a weight-loss program
Scenario: A wellness clinic wants to evaluate a 12-week weight-loss program. They measure each participant’s weight before and after the program.
- Planned test: Paired t-test (same people measured twice)
- Outcome: Weight in pounds
- Hypothesis: Mean weight decreases after 12 weeks
Why paired t-tests change the power
Paired t-tests generally need fewer participants than independent t-tests because each person serves as their own control, which removes between-person variability. The key parameter is the standard deviation of the change scores, not of the raw weights.
Suppose prior programs show:
- Average expected weight loss: 8 pounds
- SD of change scores: 12 pounds
Effect size (Cohen’s d for paired designs) is:
\[ d = \frac{8}{12} \approx 0.67 \]
Power analysis setup
- Test: Paired t-test
- α = 0.05
- Target power = 0.80
- Effect size d ≈ 0.67
Using G*Power (Test family: t tests → Means: Difference between two dependent means), or pwr.t.test(type = "paired") in R, the analysis gives:
- Required sample size ≈ 20 participants
So with just 20 participants measured pre- and post-program, the clinic has about 80% power to detect an average 8-pound loss, given the assumed variability. This is a clean example of statistical power analysis for t-tests showing how pairing can dramatically lower the required sample size.
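A minimal R sketch of this calculation, using the change-score effect size directly:

```r
library(pwr)

# Paired t-test: d = 8/12 ~ 0.67, alpha = 0.05, power = 0.80
pwr.t.test(d = 8 / 12, sig.level = 0.05, power = 0.80,
           type = "paired", alternative = "two.sided")
# n = 19.6 pairs -> round up to 20 participants
```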
For more on weight-loss outcomes and typical variability, you can look at resources from the CDC: https://www.cdc.gov/obesity/index.html
Example 3: One-sample t-test for quality control
Scenario: A manufacturer produces metal rods that should average 100 cm in length. A new machine is installed, and the engineer wants to test whether the mean length differs from 100 cm.
- Planned test: One-sample t-test
- Outcome: Rod length (cm)
- Hypothesis: Mean length ≠ 100 cm
Setting a practically important difference
The engineer decides that any shift of 0.5 cm or more is unacceptable. Historical data show that rod lengths have an SD of 1.5 cm.
Effect size:
\[ d = \frac{0.5}{1.5} \approx 0.33 \]
Power analysis setup
- Test: One-sample t-test
- α = 0.01 (tighter threshold, because quality failures are expensive)
- Target power = 0.90
- Effect size d = 0.33
Using G*Power (Test family: t tests → Means: Difference from a constant), or pwr.t.test(type = "one.sample") in R, the power analysis yields:
- Required sample size ≈ 138 rods
Because α is stricter (0.01 instead of 0.05), the required sample size is larger. This example of statistical power analysis for t-tests highlights the trade-off between significance level and sample size: stricter alpha, more data.
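The R version, for reference; note the stricter sig.level:

```r
library(pwr)

# One-sample t-test: d = 0.5/1.5, alpha = 0.01, power = 0.90
pwr.t.test(d = 0.5 / 1.5, sig.level = 0.01, power = 0.90,
           type = "one.sample", alternative = "two.sided")
# n = 137.4 -> round up to 138 rods
# (with alpha = 0.05, the same setup needs only ~97 rods)
```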
Example 4: A/B testing with an independent t-test
Scenario: A product team at a software company is testing two onboarding flows. They measure time to complete onboarding in minutes.
- Planned test: Two-sample t-test
- Outcome: Time to completion
- Groups: Current onboarding (A) vs new flow (B)
They want to detect a 1.2-minute reduction in average onboarding time. Past data suggest:
- Mean time (current): 8 minutes
- SD: 3 minutes
Effect size:
\[ d = \frac{1.2}{3} = 0.40 \]
Power analysis setup
- Test: Two-sided independent t-test
- α = 0.05
- Target power = 0.80
- Effect size d = 0.40
A power analysis (again via G*Power or R) gives:
- Required sample size ≈ 100 users per variant (200 total)
If they expect high traffic, they might instead fix the sample size based on a time window and compute achieved power afterward. But planning in advance using examples of statistical power analysis for t-tests like this one keeps the experiment from being underpowered or from running far longer than needed.
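When n is fixed by a traffic window, you can flip the calculation and solve for power instead. A sketch, where 120 users per variant is a hypothetical window size, not a figure from the example:

```r
library(pwr)

# Achieved power with n fixed by the traffic window
# (n = 120 per variant is an assumed, illustrative figure)
pwr.t.test(n = 120, d = 0.40, sig.level = 0.05, type = "two.sample")
# power = 0.87
```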
For a broader discussion of experimental design in digital products, see resources from Harvard’s online statistics and data science programs: https://online-learning.harvard.edu/
Example 5: Educational study with clustered data (and a warning)
Scenario: An education researcher tests a new reading program in 4th-grade classrooms. Some classrooms use the new program; others use the standard curriculum. The outcome is reading test score at the end of the year.
- Planned test (naïve): Two-sample t-test on student-level scores
- Outcome: Test score
They want to detect a 5-point difference on a test with SD ≈ 15 points.
Effect size:
\[ d = \frac{5}{15} \approx 0.33 \]
Naïve power analysis
If they ignore classroom clustering and treat each student as independent, a power analysis for a two-sample t-test with:
- α = 0.05
- Power = 0.80
- d = 0.33
might say they need about 145 students per group.
The clustering problem
But students are clustered within classrooms, and scores within a classroom are correlated. This reduces the effective sample size. A simple t-test power analysis will be overly optimistic.
Current 2024–2025 practice in education and social science research is to adjust for clustering using design effects or to use multilevel models. Still, running this kind of example of statistical power analysis for t-tests is a useful first pass to understand the scale of data needed, before upgrading to a more realistic model.
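A common first-pass adjustment multiplies the naïve sample size by the design effect, DEFF = 1 + (m − 1) × ICC, where m is the average cluster size and ICC is the intraclass correlation. The m = 25 and ICC = 0.10 below are illustrative assumptions, not estimates from this study:

```r
# Design-effect adjustment for clustered data (illustrative values)
n_naive <- 145                 # per group, from the naive analysis
m       <- 25                  # assumed students per classroom
icc     <- 0.10                # assumed intraclass correlation
deff    <- 1 + (m - 1) * icc   # 3.4
ceiling(n_naive * deff)        # ~493 students per group
```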
For best practices in education research design, the Institute of Education Sciences (IES) provides detailed guidance: https://ies.ed.gov/
Example 6: Paired t-test for blood pressure in a pilot study
Scenario: A cardiology team is planning a pilot study to test whether a new dietary intervention reduces systolic blood pressure (SBP) over 8 weeks.
- Planned test: Paired t-test
- Outcome: SBP (mmHg)
From prior literature and internal data (for instance, from NIH and Mayo Clinic resources), they expect:
- Mean reduction: 6 mmHg
- SD of change scores: 10 mmHg
Effect size:
\[ d = \frac{6}{10} = 0.60 \]
Power analysis setup
- Test: Paired t-test
- α = 0.05
- Target power = 0.80
- Effect size d = 0.60
Power analysis suggests:
- Required sample size ≈ 24 participants
Because this is a pilot, they might inflate that to 30–35 to account for dropouts. This is one of those real examples of statistical power analysis for t-tests where the aim is not to “prove” effectiveness yet, but to size a pilot that can give stable estimates for a later, larger trial.
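In R, the pilot sizing plus a dropout buffer looks like this (the 20% dropout rate is an assumption for illustration):

```r
library(pwr)

# Paired t-test: d = 6/10, alpha = 0.05, power = 0.80
pwr.t.test(d = 0.60, sig.level = 0.05, power = 0.80, type = "paired")
# n = 23.8 -> round up to 24 completers

ceiling(24 / 0.80)  # enroll 30 to allow for ~20% dropout
```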
For high-quality background on blood pressure interventions, see NIH and Mayo Clinic:
- https://www.nhlbi.nih.gov/
- https://www.mayoclinic.org/
Example 7: One-sample t-test in user satisfaction research
Scenario: A UX team wants to see if a redesigned interface achieves an average System Usability Scale (SUS) score above 80. SUS scores range from 0 to 100.
- Planned test: One-sample t-test
- Outcome: SUS score
- Null hypothesis: Mean SUS = 80
- Alternative hypothesis: Mean SUS > 80
From prior studies, SUS scores in similar products have SD ≈ 12.
They decide they care about detecting a 5-point improvement (mean = 85) over the target of 80.
Effect size:
\[ d = \frac{85 - 80}{12} = \frac{5}{12} \approx 0.42 \]
Power analysis setup
- Test: One-sample t-test, one-sided
- α = 0.05
- Target power = 0.80
- Effect size d ≈ 0.42
Power analysis yields:
- Required sample size ≈ 37 users
This example of statistical power analysis for t-tests shows how one-sided tests (when justified) can reduce the required sample size. But the trade-off is you’re committing in advance to only testing improvement in one direction.
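A sketch of the one-sided calculation, with the two-sided requirement for comparison:

```r
library(pwr)

# One-sided one-sample t-test: d = 5/12, alpha = 0.05, power = 0.80
pwr.t.test(d = 5 / 12, sig.level = 0.05, power = 0.80,
           type = "one.sample", alternative = "greater")
# n = 36.9 -> round up to 37 users
# A two-sided test would need roughly 47-48 users
```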
How to choose effect sizes in t-test power analysis
All of these examples of statistical power analysis for t-tests hide a hard choice: what effect size should you plan for?
Common approaches in 2024–2025 research practice:
- Use prior studies or meta-analyses. For medical and psychological outcomes, meta-analyses from NIH-funded work or large consortia give realistic ranges of effect sizes.
- Anchor to practical significance. Ask: what difference would change a decision? A 1-point pain reduction, a 5-point test score gain, a 6 mmHg drop in blood pressure—these are anchored in real-world impact.
- Run a pilot. Use a small pilot to estimate the SD and plausible effect sizes, then run a more serious power analysis for the full study.
- Sensitivity analysis. Instead of picking a single effect size, compute required sample sizes for a range (e.g., d = 0.3, 0.4, 0.5) and see how conclusions change.
Across these examples, you'll see small effects that still matter (blood pressure reductions, test-score gains) alongside larger effects where small samples suffice (a strong weight-loss intervention).
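The sensitivity-analysis approach from the list above takes only a few lines in R:

```r
library(pwr)

# Required n per group across a range of plausible effect sizes
d_values <- c(0.3, 0.4, 0.5)
sapply(d_values, function(d)
  ceiling(pwr.t.test(d = d, sig.level = 0.05, power = 0.80,
                     type = "two.sample")$n))
# d = 0.3 -> 176; d = 0.4 -> 100; d = 0.5 -> 64 per group
```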
Quick workflow for running your own t-test power analysis
Here’s how you can turn these examples of statistical power analysis for t-tests into your own workflow:
- Specify the design. One-sample, independent, or paired t-test? Two-sided or one-sided?
- Define a meaningful difference. In the original units (points, minutes, pounds, mmHg).
- Estimate variability. Use historical data, literature, or a pilot to get an SD (or SD of change for paired designs).
- Convert to effect size d. Difference ÷ SD.
- Choose α and target power. Common combinations: α = 0.05 with power = 0.80 or 0.90; for high-stakes work, consider α = 0.01.
- Run the analysis in software. G*Power, R (the pwr package), Python (statsmodels.stats.power), or even online calculators.
- Document assumptions. In your protocol or preregistration, record the effect size, SD source, α, power, and software used.
This simple pattern underlies all the real examples of statistical power analysis for t-tests you’ve just seen.
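Steps 2–6 collapse into a short script; this template reuses Example 1's inputs, so swap in your own difference and SD:

```r
library(pwr)

# Template: meaningful difference in raw units -> d -> sample size
meaningful_diff <- 1.0   # e.g., a 1-point pain reduction
sd_estimate     <- 2.5   # from historical data, literature, or a pilot
d <- meaningful_diff / sd_estimate

pwr.t.test(d = d, sig.level = 0.05, power = 0.80, type = "two.sample")
```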
FAQ: examples of statistical power analysis for t-tests
Q1. Can you give another quick example of statistical power analysis for t-tests in psychology?
Imagine a psychologist testing whether a mindfulness app reduces anxiety scores on a standardized scale. Prior data suggest an SD of 9 points, and they want to detect a 4-point drop (d ≈ 0.44). Using a two-sample t-test with α = 0.05 and power = 0.80, a power analysis might suggest around 80–90 participants per group. This fits neatly alongside the other examples of statistical power analysis for t-tests in clinical and behavioral science.
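To check that ballpark yourself:

```r
library(pwr)

# Mindfulness-app example: 4-point drop, SD = 9 -> d = 4/9 ~ 0.44
pwr.t.test(d = 4 / 9, sig.level = 0.05, power = 0.80, type = "two.sample")
# n = 80.4 -> round up to 81 per group, consistent with the 80-90 range
```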
Q2. Is 80% power always enough?
Not necessarily. For high-stakes decisions (new drugs, major policy changes), many researchers now aim for 90% power or higher. For exploratory or pilot work, 80% is common, but these studies are often framed as hypothesis-generating rather than definitive.
Q3. Do I have to use Cohen’s d?
Cohen’s d is the standard effect size for t-tests because it’s unitless and easy to compare across studies. But the real thinking should happen in the original units (minutes, points, pounds). The examples of statistical power analysis for t-tests above always start with a meaningful difference in real units, then convert to d as a convenience.
Q4. How does unequal group size affect power analysis for t-tests?
If one group is much smaller than the other (say, a 1:3 ratio), you lose power compared with equal allocation. Most software lets you specify allocation ratios. In practice, keeping groups close to equal is usually more efficient, unless there’s a strong cost or ethical reason not to.
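The pwr package's pwr.t2n.test() handles unequal groups; a sketch holding the total at 200 participants with d = 0.40:

```r
library(pwr)

# Same total n (200), equal vs. 1:3 allocation
pwr.t2n.test(n1 = 100, n2 = 100, d = 0.40)$power  # ~0.80
pwr.t2n.test(n1 = 50,  n2 = 150, d = 0.40)$power  # ~0.68
```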
Q5. Where can I learn more about power analysis beyond these examples?
Good starting points include open course materials from universities (for example, statistics courses at Harvard or other major institutions) and method guides from government agencies. While this page focuses on examples of statistical power analysis for t-tests, the same logic extends to ANOVA, regression, and generalized linear models.
The bottom line: if you can clearly state your outcome, your meaningful difference, and your expected variability, you can run a power analysis for a t-test. The real art is picking effect sizes that matter in the real world—exactly what these examples are meant to illustrate.