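Analysis of Variance (ANOVA) is a statistical method for testing whether the means of two or more groups differ. It compares the variation between group means with the variation within groups. If the between-group variation is large relative to the within-group variation, at least one group mean is likely to differ from the others. ANOVA is most useful with three or more groups. With only two groups, a one-way ANOVA gives the same result as a two-sample t-test with pooled variance. With more groups, it tests all means at once instead of running many pairwise t-tests, which would inflate the Type I error rate. This article presents three practical examples of one-way ANOVA in R, each using simulated data to illustrate a different scenario.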
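The two-group equivalence mentioned above can be checked directly. The sketch below uses hypothetical simulated data (the names `group`, `y`, and the means 10 and 12 are illustrative assumptions, not part of the examples that follow). It fits a one-way ANOVA and a pooled-variance t-test to the same data, and confirms that the F-statistic equals the squared t-statistic and that the two p-values agree.

```r
# Hypothetical two-group data, simulated for illustration only
set.seed(1)
group <- factor(rep(c("G1", "G2"), each = 12))
y <- c(rnorm(12, mean = 10, sd = 2), rnorm(12, mean = 12, sd = 2))

# One-way ANOVA with two groups; summary()[[1]] is the ANOVA table
fit <- aov(y ~ group)
anova_table <- summary(fit)[[1]]
f_stat <- anova_table[["F value"]][1]
p_anova <- anova_table[["Pr(>F)"]][1]

# Student's two-sample t-test with pooled (equal) variances on the same data
tt <- t.test(y ~ group, var.equal = TRUE)
t_stat <- unname(tt$statistic)

# With two groups, F equals t squared and the p-values coincide
cat("F =", f_stat, " t^2 =", t_stat^2, "\n")
cat("p (ANOVA) =", p_anova, " p (t-test) =", tt$p.value, "\n")
```

With three or more groups there is no single t-test to compare against, which is where ANOVA's joint test becomes useful.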
In the first example, we investigate how different fertilizers affect plant growth. The simulated data mimic an experiment in which three groups of 10 plants were treated with fertilizer A, B, or C. We use a one-way ANOVA to determine whether the mean plant height after the growing period differs significantly among the three fertilizers.
## Load ggplot2 for plotting (aov() comes with base R's stats package)
library(ggplot2)
## Create the dataset: 10 simulated plants per fertilizer
set.seed(123)
fertilizer <- factor(rep(c('A', 'B', 'C'), each = 10))
height <- c(rnorm(10, mean = 20, sd = 2),
            rnorm(10, mean = 22, sd = 2),
            rnorm(10, mean = 25, sd = 2))
data <- data.frame(fertilizer, height)
## Perform a one-way ANOVA of height on fertilizer type
anova_result <- aov(height ~ fertilizer, data = data)
summary(anova_result)
## Visualize the distribution of heights for each fertilizer
ggplot(data, aes(x = fertilizer, y = height)) +
  geom_boxplot() +
  labs(title = 'Effect of Fertilizers on Plant Growth',
       x = 'Fertilizer Type', y = 'Plant Height')
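The ANOVA summary reports the degrees of freedom, sums of squares, mean squares, the F-statistic, and its p-value (the `Pr(>F)` column). A p-value below 0.05 means that, at the 5% significance level, we reject the null hypothesis that all three fertilizers produce the same mean height. It indicates that at least one group mean differs, but not which one. The boxplot shows how the height distributions compare across fertilizer types.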
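To see where the F-statistic comes from, the sketch below recomputes it by hand for the fertilizer data: the between-group mean square divided by the within-group mean square, with $k - 1$ and $n - k$ degrees of freedom. It regenerates the same simulated data (same seed) so it runs on its own. The helper variable names (`ss_between`, `ms_within`, and so on) are illustrative and not part of any package.

```r
# Recreate the simulated fertilizer data from the first example
set.seed(123)
fertilizer <- factor(rep(c("A", "B", "C"), each = 10))
height <- c(rnorm(10, mean = 20, sd = 2),
            rnorm(10, mean = 22, sd = 2),
            rnorm(10, mean = 25, sd = 2))

k <- nlevels(fertilizer)   # number of groups
n <- length(height)        # total number of observations
grand_mean <- mean(height)
group_means <- tapply(height, fertilizer, mean)
group_sizes <- tapply(height, fertilizer, length)

# Between-group sum of squares: spread of group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# Within-group sum of squares: spread of each plant around its own group mean
# (indexing by the factor uses its integer codes, which match tapply's order)
ss_within <- sum((height - group_means[fertilizer])^2)

# Mean squares, F-statistic, and upper-tail p-value from the F distribution
ms_between <- ss_between / (k - 1)
ms_within <- ss_within / (n - k)
f_manual <- ms_between / ms_within
p_manual <- pf(f_manual, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Compare with the ANOVA table produced by aov()
tab <- summary(aov(height ~ fertilizer))[[1]]
cat("manual F =", f_manual, " aov F =", tab[["F value"]][1], "\n")
cat("manual p =", p_manual, " aov p =", tab[["Pr(>F)"]][1], "\n")
```

A large F means the group means are spread out far more than the random variation within groups would suggest.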
The second example examines how different study techniques affect students’ exam scores. The simulated data represent 15 students for each of three study methods: Method 1, Method 2, and Method 3. ANOVA tells us whether the mean exam score differs significantly among the methods.
## Load ggplot2 for plotting (aov() comes with base R's stats package)
library(ggplot2)
## Create the dataset: 15 simulated students per study method
set.seed(456)
study_method <- factor(rep(c('Method 1', 'Method 2', 'Method 3'), each = 15))
scores <- c(rnorm(15, mean = 75, sd = 5),
            rnorm(15, mean = 80, sd = 5),
            rnorm(15, mean = 85, sd = 5))
data <- data.frame(study_method, scores)
## Perform a one-way ANOVA of exam score on study method
anova_result <- aov(scores ~ study_method, data = data)
summary(anova_result)
## Visualize the distribution of scores for each method
ggplot(data, aes(x = study_method, y = scores)) +
  geom_boxplot() +
  labs(title = 'Impact of Study Techniques on Exam Scores',
       x = 'Study Technique', y = 'Exam Score')
The ANOVA summary shows whether mean exam scores differ across study techniques. As before, a p-value below 0.05 indicates that at least one method’s mean score differs from the others; identifying which pairs of methods differ requires a post-hoc test. The boxplot lets you compare the score distributions side by side.
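One common post-hoc choice is Tukey's Honest Significant Difference test, available in base R as `TukeyHSD()`. The sketch below applies it to the study-method data (regenerated with the same seed so the block runs on its own). It compares every pair of methods and adjusts the p-values for multiple comparisons. The 0.05 cut-off and the name `significant_pairs` are assumptions for illustration.

```r
# Recreate the simulated study-method data from the second example
set.seed(456)
study_method <- factor(rep(c("Method 1", "Method 2", "Method 3"), each = 15))
scores <- c(rnorm(15, mean = 75, sd = 5),
            rnorm(15, mean = 80, sd = 5),
            rnorm(15, mean = 85, sd = 5))
data <- data.frame(study_method, scores)

anova_result <- aov(scores ~ study_method, data = data)

# Tukey HSD: all pairwise differences in group means, with confidence
# intervals and p-values adjusted for the number of comparisons
tukey <- TukeyHSD(anova_result, conf.level = 0.95)
print(tukey)

# The result is a list with one matrix per factor; columns are
# "diff", "lwr", "upr", and "p adj"
pairwise <- tukey$study_method
significant_pairs <- pairwise[pairwise[, "p adj"] < 0.05, , drop = FALSE]
print(significant_pairs)
```

Each row of `pairwise` is labelled like `Method 2-Method 1`, and a confidence interval that excludes zero corresponds to an adjusted p-value below 0.05.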
In the third example, we analyze customer satisfaction ratings for three restaurants. Each of 60 simulated customers (20 per restaurant) rated their satisfaction on a scale of 1 to 10 after dining at one of the restaurants. The analysis tests whether mean satisfaction differs significantly across the restaurants.
## Load ggplot2 for plotting (aov() comes with base R's stats package)
library(ggplot2)
## Create the dataset: 20 simulated ratings per restaurant
set.seed(789)
restaurant <- factor(rep(c('Restaurant A', 'Restaurant B', 'Restaurant C'), each = 20))
satisfaction <- c(rnorm(20, mean = 7, sd = 1.5),
                  rnorm(20, mean = 8, sd = 1.5),
                  rnorm(20, mean = 6, sd = 1.5))
## Round to whole ratings and keep them within the 1 to 10 scale
satisfaction <- pmin(pmax(round(satisfaction), 1), 10)
data <- data.frame(restaurant, satisfaction)
## Perform a one-way ANOVA of satisfaction on restaurant
anova_result <- aov(satisfaction ~ restaurant, data = data)
summary(anova_result)
## Visualize the distribution of ratings for each restaurant
ggplot(data, aes(x = restaurant, y = satisfaction)) +
  geom_boxplot() +
  labs(title = 'Customer Satisfaction Across Restaurants',
       x = 'Restaurant', y = 'Satisfaction Rating')
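Review the ANOVA summary to check for significant differences in satisfaction among the restaurants. A p-value below 0.05 indicates that the mean satisfaction rating of at least one restaurant differs significantly from the others, and the boxplot shows how the rating distributions compare. Because ratings on a 1 to 10 scale are ordinal, it is worth checking the ANOVA assumptions (normally distributed residuals and equal group variances) before relying on the result.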
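A minimal sketch of such checks, using only base R's stats package, is shown below. It assumes the ratings are rounded and limited to 1 to 10 so that the simulated values match the described scale. The sketch runs a Shapiro-Wilk test on the ANOVA residuals, Bartlett's test for equal variances, Welch's ANOVA (`oneway.test()` with `var.equal = FALSE`), which drops the equal-variance assumption, and the rank-based Kruskal-Wallis test, which is often preferred for ordinal ratings. These are swapped-in techniques offered for comparison, not part of the original example.

```r
# Recreate the simulated restaurant data, rounded to whole ratings on 1 to 10
set.seed(789)
restaurant <- factor(rep(c("Restaurant A", "Restaurant B", "Restaurant C"), each = 20))
satisfaction <- c(rnorm(20, mean = 7, sd = 1.5),
                  rnorm(20, mean = 8, sd = 1.5),
                  rnorm(20, mean = 6, sd = 1.5))
satisfaction <- pmin(pmax(round(satisfaction), 1), 10)
data <- data.frame(restaurant, satisfaction)

anova_result <- aov(satisfaction ~ restaurant, data = data)

# Normality of residuals (Shapiro-Wilk); a small p-value suggests non-normality
shapiro_res <- shapiro.test(residuals(anova_result))
# Homogeneity of variances across restaurants (Bartlett's test)
bartlett_res <- bartlett.test(satisfaction ~ restaurant, data = data)
# Welch's one-way ANOVA: compares means without assuming equal variances
welch_res <- oneway.test(satisfaction ~ restaurant, data = data, var.equal = FALSE)
# Kruskal-Wallis: rank-based test that does not assume normality
kruskal_res <- kruskal.test(satisfaction ~ restaurant, data = data)

p_values <- c(shapiro = shapiro_res$p.value,
              bartlett = bartlett_res$p.value,
              welch = welch_res$p.value,
              kruskal = kruskal_res$p.value)
print(p_values)
```

If the Welch and Kruskal-Wallis results agree with the standard ANOVA, the conclusion is less likely to depend on the normality or equal-variance assumptions.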