Data frames are a fundamental data structure in R, similar to tables in a database or spreadsheets in Excel. They allow you to store and manipulate data in a structured format. In this article, we will explore three practical examples of creating and manipulating data frames in R, providing clear use cases and code snippets.
Creating a data frame is often the first step in data analysis. This example demonstrates how to create a simple data frame from vectors, which is useful for organizing small datasets.
## Define vectors for each column
name <- c("Alice", "Bob", "Charlie")
age <- c(25, 30, 35)
city <- c("New York", "Los Angeles", "Chicago")
## Create a data frame using the vectors
people_df <- data.frame(Name = name, Age = age, City = city)
## Display the data frame
print(people_df)
This code creates a data frame named people_df with three columns: Name, Age, and City. Each row represents a different person, providing a structured view of the data.
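Once a data frame exists, it helps to check its structure before working with it. The following is a minimal sketch in base R, rebuilding the same people_df as above; the variable names n_rows, n_cols, mean_age, and over_25 are illustrative and not part of the original example.

```r
# Rebuild the data frame from the example above
people_df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                        Age = c(25, 30, 35),
                        City = c("New York", "Los Angeles", "Chicago"))

# Show each column's type and its first values
str(people_df)

# Dimensions: number of rows and number of columns
n_rows <- nrow(people_df)
n_cols <- ncol(people_df)

# Extract a single column as a vector with $ and summarize it
mean_age <- mean(people_df$Age)

# Select rows by a logical condition and columns by name
over_25 <- people_df[people_df$Age > 25, c("Name", "City")]
print(over_25)
```

Here over_25 holds the rows for Bob and Charlie, since only their ages exceed 25.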
Additional columns can be added by passing more vectors to the data.frame() function.
The dplyr package in R is a powerful tool for data manipulation. This example demonstrates how to filter and arrange data in a data frame, which is essential for data analysis tasks.
## Load the dplyr package
library(dplyr)
## Create a sample data frame
sales_df <- data.frame(Region = c("North", "South", "East", "West"),
                       Sales = c(15000, 20000, 18000, 22000))
## Filter regions with sales greater than 18000 and arrange in descending order
filtered_sales <- sales_df %>%
  filter(Sales > 18000) %>%
  arrange(desc(Sales))
## Display the filtered and arranged data frame
print(filtered_sales)
In this example, we created a data frame named sales_df to store sales data by region. Using dplyr, we filtered the data to include only regions with sales over 18,000 and arranged the results in descending order.
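For comparison, the same filter-and-sort steps can be sketched in base R without loading dplyr. This is only an illustrative equivalent, not a replacement for the dplyr approach; the name high_sales is an assumption introduced here.

```r
# Same sample data as in the example above
sales_df <- data.frame(Region = c("North", "South", "East", "West"),
                       Sales = c(15000, 20000, 18000, 22000))

# Keep rows where Sales exceeds 18000 (East, at exactly 18000, is excluded)
high_sales <- sales_df[sales_df$Sales > 18000, ]

# Sort the remaining rows by Sales, largest first
high_sales <- high_sales[order(high_sales$Sales, decreasing = TRUE), ]

# Reset row names so they run 1, 2, ... instead of keeping original positions
rownames(high_sales) <- NULL

print(high_sales)
```

The result lists West (22000) first and South (20000) second. The base R version spells out each indexing step, which is what the dplyr verbs express more compactly.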
The %>% operator, known as the pipe operator, allows you to chain operations in a clear and concise manner.
You can also use other dplyr functions like mutate() to create new columns or summarize() to aggregate data.
Merging data frames is a common task when combining datasets from different sources. This example illustrates how to merge two data frames based on a common key, which is crucial for comprehensive data analysis.
## Create two sample data frames
students_df <- data.frame(StudentID = c(1, 2, 3),
                          Name = c("Alice", "Bob", "Charlie"))
scores_df <- data.frame(StudentID = c(1, 2, 3),
                        Score = c(85, 90, 78))
## Merge the data frames by StudentID
merged_df <- merge(students_df, scores_df, by = "StudentID")
## Display the merged data frame
print(merged_df)
In this example, we created two data frames: students_df with student IDs and names, and scores_df with student scores. We then merged them using merge(), resulting in a combined data frame with student names and scores.
The merge() function accepts additional arguments such as all.x and all.y to control the merging behavior: all.x = TRUE performs a left join and all.y = TRUE performs a right join. By default, only rows whose key appears in both data frames are kept.
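The effect of all.x and all.y only shows up when the keys do not match perfectly. The sketch below uses hypothetical data that differs from the example above: student 3 has no score, and the score for ID 4 has no matching student.

```r
# Hypothetical data: student 3 has no score, and score ID 4 has no student
students_df <- data.frame(StudentID = c(1, 2, 3),
                          Name = c("Alice", "Bob", "Charlie"))
scores_df <- data.frame(StudentID = c(1, 2, 4),
                        Score = c(85, 90, 70))

# Default (inner join): only IDs present in both data frames
inner_df <- merge(students_df, scores_df, by = "StudentID")

# all.x = TRUE (left join): keep every student; missing scores become NA
left_df <- merge(students_df, scores_df, by = "StudentID", all.x = TRUE)

# all.y = TRUE (right join): keep every score row; missing names become NA
right_df <- merge(students_df, scores_df, by = "StudentID", all.y = TRUE)

# all = TRUE (full outer join): keep rows from both sides
full_df <- merge(students_df, scores_df, by = "StudentID", all = TRUE)

print(left_df)
```

With this data, the inner join returns two rows, the left and right joins return three rows each, and the full join returns four. In left_df, Charlie's Score is NA because scores_df has no row for StudentID 3.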