Creating and Manipulating Data Frames in R

Explore practical examples of creating and manipulating data frames in R, perfect for beginners and data enthusiasts.
By Jamie

Introduction to Data Frames in R

Data frames are a fundamental data structure in R, similar to tables in a database or spreadsheets in Excel. They store data in rows and columns, where each column is a vector of a single type and all columns have the same length. In this article, we explore three practical examples of creating and manipulating data frames in R, each with a clear use case and a code snippet.

Example 1: Creating a Simple Data Frame

Context

Creating a data frame is often the first step in data analysis. This example demonstrates how to create a simple data frame from vectors, which is useful for organizing small datasets.

## Define vectors for each column
name <- c("Alice", "Bob", "Charlie")
age <- c(25, 30, 35)
city <- c("New York", "Los Angeles", "Chicago")

## Create a data frame using the vectors
people_df <- data.frame(Name = name, Age = age, City = city)

## Display the data frame
print(people_df)

This code creates a data frame named people_df with three columns: Name, Age, and City. Each row represents a different person, providing a structured view of the data.

Notes

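  • Each column of a data frame holds a single data type, but different columns can hold different types (e.g., numeric, character).
  • You set the column names through the argument names passed to data.frame() (here Name, Age, and City), and you can change them later by assigning to names().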
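
To make these two notes concrete, here is a minimal sketch that rebuilds the same data, checks the type of each column, and renames the columns afterwards. The new column names (FullName, AgeYears, HomeCity) are hypothetical and chosen only for illustration.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps text columns as
# character in R versions before 4.0.0, where the default was TRUE
people_df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age  = c(25, 30, 35),
  City = c("New York", "Los Angeles", "Chicago"),
  stringsAsFactors = FALSE
)

# Report the class of every column (one type per column)
col_types <- sapply(people_df, class)
print(col_types)

# Rename all columns after creation (hypothetical names)
names(people_df) <- c("FullName", "AgeYears", "HomeCity")

# Show the structure: column names, types, and first values
str(people_df)
```

str() is often a quicker check than print() for larger data frames, because it summarizes each column on one line.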

Example 2: Manipulating Data Frames with dplyr

Context

The dplyr package in R is a powerful tool for data manipulation. This example demonstrates how to filter and arrange data in a data frame, which is essential for data analysis tasks.

## Load the dplyr package
library(dplyr)

## Create a sample data frame
sales_df <- data.frame(Region = c("North", "South", "East", "West"),
                        Sales = c(15000, 20000, 18000, 22000))

## Filter regions with sales greater than 18000 and arrange in descending order
filtered_sales <- sales_df %>% 
  filter(Sales > 18000) %>% 
  arrange(desc(Sales))

## Display the filtered and arranged data frame
print(filtered_sales)

In this example, we created a data frame named sales_df to store sales data by region. Using dplyr, we filtered the data to keep only regions with sales strictly greater than 18,000 (South and West; East, at exactly 18,000, is excluded) and arranged the results in descending order, so West appears first.

Notes

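  • The %>% operator, known as the pipe operator, passes the result on its left as the first argument of the function on its right, so you can chain operations in the order you read them. dplyr re-exports it from the magrittr package; R 4.1 and later also provide a native pipe, |>.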
  • You can use other dplyr functions like mutate() to create new columns or summarize() to aggregate data.
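
As a rough illustration of the second note, the sketch below applies mutate() and summarize() to the same sales data. The column names Share, TotalSales, and MeanSales are assumptions made up for this example, and the code assumes dplyr is installed.

```r
library(dplyr)

# Same sample data as in the example above
sales_df <- data.frame(
  Region = c("North", "South", "East", "West"),
  Sales  = c(15000, 20000, 18000, 22000)
)

# mutate() adds a new column; Share is each region's fraction of total sales
sales_with_share <- sales_df %>%
  mutate(Share = Sales / sum(Sales))

# summarize() collapses the data frame to a single row of aggregates
sales_summary <- sales_df %>%
  summarize(TotalSales = sum(Sales), MeanSales = mean(Sales))

print(sales_with_share)
print(sales_summary)
```

mutate() keeps all four rows and adds one column, while summarize() returns one row, which is the main difference to keep in mind when choosing between them.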

Example 3: Merging Data Frames

Context

Merging data frames is a common task when combining datasets from different sources. This example illustrates how to merge two data frames based on a common key, which is crucial for comprehensive data analysis.

## Create two sample data frames
students_df <- data.frame(StudentID = c(1, 2, 3),
                           Name = c("Alice", "Bob", "Charlie"))

scores_df <- data.frame(StudentID = c(1, 2, 3),
                         Score = c(85, 90, 78))

## Merge the data frames by StudentID
merged_df <- merge(students_df, scores_df, by = "StudentID")

## Display the merged data frame
print(merged_df)

In this example, we created two data frames: students_df with student IDs and names, and scores_df with student scores. We then merged them using merge(), resulting in a combined data frame with student names and scores.

Notes

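  • By default, merge() keeps only rows whose key appears in both data frames (an inner join). The all.x and all.y parameters keep unmatched rows from the first or second data frame (left join, right join), and all = TRUE keeps both (full outer join).
  • Ensure that the key used for merging is unique in at least one of the data frames. If a key value is repeated in both, merge() returns every pairwise combination of the matching rows, which duplicates records.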
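
The following sketch shows how the join type changes the result. It assumes a variation of the data above in which student 3 has no score and a score exists for an unknown student 4; these values are invented for illustration.

```r
students_df <- data.frame(StudentID = c(1, 2, 3),
                          Name = c("Alice", "Bob", "Charlie"))

# Assumed variation: no score for student 3, and a score for student 4
scores_df <- data.frame(StudentID = c(1, 2, 4),
                        Score = c(85, 90, 70))

# Check that the key is unique in students_df before merging
# (anyDuplicated() returns 0 when there are no repeated values)
key_is_unique <- anyDuplicated(students_df$StudentID) == 0

# Inner join (default): only IDs present in both data frames
inner_df <- merge(students_df, scores_df, by = "StudentID")

# Left join: keep every student; a missing score becomes NA
left_df <- merge(students_df, scores_df, by = "StudentID", all.x = TRUE)

# Full outer join: keep every ID from either data frame
full_df <- merge(students_df, scores_df, by = "StudentID", all = TRUE)

print(inner_df)
print(left_df)
print(full_df)
```

Comparing nrow() before and after a merge is a simple way to notice dropped or duplicated records.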