How to Filter Data in R
Learn how to filter data frames in R using base R and dplyr's filter function.
Filtering means keeping only the rows of a data frame that satisfy a condition. The methods below cover base R, dplyr, and stringr, all using the same small example data frame.
Method 1: Base R with Brackets
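Use logical indexing: put a condition in the row position of df[rows, columns] and leave the column position empty to keep every column.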
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age = c(25, 30, 35, 28),
  city = c("Paris", "London", "Paris", "Berlin")
)
# Filter by single condition
adults <- df[df$age >= 30, ]
print(adults)
# Filter by multiple conditions
paris_young <- df[df$city == "Paris" & df$age < 30, ]
print(paris_young)
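One caveat with bracket indexing is missing data: a comparison against NA yields NA, and an NA in the row position produces a row filled with NA values. The sketch below illustrates this on a hypothetical variant of the example data (df_na, with one age set to NA, is an assumption made for illustration) and shows which() as one common way to drop such rows.

```r
# Hypothetical data: same shape as df above, but with one missing age
df_na <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age  = c(25, NA, 35, 28),
  city = c("Paris", "London", "Paris", "Berlin")
)

# Plain logical indexing: the NA comparison produces an all-NA row
with_na_row <- df_na[df_na$age >= 30, ]

# which() returns only the positions that are TRUE, so the NA is dropped
clean <- df_na[which(df_na$age >= 30), ]

print(with_na_row)
print(clean)
```

Wrapping the condition in which() is only one option; adding !is.na(df_na$age) to the condition has the same effect here.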
Method 2: subset() Function
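A more readable base R approach: subset() evaluates the condition inside the data frame, so columns can be named without the df$ prefix. Rows where the condition is NA are dropped.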
# Single condition
adults <- subset(df, age >= 30)
# Multiple conditions
result <- subset(df, city == "Paris" & age < 30)
# Select specific columns too
result <- subset(df, age > 25, select = c(name, age))
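To make the relationship between the two base R methods concrete, the following sketch rebuilds the last subset() call with bracket indexing and checks that both give the same result. The variable names via_subset and via_brackets are chosen here for illustration only.

```r
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age  = c(25, 30, 35, 28),
  city = c("Paris", "London", "Paris", "Berlin")
)

# subset() with a row condition and a column selection
via_subset <- subset(df, age > 25, select = c(name, age))

# The same rows and columns with bracket indexing
via_brackets <- df[df$age > 25, c("name", "age")]

# Compare the two results
print(isTRUE(all.equal(via_subset, via_brackets)))
```

The R documentation describes subset() as a convenience function intended for interactive use, so the bracket form may be the safer choice inside functions and packages.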
Method 3: dplyr filter() (Recommended)
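The tidyverse approach: filter() keeps the rows where every condition is TRUE and drops rows where a condition evaluates to NA. Conditions separated by commas are combined with AND.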
library(dplyr)
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age = c(25, 30, 35, 28),
  city = c("Paris", "London", "Paris", "Berlin")
)
# Single condition
adults <- df %>% filter(age >= 30)
# Multiple conditions (AND)
result <- df %>% filter(city == "Paris", age < 30)
# Multiple conditions (OR)
result <- df %>% filter(city == "Paris" | city == "Berlin")
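A few more filter() patterns build directly on the conditions above. This is a minimal sketch, assuming dplyr is installed; the result names (with_comma, with_and, not_paris, mid_age) are illustrative and not part of the original examples.

```r
library(dplyr)

df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age  = c(25, 30, 35, 28),
  city = c("Paris", "London", "Paris", "Berlin")
)

# Comma-separated conditions and & select the same rows
with_comma <- df %>% filter(city == "Paris", age < 30)
with_and   <- df %>% filter(city == "Paris" & age < 30)

# Negation: everyone outside Paris
not_paris <- df %>% filter(city != "Paris")

# Range check: between() is inclusive on both ends
mid_age <- df %>% filter(between(age, 28, 30))

print(not_paris)
print(mid_age)
```

Whether to use commas or & for AND is largely a matter of taste; commas tend to read better when each condition sits on its own line.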
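Method 4: Filter with %in% for Multiple Values
%in% tests whether each value appears in a vector, which is shorter than chaining == comparisons with |: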
library(dplyr)
# Filter where city is in a list
cities <- c("Paris", "Berlin")
result <- df %>% filter(city %in% cities)
print(result)
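The same operator also works for exclusion and outside dplyr. The sketch below, with illustrative result names (others, in_base), negates the %in% test to keep rows whose city is not in the list and then repeats the inclusion filter with base R brackets.

```r
library(dplyr)

df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age  = c(25, 30, 35, 28),
  city = c("Paris", "London", "Paris", "Berlin")
)

cities <- c("Paris", "Berlin")

# Negate the whole %in% test to exclude the listed cities
others <- df %>% filter(!city %in% cities)

# Base R equivalent of the inclusion filter, without dplyr
in_base <- df[df$city %in% cities, ]

print(others)
print(in_base)
```

Because %in% returns FALSE rather than NA for missing values, the bracket version does not produce the all-NA rows that a plain == comparison can.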
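Method 5: Filter with String Matching
The stringr helpers return TRUE or FALSE for each element, so they can be used directly inside filter(). Both functions below treat the pattern as a regular expression by default and are case-sensitive: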
library(dplyr)
library(stringr)
# Filter names starting with "A"
result <- df %>% filter(str_starts(name, "A"))
# Filter names containing "li"
result <- df %>% filter(str_detect(name, "li"))
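If stringr is not available, base R has close counterparts. The following is a sketch under that assumption, using startsWith() and grepl() with bracket indexing; starts_a and has_li are illustrative names. Note that startsWith() matches a literal prefix rather than a regular expression, so it is not an exact drop-in for str_starts().

```r
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age  = c(25, 30, 35, 28),
  city = c("Paris", "London", "Paris", "Berlin"),
  stringsAsFactors = FALSE  # keep name as character for startsWith()
)

# Names starting with "A" (literal prefix match, no regex)
starts_a <- df[startsWith(df$name, "A"), ]

# Names containing "li", ignoring case
has_li <- df[grepl("li", df$name, ignore.case = TRUE), ]

print(starts_a)
print(has_li)
```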
Summary
- Use base R brackets for quick filtering
- Use subset() for readable base R code
- Use dplyr filter() for tidyverse workflows (recommended)
- Use %in% for checking multiple values
- Use stringr's str_starts() and str_detect() for string matching