
How to Remove Duplicate Rows in R so None are Left

by Tutor Aspire

You can use the following methods in R to remove duplicate rows from a data frame so that none are left in the resulting data frame:

Method 1: Use Base R

new_df <- df[!(duplicated(df) | duplicated(df, fromLast=TRUE)), ]

Method 2: Use dplyr

library(dplyr)

new_df <- df %>%
          group_by(across(everything())) %>%
          filter(n()==1)

The following examples show how to use each method in practice with the following data frame:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(20, 20, 28, 14, 13, 18, 27, 13))

#view data frame
df

  team points
1    A     20
2    A     20
3    A     28
4    A     14
5    B     13
6    B     18
7    B     27
8    B     13

Example 1: Use Base R

The following code shows how to use functions from base R to remove duplicate rows from the data frame so that none are left:

#create new data frame that removes duplicates so none are left
new_df <- df[!(duplicated(df) | duplicated(df, fromLast=TRUE)), ]

#view new data frame
new_df

  team points
3    A     28
4    A     14
6    B     18
7    B     27

Notice that every row that appeared more than once has been removed, including its first occurrence, so no copy of a duplicated row remains.
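To see why this works, it can help to print the intermediate logical vectors. The sketch below rebuilds the same example data frame; the names dup_forward, dup_backward, and has_copy are illustrative only and are not part of the original code.

```r
# rebuild the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(20, 20, 28, 14, 13, 18, 27, 13))

# TRUE for every repeat of an earlier row (the first occurrence is FALSE)
dup_forward <- duplicated(df)

# TRUE for every row that is repeated later (the last occurrence is FALSE)
dup_backward <- duplicated(df, fromLast=TRUE)

# TRUE for every row that has at least one copy anywhere in the data frame
has_copy <- dup_forward | dup_backward
has_copy
# [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE

# keep only the rows that have no copy
new_df <- df[!has_copy, ]

# for contrast, unique() keeps one copy of each duplicated row
nrow(unique(df))   # 6
nrow(new_df)       # 4
```

Combining both directions is what distinguishes this approach from unique(df), which would still keep one (A, 20) row and one (B, 13) row.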

Example 2: Use dplyr

The following code shows how to use functions from the dplyr package in R to remove duplicate rows from the data frame so that none are left:

library(dplyr)

#create new data frame that removes duplicates so none are left
new_df <- df %>%
          group_by(across(everything())) %>%
          filter(n()==1)

#view new data frame
new_df

# A tibble: 4 x 2
# Groups:   team, points [4]
  team  points
  <chr>  <dbl>
1 A         28
2 A         14
3 B         18
4 B         27

Notice that every row that appeared more than once has been removed and no copy of a duplicated row remains.

Also notice that this produces the same result as the previous method.
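One difference is visible in the "# Groups:" line of the output: the dplyr result is still a grouped tibble. The sketch below, which assumes the same example data frame, adds ungroup() to drop the grouping and then checks that both methods return the same values. The names base_df and dplyr_df are illustrative only.

```r
library(dplyr)

# rebuild the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(20, 20, 28, 14, 13, 18, 27, 13))

# base R result
base_df <- df[!(duplicated(df) | duplicated(df, fromLast=TRUE)), ]

# dplyr result, with the grouping dropped at the end
dplyr_df <- df %>%
  group_by(across(everything())) %>%
  filter(n()==1) %>%
  ungroup()

# the result is no longer grouped
is_grouped_df(dplyr_df)
# [1] FALSE

# compare column values only (row names and classes differ between the two)
identical(base_df$team, dplyr_df$team) &&
  identical(base_df$points, dplyr_df$points)
# [1] TRUE
```

Dropping the grouping is optional, but it avoids later dplyr verbs silently operating per group on the result.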

Note: For extremely large data frames, the dplyr method may be faster than the base R method, although the difference depends on the size and shape of the data.
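Relative speed can vary with the data, so it is worth timing both methods on your own data frame. The following is a rough sketch using system.time() from base R; the simulated data frame big_df, its size n, and the seed are assumptions made up for illustration, and no particular timing result is implied.

```r
library(dplyr)

# simulate a larger data frame (size and value ranges are arbitrary)
set.seed(1)
n <- 100000
big_df <- data.frame(team=sample(LETTERS, n, replace=TRUE),
                     points=sample(1:1000, n, replace=TRUE))

# time the base R method
base_time <- system.time(
  base_res <- big_df[!(duplicated(big_df) | duplicated(big_df, fromLast=TRUE)), ]
)

# time the dplyr method
dplyr_time <- system.time(
  dplyr_res <- big_df %>%
    group_by(across(everything())) %>%
    filter(n()==1) %>%
    ungroup()
)

# elapsed seconds for each method (values will differ between machines)
base_time["elapsed"]
dplyr_time["elapsed"]

# sanity check: both methods should keep the same number of rows
nrow(base_res) == nrow(dplyr_res)
```

A single system.time() call is noisy, so repeating the measurement several times gives a more reliable comparison.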

Additional Resources

The following tutorials explain how to perform other common functions in R:

How to Remove Rows in R Based on Condition
How to Remove Rows with NA in One Specific Column in R
