How to remove rows from a data frame using a subset?

Question

I have a column in a data frame called Retest_data that goes like this:

SFC
YU006UGD31092
YU006UGD31071
YU006UGD30152
YU006UGD25831
YU006UGD25831
YU006UGD25332
YU006UG922912
YU006UG922912

What I want is to remove all instances of values that occur more than once, so functions like base R's unique() and dplyr's distinct(), which keep one copy of each value, won't work for me.

I also have a list called Remove_SFC that has all the SFC values that occur more than once. How can I use this list to remove all recurring values from my data? Thanks.

"So dplyr functions like unique and distinct won't work for me." Why can't you use distinct() or unique()?
– Martin Gal, Commented Jul 27, 2021 at 8:10
What format is this list of values in? Is it a real R list, a data frame, or an external CSV? Please share an example.
– deschen, Commented Jul 27, 2021 at 8:10
@Martin Gal I assume it's because distinct() would keep the first occurrence, but the OP might want to delete ALL occurrences when there are duplicates.
– deschen, Commented Jul 27, 2021 at 8:11
You can use dplyr::anti_join(Retest_data, Remove_SFC, by="SFC").
– Martin Gal, Commented Jul 27, 2021 at 8:12
@deschen It's a data frame similar to the example above, but all of its values are ones that occur more than once in the original data frame.
– CMAHER, Commented Jul 27, 2021 at 8:16

MonJeanJean · Accepted Answer · 2021-07-27 08:11:19Z

3

Data:

df <- data.frame(SFC = c("YU006UGD31092",
                         "YU006UGD31071",
                         "YU006UGD30152",
                         "YU006UGD25831",
                         "YU006UGD25831",
                         "YU006UGD25332",
                         "YU006UG922912",
                         "YU006UG922912"))

Code:

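library(dplyr)

df %>% 
  group_by(SFC) %>%      # one group per distinct SFC value
  filter(n() == 1) %>%   # keep only values that occur exactly once
  ungroup()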

Output:

  SFC          
  <chr>        
1 YU006UGD31092
2 YU006UGD31071
3 YU006UGD30152
4 YU006UGD25332
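
For comparison, here is a minimal base R sketch of the same idea, in case you'd rather not load dplyr. It is not part of the approach above: it swaps in duplicated(), called once from each end of the column, so that every copy of a repeated value gets flagged. The names is_dup and result are only illustrative.

```r
df <- data.frame(SFC = c("YU006UGD31092", "YU006UGD31071", "YU006UGD30152",
                         "YU006UGD25831", "YU006UGD25831", "YU006UGD25332",
                         "YU006UG922912", "YU006UG922912"),
                 stringsAsFactors = FALSE)

# duplicated() alone flags only the 2nd, 3rd, ... copies of a value;
# combining it with fromLast = TRUE also flags the first copy.
is_dup <- duplicated(df$SFC) | duplicated(df$SFC, fromLast = TRUE)

# drop = FALSE keeps the result a data frame even with a single column
result <- df[!is_dup, , drop = FALSE]
print(result)
```

This should leave the same four SFC values as the dplyr version, with the original row names kept.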

Edit:

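If you already have the values to remove as a character vector, you can also do the following (if Remove_SFC is a data frame, as described in the comments on the question, pass its column instead, e.g. Remove_SFC$SFC):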

df %>% 
  filter(!(SFC %in% Remove_SFC))
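
Since the comments on the question say Remove_SFC is a data frame rather than a vector, here is a small sketch of that case. The contents of Remove_SFC are an assumption, reconstructed from the repeated values in the example data, and result is just an illustrative name.

```r
library(dplyr)

df <- data.frame(SFC = c("YU006UGD31092", "YU006UGD31071", "YU006UGD30152",
                         "YU006UGD25831", "YU006UGD25831", "YU006UGD25332",
                         "YU006UG922912", "YU006UG922912"),
                 stringsAsFactors = FALSE)

# Assumed shape: a one-column data frame holding the repeated values
Remove_SFC <- data.frame(SFC = c("YU006UGD25831", "YU006UG922912"),
                         stringsAsFactors = FALSE)

# Compare against the SFC column, not the data frame itself
result <- df %>%
  filter(!(SFC %in% Remove_SFC$SFC))
print(result)
```

Passing the whole data frame to %in% would compare each SFC against the column as a single element, so it would most likely match nothing and remove no rows.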

Martin Gal · Accepted Answer · 2021-07-27 08:42:45Z

1

As an alternative, you can use dplyr's anti_join. anti_join returns all rows from df that have no match in Remove_SFC:

library(dplyr)

df %>% 
  anti_join(data.frame(SFC=Remove_SFC))

which returns

Joining, by = "SFC"
            SFC
1 YU006UGD31092
2 YU006UGD31071
3 YU006UGD30152
4 YU006UGD25332

Data

Remove_SFC <- c("YU006UG922912", "YU006UGD25831")
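
If Remove_SFC is already a data frame with an SFC column (as the question's comments suggest), a sketch along these lines should work without wrapping it in data.frame(). The data frame below is an assumed stand-in for the real one, and naming the key with by = "SFC" avoids the "Joining, by" message.

```r
library(dplyr)

df <- data.frame(SFC = c("YU006UGD31092", "YU006UGD31071", "YU006UGD30152",
                         "YU006UGD25831", "YU006UGD25831", "YU006UGD25332",
                         "YU006UG922912", "YU006UG922912"),
                 stringsAsFactors = FALSE)

# Assumed stand-in for the data frame of repeated values
Remove_SFC <- data.frame(SFC = c("YU006UG922912", "YU006UGD25831"),
                         stringsAsFactors = FALSE)

# Keep only the rows of df whose SFC has no match in Remove_SFC
result <- anti_join(df, Remove_SFC, by = "SFC")
print(result)
```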
