Drop rows which are duplicates regarding certain columns [duplicate]

Question

I want to identify and remove observations that are duplicated with respect to certain columns.

In my example, I want to get rid of rows 1 and 6, because they have the same values in both V1 and V2. That they differ in V3 shouldn't matter.

df <- data.frame(V1 = c("a","b","c","a","c","a"),
                 V2 = c(1,2,1,2,3,1),
                 V3 = c(1,2,3,4,5,6))

Applying dplyr::distinct(df, V1, V2) discards row 6 but keeps row 1. As I said, I want both rows 1 and 6 removed. I am sure the problem is trivial, but I can't think of the correct search terms ...

Thanks!

df[!(duplicated(df[c(1,2)]) | duplicated(df[c(1,2)], fromLast = TRUE)), ]
– M--, Commented Feb 25, 2023 at 7:37
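
The one-liner in the comment above works because duplicated() only flags the second and later occurrences of a key, so it has to be run in both directions to catch the first occurrence as well. A minimal base R sketch of that idea, using the df from the question (the names key_cols, is_dup and result are only for illustration):

```r
df <- data.frame(V1 = c("a","b","c","a","c","a"),
                 V2 = c(1,2,1,2,3,1),
                 V3 = c(1,2,3,4,5,6))

key_cols <- c("V1", "V2")  # illustrative name: columns that define a duplicate

# duplicated() marks every repeat after the first occurrence; scanning from
# the end as well marks every repeat before the last occurrence. The union
# marks every row whose key appears more than once.
is_dup <- duplicated(df[key_cols]) | duplicated(df[key_cols], fromLast = TRUE)

# Keep only rows whose (V1, V2) combination is unique
result <- df[!is_dup, ]
print(result)
```

Selecting the columns by name rather than by position should keep the code working if the column order changes.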

r2evans · Accepted Answer · 2023-02-24 23:32:48Z

2

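We can group by the key columns and then keep only the groups that contain a single row:

library(dplyr)

group_by(df, V1, V2) %>%
  filter(n() == 1) %>%   # keep only (V1, V2) combinations that occur once
  ungroup()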
# # A tibble: 4 × 3
#   V1       V2    V3
#   <chr> <dbl> <dbl>
# 1 b         2     2
# 2 c         1     3
# 3 a         2     4
# 4 c         3     5
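
Since the question also asks to identify the duplicated rows, the same grouping can be inverted to inspect them before dropping anything. A small sketch, assuming dplyr is installed and using the df from the question (dups is just an illustrative name):

```r
library(dplyr)

df <- data.frame(V1 = c("a","b","c","a","c","a"),
                 V2 = c(1,2,1,2,3,1),
                 V3 = c(1,2,3,4,5,6))

# Keep only rows whose (V1, V2) combination occurs more than once
dups <- df %>%
  group_by(V1, V2) %>%
  filter(n() > 1) %>%
  ungroup()

print(dups)
```

For the example data this should return the original rows 1 and 6, the two rows the question wants removed.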


1 Comment

Jon Spring Over a year ago

Or, with dplyr 1.1.0 or later: filter(df, n() == 1, .by = c(V1, V2))

SAL · Accepted Answer · 2023-02-25 01:11:16Z

1

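Using data.table:

library(data.table)

setDT(df)  # convert df to a data.table by reference

# .N is the number of rows in each (V1, V2) group; keep groups with one row
df[, .SD[.N == 1], by = .(V1, V2)]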

   V1 V2 V3
1:  b  2  2
2:  c  1  3
3:  a  2  4
4:  c  3  5

