57

I'm trying to identify the values in a data frame that do not match, but can't figure out how to do this.

# make data frame 
a <- data.frame( x =  c(1,2,3,4)) 
b <- data.frame( y =  c(1,2,3,4,5,6))

# select only values from b that are not in 'a'
# attempt 1: 
results1 <- b$y[ !a$x ]

# attempt 2:  
results2 <- b[b$y != a$x,]

If a$x = c(1,2,3) this works, as the length of b$y is a multiple of the length of a$x. However, I'm just trying to select all the values from b$y that are not in a$x, and I don't understand what function to use.

3 Answers

84

If I understand correctly, you need the negation of the %in% operator. Something like this should work:

subset(b, !(y %in% a$x))

> subset(b, !(y %in% a$x))
  y
5 5
6 6
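
For anyone wondering why the two attempts in the question misbehave, here is a small sketch in base R using the same a and b. The reordered vector `shuffled` and the `%notin%` helper are hypothetical names introduced here for illustration only; `%notin%` is not a built-in operator.

```r
a <- data.frame(x = c(1, 2, 3, 4))
b <- data.frame(y = c(1, 2, 3, 4, 5, 6))

# Attempt 1: !a$x coerces each number to logical (nonzero is TRUE) and negates it,
# giving an all-FALSE index, so no element of b$y is selected.
attempt1 <- b$y[!a$x]

# Attempt 2: != compares element by element and recycles the shorter vector
# (with a warning here); it is not a membership test. Reordering the values shows this.
shuffled <- c(2, 1, 4, 3)
elementwise <- suppressWarnings(b$y[b$y != shuffled])

# %in% tests membership regardless of position or length.
keep <- !(b$y %in% shuffled)
not_in_a <- b[keep, , drop = FALSE]  # drop = FALSE keeps the data frame shape

# Hypothetical convenience operator (name chosen for illustration only).
`%notin%` <- function(x, table) !(x %in% table)
same_rows <- b[b$y %notin% a$x, , drop = FALSE]
```

With the reordered values, the element-wise comparison keeps every element of b$y, while the %in% version still keeps only 5 and 6.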

3 Comments

Thanks @Chase, I spent some time trying to figure out what the negation of the %in% operator was, but could not figure it out. This answer is also helpful as it neatly subsets the data.
@celenius - the %in% operator returns a logical vector that tells us whether each element of the first operand has a match in the second. (b$y %in% a$x) returns [1] TRUE TRUE TRUE TRUE FALSE FALSE. The ! negates that, so !(b$y %in% a$x) returns [1] FALSE FALSE FALSE FALSE TRUE TRUE. Does that help explain things?
Late to the party, but Hmisc now has a %nin% operator, which means "not in". Very useful.

28

Try the set difference function setdiff. So you would have

results1 = setdiff(a$x, b$y)   # elements in a$x NOT in b$y
results2 = setdiff(b$y, a$x)   # elements in b$y NOT in a$x
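
One thing to keep in mind with this approach, sketched below under the assumption that b$y contains a repeated value (the extra 5 is made-up data, not from the question): setdiff() treats its inputs as sets, so it returns a plain vector with duplicates removed rather than a subset of the data frame.

```r
a <- data.frame(x = c(1, 2, 3, 4))
b <- data.frame(y = c(1, 2, 3, 4, 5, 5, 6))  # 5 appears twice (assumed data)

# setdiff() works on sets: the result is a plain vector with duplicates removed.
unique_values <- setdiff(b$y, a$x)

# Subsetting with %in% keeps every non-matching element, duplicates included.
all_values <- b$y[!(b$y %in% a$x)]
```

If only the distinct unmatched values matter, setdiff is the shorter option; if every row has to be kept, the %in% form is probably the safer choice.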

26

You could also use dplyr for this task. To find what is in b but not a:

library(dplyr)    
anti_join(b, a, by = c("y" = "x"))

#   y
# 1 5
# 2 6

1 Comment

That is exactly what I was looking for. I had never seen it before, thanks Joe. (It also works with dbplyr, i.e. in a remote SQL context.)
