388

A categorical variable V1 in a data frame D1 can take values represented by the letters A to Z. I want to create a subset D2 that excludes some values, say B, N and T. Basically, I want a command that is the opposite of %in%. The following keeps only those values, which is the reverse of what I need:

D2 = subset(D1, V1 %in% c("B", "N", "T"))

13 Answers

503

You can use the ! operator to negate a logical vector, turning every TRUE into FALSE and every FALSE into TRUE. So:

D2 = subset(D1, !(V1 %in% c('B','N','T')))

EDIT: You can also make an operator yourself:

'%!in%' <- function(x,y)!('%in%'(x,y))

c(1,3,11)%!in%1:10
[1] FALSE FALSE  TRUE
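
As a minimal sketch of the first approach on made-up data (the contents of D1 below are an assumption for illustration, not taken from the question):

```r
# Hypothetical data frame standing in for D1; V1 holds single letters.
D1 <- data.frame(V1 = c("A", "B", "C", "N", "T", "Z"),
                 value = 1:6,
                 stringsAsFactors = FALSE)

# Keep only the rows whose V1 is NOT one of the excluded letters.
D2 <- subset(D1, !(V1 %in% c("B", "N", "T")))

D2$V1
# [1] "A" "C" "Z"
```

Only the rows for A, C and Z survive; the other columns of those rows are carried along unchanged.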

7 Comments

The use of second option is illustrated in the help(match) page (where you would get to if you typed ?"%in%" ) where the new operator is called %w/o%.
also, see ?Negate e.g. "%ni%" <- Negate("%in%")
Negate worked for me when used after defining the new operator, as suggested by baptiste, e.g. subset(df, variable %ni% c("A", "B")) , but not when used directly, e.g. subset(df, variable Negate("%in%") c("A", "B"))
@PatrickT That's because only operators can be used in infix position, and operators are either built-in or have names that start and end with %. To create an operator, you need to assign a function of two arguments to a name starting and ending with %.
We can also use filter(!(V1%in% c('B','N','T'))).
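
To illustrate the point made in the comments above, here is a small sketch (the vectors x and excluded and the name not_in are made up for the example). The function returned by Negate can be called directly in prefix form, but it only becomes usable between its operands once it is bound to a %name%:

```r
x <- c("A", "B", "C")
excluded <- c("A", "B")

# Negate() returns an ordinary function, so prefix calls work as-is.
not_in <- Negate(`%in%`)
not_in(x, excluded)
# [1] FALSE FALSE  TRUE

# Infix use requires a name that starts and ends with %.
`%ni%` <- not_in
x %ni% excluded
# [1] FALSE FALSE  TRUE
```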
117

How about:

`%ni%` <- Negate(`%in%`)
c(1,3,11) %ni% 1:10
# [1] FALSE FALSE  TRUE

4 Comments

This one doesn't work for me; it throws an error mentioning SPECIAL %ni.
Still works just fine. R version 4.0.3 (2020-10-10) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Big Sur 10.16
It's because ' is not `; you should use the backtick (`).
The changes have been made. Thanks.
66

Here is a version using filter in dplyr that applies the same technique as the accepted answer by negating the logical with !:

library(dplyr)  # provides %>% and filter()
D2 <- D1 %>% dplyr::filter(!V1 %in% c('B','N','T'))
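
Note that this version drops the parentheses around the %in% call. That is safe because, per ?Syntax, %any% operators bind more tightly than unary !. A quick base-R check (the vectors are made up for illustration):

```r
V1 <- c("A", "B", "C", "N")
excluded <- c("B", "N", "T")

with_parens    <- !(V1 %in% excluded)
without_parens <- !V1 %in% excluded

# Both expressions parse to the same thing, so the results match.
identical(with_parens, without_parens)
# [1] TRUE
```

Keeping the parentheses is still a reasonable choice for readability.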

37

If you look at the code of %in%

 function (x, table) match(x, table, nomatch = 0L) > 0L

then you should be able to write your version of opposite. I use

`%not in%` <- function (x, table) is.na(match(x, table, nomatch=NA_integer_))

Another way is:

function (x, table) match(x, table, nomatch = 0L) == 0L
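
As a sanity check, the two definitions above can be compared against !(x %in% table). The function names not_in_na and not_in_zero and the test vector are made up for this sketch:

```r
# The two variants from this answer, given ordinary names for comparison.
not_in_na   <- function(x, table) is.na(match(x, table, nomatch = NA_integer_))
not_in_zero <- function(x, table) match(x, table, nomatch = 0L) == 0L

x   <- c(1, 3, 11, NA)
tbl <- 1:10

not_in_na(x, tbl)
# [1] FALSE FALSE  TRUE  TRUE
not_in_zero(x, tbl)
# [1] FALSE FALSE  TRUE  TRUE

# Both agree with simply negating %in%.
identical(not_in_na(x, tbl), !(x %in% tbl))
# [1] TRUE
```

The NA in x counts as "not in" here because tbl contains no NA to match it.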

17

Using negate from purrr also does the trick quickly and neatly:

`%not_in%` <- purrr::negate(`%in%`)

Then usage is, for example,

c("cat", "dog") %not_in% c("dog", "mouse")

1 Comment

There’s also a built-in Negate that does the same. The only difference is that purrr calls as_mapper on the thing you pass, while Negate calls match.fun. rdocumentation.org/packages/purrr/versions/0.2.5/topics/… stat.ethz.ch/R-manual/R-devel/library/base/html/match.fun.html
9

purrr::compose() is another quick way to define this for later use, as in:

`%!in%` <- purrr::compose(`!`, `%in%`)

7

Another solution could be using setdiff

D1 = c("A",..., "Z") ; D0 = c("B","N","T")

D2 = setdiff(D1, D0)

D2 is your desired subset.

1 Comment

Sometimes this can be useful, but it doesn't produce the same result if there are repeated values, because setdiff returns only unique elements.
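
A short sketch of that caveat, using a made-up vector with repeated values:

```r
V1 <- c("A", "B", "A", "C", "N", "C")
excluded <- c("B", "N", "T")

# setdiff() treats its inputs as sets, so duplicates are dropped.
by_setdiff <- setdiff(V1, excluded)
by_setdiff
# [1] "A" "C"

# Logical indexing keeps every element that is not excluded.
by_index <- V1[!(V1 %in% excluded)]
by_index
# [1] "A" "A" "C" "C"
```

So setdiff answers "which distinct values remain", while the %in% approach is the one that subsets rows or elements.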
4
library(roperators)

1 %ni% 2:10

If you frequently need to use custom infix operators, it is easier to just have them in a package rather than declaring the same exact functions over and over in each script or project.

1 Comment

While this may be a correct answer, it would be more useful with additional explanation of why it works. Consider editing it to include further details, and if you feel it's better than the accepted answer which was posted nearly a decade ago.
3

Hmisc has a %nin% operator, which should do this.

https://www.rdocumentation.org/packages/Hmisc/versions/4.4-0/topics/%25nin%25

2

The package has it built in: %!in%.

0

The help for %in%, help("%in%"), includes, in the Examples section, this definition of not in,

"%w/o%" <- function(x, y) x[!x %in% y] #-- x without y

Let's try it:

c(2,3,4) %w/o% c(2,8,9)
[1] 3 4

Alternatively

"%w/o%" <- function(x, y) !x %in% y #--  x without y
c(2,3,4) %w/o% c(2,8,9)
# [1] FALSE  TRUE  TRUE

0
require(TSDT)

c(1,3,11) %nin% 1:10
# [1] FALSE FALSE  TRUE

For more information, you can refer to: https://cran.r-project.org/web/packages/TSDT/TSDT.pdf

-2

In Frank Harrell's package of R utility functions, he has a %nin% (not in) which does exactly what the original question asked. No need for wheel reinvention.
