Please consider this example code:

d1 <- c(1,2,2,3,4,3)
d2 <- c(10,11,12,13,14,15)

dt <- data.frame(d1,d2)

sample.index <- c(2,3)

dt[dt$d1 %in% sample.index, ]

This returns

  d1 d2
2  2 11
3  2 12
4  3 13
6  3 15

which is OK. However, if we have

sample.index <- c(2,2,3)

then the code still returns the same result. Instead, I want the rows matching 2 to be returned twice, because 2 appears twice in sample.index. How can I achieve this?

2 Answers

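Maybe this. merge() joins on the shared column d1, so each row of dt is returned once for every matching entry in sample.index: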

sample.index <- c(2,2,3)
merge(dt,data.frame(d1 = sample.index))
#   d1 d2
# 1  2 11
# 2  2 11
# 3  2 12
# 4  2 12
# 5  3 13
# 6  3 15
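
As a sanity check on the merge approach, here is a minimal sketch that names the join column explicitly and confirms the row count. The names res and expected are only illustrative, and by = "d1" is spelled out here rather than relying on the default of joining on the shared column name:

```r
d1 <- c(1, 2, 2, 3, 4, 3)
d2 <- c(10, 11, 12, 13, 14, 15)
dt <- data.frame(d1, d2)
sample.index <- c(2, 2, 3)

# Inner join on d1; a value repeated in sample.index repeats its matching rows
res <- merge(dt, data.frame(d1 = sample.index), by = "d1")

# For each lookup value, count the matching rows in dt and add the counts up
expected <- sum(sapply(sample.index, function(v) sum(dt$d1 == v)))

nrow(res) == expected  # should be TRUE
```

Naming the column in by guards against an accidental join on other columns if the lookup data frame ever shares more than one name with dt.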

This is begging for some data.table syntactic sugar (it goes without saying that it will also be faster):

library(data.table)

d1 <- c(1,2,2,3,4,3)
d2 <- c(10,11,12,13,14,15)

# Note, I set the key to d1
dt <- data.table(d1, d2, key = 'd1')

dt[J(c(2,3))]
#   d1 d2
#1:  2 11
#2:  2 12
#3:  3 13
#4:  3 15

dt[J(c(2,2,3))]
#   d1 d2
#1:  2 11
#2:  2 12
#3:  2 11
#4:  2 12
#5:  3 13
#6:  3 15

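Also note that the data.table join and merge.data.frame produce somewhat different row orderings: the join returns the matches for each lookup value in turn (d2 = 11, 12, 11, 12 for the repeated 2), whereas the merge output groups the duplicated rows together (11, 11, 12, 12).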
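
If the per-lookup-value ordering is what you want but data.table is not available, a rough base R sketch is to build the row indices yourself. This is not taken from either answer; idx and res are illustrative names:

```r
d1 <- c(1, 2, 2, 3, 4, 3)
d2 <- c(10, 11, 12, 13, 14, 15)
dt <- data.frame(d1, d2)
sample.index <- c(2, 2, 3)

# For each lookup value, in order, collect the positions of matching rows
idx <- unlist(lapply(sample.index, function(v) which(dt$d1 == v)))

# Index with repeated positions; repeated rows get suffixed row names
res <- dt[idx, ]
res$d2
# 11 12 11 12 13 15
```

This loops over sample.index in R, so it is meant as an illustration of the ordering rather than a fast alternative to the keyed join.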
