R get correct index using which() condition

Question

I have to find an observation satisfying some criteria and then merge its index with another dataset. So I don't need the index of the observation within the subset that satisfies the condition, but its index within the full set of observations.

For instance, I want to find the row with max(x1) given that x2>20 and then use this index in another dataset later. I need the right index, in other words:

dat <- data.frame(name= c("A","B","C","D"),
           x1= c(1,2,3,4),
           x2= c(10,20,30,40))
dat$name[which.max(dat$x1[dat$x2>20])]
[1] B

I want to get

[1] D

i.e. an index of 4, not 2.

dww · Accepted Answer · 2016-05-11 15:05:59Z

2

Here's one way using data.table:

library(data.table)
dat <- as.data.table(dat)
which(dat[,name]==dat[x2>20,][which.max(x1),name])

You can do something similar with a plain data frame, but it is rather more verbose:

which(dat$name==dat$name[which(dat$x2>20)][which.max(dat$x1[which(dat$x2>20)])])

Note that this method depends on the assumption that name contains unique values for each row.
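If the row number itself is what is needed, and name cannot be assumed unique, one option is to keep the positions of the rows that pass the filter and index into them. This is a minimal sketch using the data frame from the question, not part of the original answer; `keep` and `idx` are names introduced here for illustration.

```r
dat <- data.frame(name = c("A", "B", "C", "D"),
                  x1   = c(1, 2, 3, 4),
                  x2   = c(10, 20, 30, 40))

# Row numbers (in the full data frame) of the rows with x2 > 20
keep <- which(dat$x2 > 20)

# which.max() gives a position within the subset; indexing 'keep' with it
# maps that position back to a row number of 'dat'
idx <- keep[which.max(dat$x1[keep])]

idx            # 4
dat$name[idx]  # D
```

If no row satisfies the condition, `keep` is empty and `idx` comes back as a zero-length integer, so that case may need a separate check before merging.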

edited May 11, 2016 at 15:05

answered May 11, 2016 at 14:56

dww


Sotos · Accepted Answer · 2016-05-11 14:36:29Z

1

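Just use max instead of which.max. Note that this uses the value returned by max directly as a row index, so the data frame needs to be sorted by x1 and the values of x1 must coincide with the row positions (as they do in the example). (Thanks @myk_raniu for clarifying)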

dat <- dat[order(dat$x1),]
dat$name[max(dat$x1[dat$x2>20])]
#[1] D

edited May 11, 2016 at 14:36

answered May 11, 2016 at 14:04

Sotos


2 Comments

myk_raniu Over a year ago

This only works because there is a 1:1 mapping between the values of the x1 vector and the indexes. x1= c(1,2,3,4) works, but if you change to x1= c(1,2,4,3) it breaks and still gives you D

CompSocialSciR Over a year ago

OK, I can accept the answer as it works in the example I provided. Now would you have any clue why in my (real) case it returns all the names instead of only the one corresponding to the maximum? Dropping dat$name it gives a single value, adding it returns all the names.
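The failure mode pointed out in the first comment can be reproduced with a small sketch. It assumes the same column names as the question, with x1 changed as suggested and without the sorting step, and shows that the value returned by max ends up being used as a row position; `m`, `wrong` and `right` are names introduced here.

```r
dat <- data.frame(name = c("A", "B", "C", "D"),
                  x1   = c(1, 2, 4, 3),
                  x2   = c(10, 20, 30, 40))

# max() returns a value of x1, not a position
m <- max(dat$x1[dat$x2 > 20])   # largest x1 among rows with x2 > 20

# Using that value as a row number picks row 4
wrong <- dat$name[m]

# The row that actually holds the maximum within the subset
right <- dat$name[dat$x2 > 20 & dat$x1 == m]

as.character(wrong)  # "D"
as.character(right)  # "C"
```

Even after sorting, the same indexing relies on the values of x1 being 1, 2, ..., n; with x1 = c(10, 20, 30, 40), for example, the expression would index position 40 and return NA.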

myk_raniu · Accepted Answer · 2016-05-11 14:43:22Z

1

The reason which.max doesn't give the right answer is that the filtered x1 vector is shorter than dat$name, so there is no longer a 1:1 correspondence between their positions.

Try this instead:

dat <- data.frame(name= c("A","B","C","D"),
                  x1= c(1,2,3,4),
                  x2= c(10,20,30,40))

dat$name[dat$x1==max(dat$x1[dat$x2>20])]
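The length mismatch behind the which.max result can be checked directly by looking at the intermediate values. A short sketch using the same data; `sub` is a name introduced here for illustration.

```r
dat <- data.frame(name = c("A", "B", "C", "D"),
                  x1   = c(1, 2, 3, 4),
                  x2   = c(10, 20, 30, 40))

# Only the x1 values from the rows with x2 > 20 (rows 3 and 4)
sub <- dat$x1[dat$x2 > 20]

length(sub)               # 2, while dat$name has length 4
which.max(sub)            # 2: a position within 'sub', not within 'dat'
dat$name[which.max(sub)]  # B, i.e. row 2 of the full data frame
```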

edited May 11, 2016 at 14:43

answered May 11, 2016 at 14:16

myk_raniu


3 Comments

CompSocialSciR Over a year ago

This would work fine, then I will have to join using the name as key. So as I understand there is no way to return the index referring to the whole set of observation from within the which() condition? I mean, something that returns the index [4] rather than getting it indirectly through the name?

myk_raniu Over a year ago

Updated with a simpler method that does what you are looking for. You can use logical vector indexing: the comparison is TRUE only where x1 matches the maximum found under the condition.

dww Over a year ago

This method fails if there are duplicate values of x1, and the same max(x1) in the subset also occurs in x1 outside the subset. Also, it still returns the name not the index number that OP requested.
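The duplicate-value case raised in the last comment can be illustrated with a sketch. The data below are made up for illustration (x1 takes the value 4 both inside and outside the x2 > 20 subset), and combining both conditions inside which() is one possible way to get row numbers directly; `in_subset` and `m` are names introduced here.

```r
dat <- data.frame(name = c("A", "B", "C", "D"),
                  x1   = c(4, 2, 3, 4),
                  x2   = c(10, 20, 30, 40))

in_subset <- dat$x2 > 20
m <- max(dat$x1[in_subset])   # maximum of x1 within the subset

# Matching on the value alone also picks up row 1, which fails x2 > 20
which(dat$x1 == m)              # 1 4

# Requiring both conditions restricts the match to the subset
which(in_subset & dat$x1 == m)  # 4
```

If the maximum is tied within the subset itself, the combined condition returns every tied row number, so a rule for choosing among them may still be needed.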
