
I have to find an observation satisfying some criteria and then merge its index with another dataset. So I don't need the index within the subset of observations satisfying the condition, but the index relative to the full set of observations.

For instance, I want to find the max(x1) given that x2>20 and then use this index in another dataset later. I need the right index, in other words:

dat <- data.frame(name= c("A","B","C","D"),
           x1= c(1,2,3,4),
           x2= c(10,20,30,40))
dat$name[which.max(dat$x1[dat$x2>20])]
[1] B

I want to get

[1] D

i.e. an index of 4, not 2.


3 Answers


Here's one way using data.table:

library(data.table)
dat <- as.data.table(dat)
which(dat[,name]==dat[x2>20,][which.max(x1),name])

You can do something similar with plain data frames, but it is rather more verbose:

which(dat$name==dat$name[which(dat$x2>20)][which.max(dat$x1[which(dat$x2>20)])])

Note that this method depends on the assumption that name contains unique values for each row.
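If you would rather not depend on name being unique, a minimal sketch of another route is to keep the row positions that pass the filter and use the result of which.max to pick from them. The variable names keep and idx are just illustrative, not anything from the question.

```r
dat <- data.frame(name = c("A", "B", "C", "D"),
                  x1   = c(1, 2, 3, 4),
                  x2   = c(10, 20, 30, 40))

# Row positions (in the full data frame) that satisfy the condition
keep <- which(dat$x2 > 20)

# which.max() returns a position within the subset; indexing `keep`
# with it translates that back to a position in the full data frame
idx <- keep[which.max(dat$x1[keep])]

idx            # row position in dat
dat$name[idx]  # name stored in that row
```

With the example data, idx is the position in dat itself, so it can be used directly to index or join against another dataset that has the same row order.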


1

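Just use max instead of which.max. However, the whole data frame needs to be sorted by x1 first, and this only works when the values of x1 coincide with the row positions after sorting, because max returns a value rather than a position. (Thanks @myk_raniu for clarifying)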

dat <- dat[order(dat$x1),]
dat$name[max(dat$x1[dat$x2>20])]
#[1] D
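To see how much this leans on the values of x1 matching row positions, here is a small sketch with a hypothetical variant of the data (dat2, where x1 is 10, 20, 30, 40 instead of 1, 2, 3, 4). The names dat2, val and res are made up for the illustration.

```r
# Hypothetical variant of the example data: x1 no longer equals the row number
dat2 <- data.frame(name = c("A", "B", "C", "D"),
                   x1   = c(10, 20, 30, 40),
                   x2   = c(10, 20, 30, 40))
dat2 <- dat2[order(dat2$x1), ]

# max() returns the largest x1 value in the subset, not a row position
val <- max(dat2$x1[dat2$x2 > 20])

# Using that value as an index asks for an element past the end of name
res <- dat2$name[val]

val
res
```

Here val is a value of x1, so indexing name with it points beyond the four rows and yields a missing value instead of D.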

2 Comments

This only works because there is a 1:1 mapping between the values of the x1 vector and the indexes. x1= c(1,2,3,4) works, but if you change to x1= c(1,2,4,3) it breaks and still gives you D
OK, I can accept the answer as it works in the example I provided. Now would you have any clue why in my (real) case it returns all the names instead of only the one corresponding to the maximum? Dropping dat$name it gives a single value, adding it returns all the names.
1

The reason which.max doesn't give the right answer is that the filtered x1 vector is shorter than dat$name, so there is no longer a 1:1 correspondence between their positions.

Try this instead:

dat <- data.frame(name= c("A","B","C","D"),
                  x1= c(1,2,3,4),
                  x2= c(10,20,30,40))

dat$name[dat$x1==max(dat$x1[dat$x2>20])]
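If a row number is wanted rather than the name, one possible variation is to wrap the logical comparison in which() and combine it with the filter itself, so that equal x1 values outside the subset cannot match. This is only a sketch under the example data; in_subset, subset_max and idx are illustrative names.

```r
dat <- data.frame(name = c("A", "B", "C", "D"),
                  x1   = c(1, 2, 3, 4),
                  x2   = c(10, 20, 30, 40))

# Logical vector, same length as the full data frame
in_subset <- dat$x2 > 20

# Largest x1 among the rows that pass the filter
subset_max <- max(dat$x1[in_subset])

# Positions in the full data frame that pass the filter AND hold that maximum
idx <- which(in_subset & dat$x1 == subset_max)

idx
dat$name[idx]
```

Because both conditions are evaluated over all rows, idx refers to the full data frame. If several rows in the subset tie for the maximum, which() returns all of their positions.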

3 Comments

This would work fine, but then I would have to join using the name as key. So, as I understand it, there is no way to return the index referring to the whole set of observations from within the which() condition? I mean, something that returns the index [4] rather than getting it indirectly through the name?
Updated with a simpler method that does what you are looking for. You can use logical vector indexing, where the element that matches the max under the condition is set to TRUE.
This method fails if there are duplicate values of x1, and the same max(x1) in the subset also occurs in x1 outside the subset. Also, it still returns the name not the index number that OP requested.
