This seems rather easy, but it has kept me busy for a while.

I have a dataframe (df) with n columns and a vector with the same number (n) of values.

The values in the vector are thresholds for the observations in the corresponding columns of the dataframe. So the crux is: how do I tell R to use a different threshold for each column?

I want to keep all the observations in the dataframe that fulfill the threshold for their column (above or below, it doesn't matter for the example). The observations that do not fulfill the threshold criterion should be set to 0.

I don't want a subset of the dataframe.

Can anyone help? Thanks a lot in advance.

3 Answers


Given some example data and thresholds

set.seed(42)
dat <- data.frame(matrix(runif(100), ncol = 10))

## thresholds
thresh <- seq(0.5, 0.95, length.out = 10)
thresh

we can use the mapply() function to work out which observations in each column are greater than or equal to that column's threshold (in this example). Using those indices, we can replace the corresponding values with 0 via:

dat[mapply(">=", dat, thresh)] <- 0

Here is the call in action:

> dat
          X1        X2         X3          X4         X5
1  0.9148060 0.4577418 0.90403139 0.737595618 0.37955924
2  0.9370754 0.7191123 0.13871017 0.811055141 0.43577158
3  0.2861395 0.9346722 0.98889173 0.388108283 0.03743103
4  0.8304476 0.2554288 0.94666823 0.685169729 0.97353991
5  0.6417455 0.4622928 0.08243756 0.003948339 0.43175125
6  0.5190959 0.9400145 0.51421178 0.832916080 0.95757660
7  0.7365883 0.9782264 0.39020347 0.007334147 0.88775491
8  0.1346666 0.1174874 0.90573813 0.207658973 0.63997877
9  0.6569923 0.4749971 0.44696963 0.906601408 0.97096661
10 0.7050648 0.5603327 0.83600426 0.611778643 0.61883821
           X6        X7          X8         X9          X10
1  0.33342721 0.6756073 0.042988796 0.58160400 0.6674265147
2  0.34674825 0.9828172 0.140479094 0.15790521 0.0002388966
3  0.39848541 0.7595443 0.216385415 0.35902831 0.2085699569
4  0.78469278 0.5664884 0.479398564 0.64563188 0.9330341273
5  0.03893649 0.8496897 0.197410342 0.77582336 0.9256447486
6  0.74879539 0.1894739 0.719355838 0.56364684 0.7340943010
7  0.67727683 0.2712866 0.007884739 0.23370340 0.3330719834
8  0.17126433 0.8281585 0.375489965 0.08998052 0.5150633298
9  0.26108796 0.6932048 0.514407708 0.08561206 0.7439746463
10 0.51441293 0.2405447 0.001570554 0.30521837 0.6191592400
> dat[mapply(">=", dat, thresh)] <- 0
> dat
          X1        X2         X3          X4         X5
1  0.0000000 0.4577418 0.00000000 0.000000000 0.37955924
2  0.0000000 0.0000000 0.13871017 0.000000000 0.43577158
3  0.2861395 0.0000000 0.00000000 0.388108283 0.03743103
4  0.0000000 0.2554288 0.00000000 0.000000000 0.00000000
5  0.0000000 0.4622928 0.08243756 0.003948339 0.43175125
6  0.0000000 0.0000000 0.51421178 0.000000000 0.00000000
7  0.0000000 0.0000000 0.39020347 0.007334147 0.00000000
8  0.1346666 0.1174874 0.00000000 0.207658973 0.63997877
9  0.0000000 0.4749971 0.44696963 0.000000000 0.00000000
10 0.0000000 0.0000000 0.00000000 0.611778643 0.61883821
           X6        X7          X8         X9          X10
1  0.33342721 0.6756073 0.042988796 0.58160400 0.6674265147
2  0.34674825 0.0000000 0.140479094 0.15790521 0.0002388966
3  0.39848541 0.7595443 0.216385415 0.35902831 0.2085699569
4  0.00000000 0.5664884 0.479398564 0.64563188 0.9330341273
5  0.03893649 0.0000000 0.197410342 0.77582336 0.9256447486
6  0.74879539 0.1894739 0.719355838 0.56364684 0.7340943010
7  0.67727683 0.2712866 0.007884739 0.23370340 0.3330719834
8  0.17126433 0.0000000 0.375489965 0.08998052 0.5150633298
9  0.26108796 0.6932048 0.514407708 0.08561206 0.7439746463
10 0.51441293 0.2405447 0.001570554 0.30521837 0.6191592400

It is instructive to look at what mapply() returns in this case (evaluated on the original dat, before the replacement):

> mapply(">=", dat, thresh)
         X1    X2    X3    X4    X5    X6    X7    X8    X9   X10
 [1,]  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
 [2,]  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE
 [3,] FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [4,]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
 [5,]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
 [6,]  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
 [7,]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
 [8,] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
 [9,]  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
[10,]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

and it is those logical values that are used to select the observations that meet the threshold. You can use a different binary operator from the one I used; see ?">" for the various options. When writing the mapply() call, think of it in terms of the left-hand side and right-hand side of the binary operator, such that the mapply() call would be:

mapply(">", lhs, rhs)

where we might write

lhs > rhs
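
To make the column/threshold pairing concrete, here is a minimal sketch on a tiny, made-up data frame (the names small, th and mask are illustrative and do not come from the question). mapply() walks over the columns of the data frame and the elements of the threshold vector in parallel, so column a is compared with th[1] and column b with th[2]:

```r
# Tiny illustrative data; `small` and `th` are made-up names
small <- data.frame(a = c(1, 5, 9), b = c(2, 4, 6))
th <- c(4, 5)  # one threshold per column

# Pairs column a with th[1] and column b with th[2];
# the result is a logical matrix with one column per data frame column
mask <- mapply(">=", small, th)
mask

# Replace the flagged cells with 0, leaving the rest untouched
small[mask] <- 0
small
```

Because the comparison is done column by column, the threshold vector must have one element per column, in the same order as the columns.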

Update: As @DWin has answered the comment about two thresholds I will update my Answer to match.

thresh1 <- seq(0.05, 0.5, length.out = 10)
thresh2 <- seq(0.55, 0.95, length.out = 10)
set.seed(42)
dat <- data.frame(matrix(runif(100), ncol = 10))

l1 <- mapply(">", dat, thresh1)
l2 <- mapply("<", dat, thresh2)

We can see which elements match both constraints:

> l1 & l2
         X1    X2    X3    X4    X5    X6    X7    X8    X9   X10
 [1,] FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
 [2,] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
 [3,]  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE
 [4,] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
 [5,] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE
 [6,]  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
 [7,] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
 [8,]  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE
 [9,] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE
[10,] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE

and the same construct can be used to select those elements that match:

dat[l1 & l2] <- 0
dat

> dat
          X1        X2         X3          X4         X5         X6        X7          X8
1  0.9148060 0.0000000 0.90403139 0.737595618 0.00000000 0.00000000 0.0000000 0.042988796
2  0.9370754 0.7191123 0.13871017 0.811055141 0.00000000 0.00000000 0.9828172 0.140479094
3  0.0000000 0.9346722 0.98889173 0.000000000 0.03743103 0.00000000 0.0000000 0.216385415
4  0.8304476 0.0000000 0.94666823 0.685169729 0.97353991 0.78469278 0.0000000 0.000000000
5  0.6417455 0.0000000 0.08243756 0.003948339 0.00000000 0.03893649 0.8496897 0.197410342
6  0.0000000 0.9400145 0.00000000 0.832916080 0.95757660 0.00000000 0.1894739 0.000000000
7  0.7365883 0.9782264 0.00000000 0.007334147 0.88775491 0.00000000 0.2712866 0.007884739
8  0.0000000 0.0000000 0.90573813 0.000000000 0.00000000 0.17126433 0.8281585 0.375489965
9  0.6569923 0.0000000 0.00000000 0.906601408 0.97096661 0.26108796 0.0000000 0.000000000
10 0.7050648 0.0000000 0.83600426 0.000000000 0.00000000 0.00000000 0.2405447 0.001570554
           X9          X10
1  0.00000000 0.0000000000
2  0.15790521 0.0002388966
3  0.35902831 0.2085699569
4  0.00000000 0.0000000000
5  0.00000000 0.0000000000
6  0.00000000 0.0000000000
7  0.23370340 0.3330719834
8  0.08998052 0.0000000000
9  0.08561206 0.0000000000
10 0.30521837 0.0000000000
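
If the goal is instead to keep the values that lie between the two thresholds and zero everything else, as the asker's follow-up comment describes, the combined mask can simply be negated. A minimal sketch, reusing the same example data and thresholds (keep is an illustrative name):

```r
set.seed(42)
dat <- data.frame(matrix(runif(100), ncol = 10))
thresh1 <- seq(0.05, 0.5, length.out = 10)   # lower thresholds
thresh2 <- seq(0.55, 0.95, length.out = 10)  # upper thresholds

# TRUE where a value lies strictly between its column's two thresholds
keep <- mapply(">", dat, thresh1) & mapply("<", dat, thresh2)

# Zero everything outside the band; values inside it are left as they are
dat[!keep] <- 0
dat
```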

5 Comments

Thank you very much!! I see mapply does exactly what I want!
OK. This works pretty well. Here it gets a bit more tricky (for me at least): instead of one vector of thresholds, I have two vectors, one for the upper threshold and one for the lower threshold. The remaining data should be in between the two thresholds.
@mitchbu In that case, do the mapply() once with ">" and again with "<", supplying one threshold vector to each mapply() call. That will give you two logical matrices, which you can combine with &: say you have the upper threshold logical in upr and the lower threshold logical in lwr, then you could do dat[lwr & upr]. If that is not clear enough, post a new Q to explain the problem with a reproducible example and we can take a look.
@mitchbu Rising to DWin's challenge, I've updated my answer in light of the comment about using two thresholds.
Thanks a lot for your update, this does the job exactly the way I need it. You've also helped me roughly understand the concept of mapply; your post is very didactic.

I like Gavin's answer better than mine, but here's a slightly different application of mapply using his data:

mapply(function(x,tt) ifelse(x >= tt, 0, x), dat, thresh)
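
Note that mapply() simplifies its result to a matrix here rather than returning a data frame. A minimal sketch of one way to write the result back into the data frame, assuming dat and thresh as defined in the first answer; Map() is swapped in for mapply() because it always returns a list, and assigning into dat[] keeps the data.frame class and the column names:

```r
set.seed(42)
dat <- data.frame(matrix(runif(100), ncol = 10))
thresh <- seq(0.5, 0.95, length.out = 10)

# Map() returns one vector per column; dat[] <- ... fills the existing columns
dat[] <- Map(function(x, tt) ifelse(x >= tt, 0, x), dat, thresh)

class(dat)   # still "data.frame"
```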

In light of your second comment, my construction might be more generalizable than Gavin's.

Two threshold vectors:

mapply(function(x, lt, ht) ifelse(x <= lt | x >= ht , 0, x), dat, lothresh, hithresh)
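
lothresh and hithresh are not defined in this answer. A minimal sketch, assuming they are per-column lower and upper threshold vectors (here borrowed from the thresh1 and thresh2 values in the first answer's update; res is an illustrative name):

```r
set.seed(42)
dat <- data.frame(matrix(runif(100), ncol = 10))

# Assumed definitions; the answer does not say how these were built
lothresh <- seq(0.05, 0.5, length.out = 10)
hithresh <- seq(0.55, 0.95, length.out = 10)

# Values at or outside the thresholds become 0; values strictly inside are kept
res <- mapply(function(x, lt, ht) ifelse(x <= lt | x >= ht, 0, x),
              dat, lothresh, hithresh)
head(res, 3)
```

This version keeps the observations between the two thresholds, which matches the request in the asker's comment; the result is a numeric matrix rather than a data frame.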

1 Comment

thank you very much for your solution, too. I will add it as a comment into my code for future use. However, currently I am more fond of Gavin's version, because I seem to learn better how mapply works. However, I do like one-liners.

Not sure how it's going to work with data frames, but the following worked with matrices:
You can get a boolean representation of df under the given condition and then use it as indexing of df to set the values. Alternatively you can get a vector with indexes of the matching fields and use it as index vector to set the values. Hope that helps.
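
The code for this answer did not survive in the text above. As a rough sketch of the approach it describes, and not the answerer's original code, one could build the logical mask for a matrix with sweep(), then index either with the mask itself or with the positions returned by which(). The names m, mask, idx, m1 and m2 are illustrative:

```r
set.seed(42)
m <- matrix(runif(100), ncol = 10)
thresh <- seq(0.5, 0.95, length.out = 10)

# Compare each column of m against its own threshold (MARGIN = 2 means columns)
mask <- sweep(m, 2, thresh, FUN = ">=")

# Option 1: index with the logical matrix
m1 <- m
m1[mask] <- 0

# Option 2: index with the positions of the TRUE cells
idx <- which(mask)
m2 <- m
m2[idx] <- 0

identical(m1, m2)
```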
