1

I have some semi-complex filtering I need to do:

Identifier <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5)
item1 <- c("a", "b", "c", "a", "b", "c", "d", "a", "b", "d", "b", "a", "c")
item2 <- c("x", "y", "z", "z", "x", "y", "z", "y", "z", "x", "y", "x", "y")
item3 <- c("p", "q", "r", "p", "q", "r", "p", "q", "r", "p", "q", "r", "p")
df1 <- data.frame(Identifier, item1, item2, item3)
df1

header <- c("Identifier","item1","item2","item3")
values <- c("1","b","y","p")
needed<- c("yes","yes","yes","no")
df2 <- data.frame(header, values, needed)
df2

I then want to use df2, to apply multiple filters on df1. So based on df2, I want to:

  • Filter for "1" in df1$Identifier
  • Filter for "b" in df1$item 1
  • Filter for "y" in df1$item 2
  • Remove "p" in df1$item 3

The goal is then to have df2 as an excel csv file, and the user can include what columns they would like filtered, and for what value. So, these would remain dynamic without them needing to edit the R code.

1 Answer 1

0

We may use Map in base R. When we filter (keep or remove those elements, there is a possibility to have different lengths for each columns as showed before

Map(function(x, nm) {
       i1 <- match(nm, df2$header)        
        if(df2$needed[i1] == "yes") x[x == df2$values[i1]] 
       else x[x != df2$values[i1]]} , df1, names(df1))

-output

$Identifier
[1] 1 1 1

$item1
[1] "b" "b" "b" "b"

$item2
[1] "y" "y" "y" "y" "y"

$item3
[1] "q" "r" "q" "r" "q" "r" "q" "r"

If we want to keep the data as data.frame, it may be better to replace those doesn't conform to the logic as NA

library(dplyr)
df1 %>%
   mutate(across(everything(), ~  {
    i1 <- match(cur_column(), df2$header)
   case_when((df2$needed[i1] == "yes" &.x == df2$values[i1])|
      (df2$needed[i1] == "no" & .x != df2$values[i1]) ~ .x )
  }))

-output

   Identifier item1 item2 item3
1           1  <NA>  <NA>  <NA>
2           1     b     y     q
3           1  <NA>  <NA>     r
4          NA  <NA>  <NA>  <NA>
5          NA     b  <NA>     q
6          NA  <NA>     y     r
7          NA  <NA>  <NA>  <NA>
8          NA  <NA>     y     q
9          NA     b  <NA>     r
10         NA  <NA>  <NA>  <NA>
11         NA     b     y     q
12         NA  <NA>  <NA>     r
13         NA  <NA>     y  <NA>

If we need a single value

df1 %>%
   mutate(across(everything(), ~  {
    i1 <- match(cur_column(), df2$header)
   case_when((df2$needed[i1] == "yes" &.x == df2$values[i1])|
      (df2$needed[i1] == "no" & .x != df2$values[i1]) ~ .x )
  })) %>%
   summarise(across(everything(), ~ .x[complete.cases(.x)][1]))

-output

   Identifier item1 item2 item3
1          1     b     y     q
Sign up to request clarification or add additional context in comments.

4 Comments

thank you, I basically want the code to return just 1 record, (happy to not have NAs): 1 b y q
@jkfirewood does the update works for you
hi it does not work, it says one of the fields (not present in df2) "must be length 0 or one, not 36870."
@jkfirewood not sure. I am using your data and all the solutions are working for that data

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.