I want to select certain values from multiple columns using conditions. (Let row 1 be ID#1, ..., row 5 be ID#5.)

column1 <- c("rice 2", "apple 4", "melon 6", "blueberry 4", "orange 6")
column2 <- c("rice 8", "blueberry 8", "grape 10", "water 10", "mango 3")
column3 <- c("rice 6", "apple 8", "blueberry 12", "pineapple 8", "mango 3")

I want to get a new column, by ID, using only these conditions: rice > 5, blueberry > 7, or orange > 5.

First, I would like to get the IDs that meet at least one condition: ID#1, ID#2, ID#3 and ID#5.

Second, I would like to count how many conditions are met per ID. I would like to get these results:

ID#1 -> 2 conditions met
ID#2 -> 1 condition met
ID#3 -> 1 condition met
ID#4 -> 0 conditions met
ID#5 -> 1 condition met

Comments

  • The columns don't have the same length? The conditions are also not clear. Commented May 4, 2018 at 3:09
  • Putting aside the issue of column3 not having 5 values, I think you need to rearrange your data a bit first. Trying to work with values like "rice 2" ("group value") won't allow you to do simple comparisons. I'd try splitting the number into a separate column if you can. Commented May 4, 2018 at 3:22
  • Hi @akrun, all the columns do have the same length; I apologize for the errors. Basically, I want to extract the data that meets the criteria inside the columns, by ID. Commented May 4, 2018 at 3:40
  • @thelatemail How about I separate the values into 2 different columns, so that column1 has all the names and column2 has all the values? Please help me. Thanks a million. Commented May 4, 2018 at 3:43
  • Please read "Under what circumstances may I add 'urgent' or other similar phrases to my question, in order to obtain faster answers?" The summary is that this is not an ideal way to address volunteers, and is probably counterproductive to obtaining answers. Please refrain from adding this to your questions. Commented May 4, 2018 at 17:26
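
The splitting suggested in the comments (name in one column, number in another) could be sketched in base R roughly as follows. This is only an illustration on the question's data; the names `raw`, `long`, `item` and `value` are made up for the example, and it assumes every cell has the form "name number" with a single space before the number.

```r
# Question's data; the row number serves as the ID
column1 <- c("rice 2", "apple 4", "melon 6", "blueberry 4", "orange 6")
column2 <- c("rice 8", "blueberry 8", "grape 10", "water 10", "mango 3")
column3 <- c("rice 6", "apple 8", "blueberry 12", "pineapple 8", "mango 3")

# Stack the three columns into one long vector and keep track of the ID
raw <- c(column1, column2, column3)
long <- data.frame(
  ID    = rep(seq_along(column1), times = 3),
  item  = sub(" .*$", "", raw),              # text before the first space
  value = as.numeric(sub("^.* ", "", raw)),  # text after the last space
  stringsAsFactors = FALSE
)

head(long, 3)
```

With the name and the number in separate columns, a condition such as rice > 5 becomes an ordinary comparison, e.g. `long$item == "rice" & long$value > 5`.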

1 Answer

If I understood the question correctly, one approach could be:

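library(dplyr)

# df is defined under "Sample data" below.
# Every column except ID holds the "name value" strings.
cols <- names(df)[-1]

df1 <- df %>%
  # strsplit() needs character input, so convert any factor columns first
  mutate_if(is.factor, as.character) %>%
  # each block below counts, per row, the cells that satisfy one condition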
  mutate(rice_gt_5 = (select(., one_of(cols)) %>% 
                        rowwise() %>%
                        mutate_all(funs(strsplit(., split=" ")[[1]][1] =='rice' & as.numeric(strsplit(., split=" ")[[1]][2]) > 5)) %>%
                        rowSums)) %>%
  mutate(blueberry_gt_7 = (select(., one_of(cols)) %>% 
                        rowwise() %>%
                        mutate_all(funs(strsplit(., split=" ")[[1]][1] =='blueberry' & as.numeric(strsplit(., split=" ")[[1]][2]) > 7)) %>%
                        rowSums)) %>%
  mutate(orange_gt_5 = (select(., one_of(cols)) %>% 
                        rowwise() %>%
                        mutate_all(funs(strsplit(., split=" ")[[1]][1] =='orange' & as.numeric(strsplit(., split=" ")[[1]][2]) > 5)) %>%
                        rowSums))

#IDs which satisfy at least one of your conditions i.e. rice > 5 OR blueberry > 7 OR orange > 5
df1$ID[which(df1 %>% select(rice_gt_5, blueberry_gt_7, orange_gt_5) %>% rowSums() >0)]
#[1] 1 2 3 5

#How many conditions are met per ID
df1 %>%
  mutate(no_of_cond_met = rowSums(select(., one_of(c("rice_gt_5", "blueberry_gt_7", "orange_gt_5"))))) %>%
  select(ID, no_of_cond_met)
#  ID no_of_cond_met
#1  1              2
#2  2              1
#3  3              1
#4  4              0
#5  5              1

Sample data:

df <- structure(list(ID = 1:5, column1 = structure(c(5L, 1L, 3L, 2L, 
4L), .Label = c("apple 4", "blueberry 4", "melon 6", "orange 6", 
"rice 2"), class = "factor"), column2 = structure(c(4L, 1L, 2L, 
5L, 3L), .Label = c("blueberry 8", "grape 10", "mango 3", "rice 8", 
"water 10"), class = "factor"), column3 = structure(c(5L, 1L, 
2L, 4L, 3L), .Label = c("apple 8", "blueberry 12", "mango 3", 
"pineapple 8", "rice 6"), class = "factor")), .Names = c("ID", 
"column1", "column2", "column3"), row.names = c(NA, -5L), class = "data.frame")
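
As a rough alternative, the same counts can be sketched in base R without dplyr. This is not the code from the answer above, only an illustration under stated assumptions: the lookup vector `thresholds` and the helper `meets()` are hypothetical names introduced here, and each cell is assumed to look like "name number".

```r
# Same data as in the question, built as plain character columns
df <- data.frame(
  ID = 1:5,
  column1 = c("rice 2", "apple 4", "melon 6", "blueberry 4", "orange 6"),
  column2 = c("rice 8", "blueberry 8", "grape 10", "water 10", "mango 3"),
  column3 = c("rice 6", "apple 8", "blueberry 12", "pineapple 8", "mango 3"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: item name -> value that must be exceeded
thresholds <- c(rice = 5, blueberry = 7, orange = 5)

# TRUE where a cell names a listed item and its number exceeds the threshold
meets <- function(x) {
  x     <- as.character(x)                  # guard against factor columns
  item  <- sub(" .*$", "", x)               # text before the first space
  value <- as.numeric(sub("^.* ", "", x))   # text after the last space
  limit <- unname(thresholds[item])         # NA for items without a condition
  !is.na(limit) & value > limit
}

# Logical matrix: one row per ID, one column per data column
hits <- sapply(df[c("column1", "column2", "column3")], meets)

result <- data.frame(ID = df$ID, no_of_cond_met = rowSums(hits))
result

# IDs with at least one matching cell
df$ID[result$no_of_cond_met > 0]
```

Like the dplyr version, this counts matching cells rather than distinct conditions, which is why ID 1 (rice 8 and rice 6) gets 2.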

3 Comments

Sorry, let me try this solution and I will let you know.
Not sure what happened; I got this message: Error in mutate_impl(.data, dots) : Evaluation error: Evaluation error: non-character argument..
Can you share dput(head(df))?
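
One possible cause of a "non-character argument" error (an assumption, since the commenter's actual data was never shared) is calling `strsplit()` on something that is not a character vector, such as a factor. A small sketch with a made-up vector `x`:

```r
# A factor holding the same kind of strings as the question's columns
x <- factor(c("rice 2", "apple 4"))

# strsplit() only accepts character vectors, so a factor raises an error;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(
  strsplit(x, split = " "),
  error = function(e) conditionMessage(e)
)
msg

# Converting to character first avoids the error
parts <- strsplit(as.character(x), split = " ")
parts[[1]]
```

Running `sapply(df, class)` on the real data frame shows which columns are factors, or some other non-character type, before the pipeline is applied.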
