
I'm having trouble with one particular part of my code. I have a data frame:

n  <- 6
set.seed(123)
df <- data.frame(x=paste0("x",seq_along(1:n)), A=sample(c(-2:2),n,replace=TRUE), B=sample(c(-1:3),n,replace=TRUE))
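# Printed output below was generated with R < 3.6.0. R 3.6.0 changed sample()'s
# default algorithm; call RNGkind(sample.kind = "Rounding") before set.seed() to reproduce it.
#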
#    x  A B
# 1 x1 -1 1
# 2 x2  1 3
# 3 x3  0 1
# 4 x4  2 1
# 5 x5  2 3
# 6 x6 -2 1

and a decision tree:

A>0;Y;Y;N;N
B>1;Y;N;Y;N
C;1;2;2;1

which I load with:

dt <- read.csv2("tmp.csv", header=FALSE)
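To keep the example self-contained without a tmp.csv file, the same table can be read from an inline string. This is a sketch that assumes tmp.csv contains exactly the three lines shown above; the variable name txt is illustrative. stringsAsFactors = FALSE keeps the cells as character strings (before R 4.0.0 the default turned them into factors).

```r
# Decision table as shown above (assumed to be the exact contents of tmp.csv)
txt <- "A>0;Y;Y;N;N
B>1;Y;N;Y;N
C;1;2;2;1"

# read.csv2() uses ';' as the field separator; keep cells as character strings
dt <- read.csv2(text = txt, header = FALSE, stringsAsFactors = FALSE)

# dt is 3 x 5: V1 holds the condition text, V2..V5 one Y/N combination each,
# and the last row holds the C value for each combination
str(dt)
```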

I'd like to loop over all possible combinations of (A>0) and (B>1) and, for each combination, set C to the corresponding value for the rows (identified by the x column) that satisfy it. Here's what I did:

nr <- 3
nc <- 5

cond <- dt[1:(nr-1),1,drop=FALSE]
rule <- dt[nr,1,drop=FALSE]

subdf <- vector(mode="list",2^(nr-1))

for (i in 2:nc) {
  check <- paste0("")
  for (j in 1:(nr-1)) {
    case <- paste0(dt[j,1])
    if (dt[j,i]=="N")
      case <- paste0("!",case)
    check <- paste0(check, "(", case, ")" )

    if (j<(nr-1))
      check <- paste0(check, "&")

  }

  subdf[i]   <- subset(df,check)
  subdf[i]$C <- dt[nr,i]

}
unlist(subdf)

Unfortunately, subset() gives me an error here: it cannot parse the conditions from my strings. What should I do?
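A minimal reproduction of the problem, on a hypothetical three-row data frame whose values are made up for illustration. subset() evaluates its condition inside the data frame, and a character string simply evaluates to itself, which is not a logical vector, so subset() stops. Parsing the string first produces an expression that does evaluate to a logical vector.

```r
# Hypothetical small data frame, for illustration only
df <- data.frame(A = c(-1, 1, 2), B = c(1, 3, 1))
check <- "(A>0)&(!B>1)"

# The string is not logical, so subset() raises an error; capture its message
res <- tryCatch(subset(df, check), error = function(e) conditionMessage(e))
print(res)

# parse() turns the string into an expression; eval() inside df gives a logical vector
keep <- with(df, eval(parse(text = check)))
print(subset(df, keep))
```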

  • Will the problem be larger than this, or are you only checking A and B? Commented Dec 14, 2015 at 11:51
  • Yes, the problem is larger than this, but I wanted to simplify it and make it general for other people. Was that wrong? Commented Dec 14, 2015 at 12:30
  • I was just checking. You're not wrong at all; it just means a solution should generalize to an arbitrary number of rules. Commented Dec 14, 2015 at 12:32
  • The big issue is the subset step. Adding further rules would just add more columns to subdf[i]. Commented Dec 14, 2015 at 12:35

1 Answer


The issue is how you create the subset: subset() expects a logical condition, and you gave it a string (check). The simplest fix is to parse the string into an expression and evaluate it. I suspect there is a more elegant way to solve this problem, and I hope someone will come along with one, but you can fix the final part of your loop with the following:

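  # parse the condition string and evaluate it inside df to get a logical vector
  mysubset <- subset(df, with(df, eval(parse(text = check))))
  # assigning a column to a zero-row data frame is an error, so check first
  if (nrow(mysubset) > 0) {
    mysubset$C <- dt[nr, i]
  }
  subdf[[i]] <- mysubset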

The parse()/eval() step turns the string into a logical vector, so subset() keeps only the rows where it is TRUE. I also added a check before adding C, because assigning a column to a data frame with no rows gives an error.
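Put together, a complete runnable version of the loop with this fix might look like the sketch below. It assumes the data exactly as printed in the question (typed in, so the result does not depend on the R version's sample() algorithm) and collects the non-empty subsets with rbind() rather than a list. Building check with one vectorised paste0() call replaces the inner j loop; that, and the names neg, finaldf and result, are illustrative choices, not the original code.

```r
# Data and decision table as printed in the question
df <- data.frame(x = paste0("x", 1:6),
                 A = c(-1, 1, 0, 2, 2, -2),
                 B = c(1, 3, 1, 1, 3, 1),
                 stringsAsFactors = FALSE)
dt <- read.csv2(text = "A>0;Y;Y;N;N\nB>1;Y;N;Y;N\nC;1;2;2;1",
                header = FALSE, stringsAsFactors = FALSE)

nr <- nrow(dt)  # conditions in rows 1..nr-1, C values in row nr
nc <- ncol(dt)  # condition text in column 1, one combination per column 2..nc

finaldf <- NULL
for (i in 2:nc) {
  # Build e.g. "(A>0)&(!B>1)": prefix "!" wherever the table says "N"
  neg   <- ifelse(dt[1:(nr - 1), i] == "Y", "", "!")
  check <- paste0("(", neg, dt[1:(nr - 1), 1], ")", collapse = "&")

  # Turn the string into a logical vector evaluated inside df
  mysubset <- subset(df, with(df, eval(parse(text = check))))

  # Skip empty subsets: assigning C to a zero-row data frame is an error
  if (nrow(mysubset) > 0) {
    mysubset$C <- as.numeric(dt[nr, i])
    finaldf <- rbind(finaldf, mysubset)
  }
}

# Put the rows back in the original order of df
result <- finaldf[order(match(finaldf$x, df$x)), ]
print(result)
```

The (N, Y) combination matches no row in this data, which is exactly the case the nrow() check guards against.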

Building on the fix above, I came up with a more elegant and practical way: generate a vector of combined rules, then apply them all to the data at once with apply()/sapply().

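## create list of formatted rules

# format each 'building block' separately,
# based on the condition rows of 'dt' (every row except the last)
part_conditions <- apply(dt[-nrow(dt), ], MARGIN = 1, FUN = function(x) {
  # x[1] is the condition text, x[-1] the Y/N flags; "N" adds a "!"
  sprintf("(%s%s)", ifelse(x[-1] == "Y", "", "!"), x[1])
})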

# > part_conditions
#      1        2
# [1,] "(A>0)"  "(B>1)"
# [2,] "(A>0)"  "(!B>1)"
# [3,] "(!A>0)" "(B>1)"
# [4,] "(!A>0)" "(!B>1)"

#combine to vector of conditions
conditions <- apply(part_conditions, MARGIN=1,FUN=paste, collapse="&")

# > conditions
# [1] "(A>0)&(B>1)"   "(A>0)&(!B>1)"  "(!A>0)&(B>1)"  "(!A>0)&(!B>1)"

# for each condition, evaluate it on df; temp is a logical matrix
# with one row per row of df and one column per condition
temp <- sapply(conditions, function(rule) {
  with(df, eval(parse(text = rule)))
})


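# C value for each combination (last row of dt, without the label column)
rules <- as.numeric(t(dt[nrow(dt), -1]))

# then find which of the (in this case four) conditions is TRUE for each row,
# and store the matching C value in df
df$C <- rules[apply(temp, 1, which)]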
> df
   x  A B C
1 x1 -1 1 1
2 x2  1 3 1
3 x3  0 1 1
4 x4  2 1 2
5 x5  2 3 1
6 x6 -2 1 1
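The last step, rules[apply(temp, 1, which)], relies on every row of df satisfying exactly one condition, which holds here because the table lists all four Y/N combinations. With an incomplete or overlapping table, apply() returns a list and the indexing breaks. A hedged sketch of a safer lookup, on a small hypothetical matrix shaped like temp (the names bad, hit and C are illustrative):

```r
# Hypothetical logical matrix shaped like temp: 3 observations, 2 rules;
# the second observation matches no rule at all
temp  <- cbind(r1 = c(TRUE, FALSE, FALSE),
               r2 = c(FALSE, FALSE, TRUE))
rules <- c(1, 2)   # C value for each rule (column of temp)

# which() returns integer(0) for the unmatched row, so apply() cannot
# simplify and returns a list instead of an integer vector
str(apply(temp, 1, which))

# Report rows that do not match exactly one rule
bad <- which(rowSums(temp) != 1)
if (length(bad) > 0)
  message("rows not matched by exactly one rule: ", paste(bad, collapse = ", "))

# which(arr.ind = TRUE) gives the (row, col) position of every TRUE;
# unmatched rows keep NA (a row matching several rules would keep the last
# one, which is why the check above reports such rows)
hit <- which(temp, arr.ind = TRUE)
C <- rep(NA_real_, nrow(temp))
C[hit[, "row"]] <- rules[hit[, "col"]]
print(C)
```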

5 Comments

Exactly what I was looking for, including the check for empty subsets! I really don't know how to thank you. If there were a more elegant solution for the parsing step, that would be great, but this is enough for now. I'd just add finaldf <- data.frame(x=c(),A=c(),B=c(),C=c()) outside the loop and replace subdf[[i]]<-mysubset with finaldf <- rbind(finaldf,mysubset) inside the if statement.
I have an idea for a more elegant solution; I'll work on it later today.
@Stefano Done: a more efficient version of how you create the rules inside the for-loop.
Thanks a lot! It looks super elegant, and beyond my current skills... :) I'm going to adapt it to my real case. Thanks again!
@Stefano You're welcome. Making these things elegant and less error-prone, by avoiding for-loops and generating things automatically, is a fast way to grow your skills and to tackle different problems. I'm guessing my solution works with hundreds of conditions, which would be very hard to program in your loop.
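On the scaling point: the vectorised answer builds and parses one combined expression per column of the table, i.e. one per Y/N combination, and each of those re-evaluates every condition. A different technique, sketched here as an alternative rather than as the answer's method, parses each condition only once, encodes every row of df as a Y/N key, and looks that key up among the table's columns with match(); rows whose pattern is missing from the table get NA. It assumes the table layout from the question (conditions in column 1, C values in the last row), and the names cond_rows, tab, hits, obs_key and rule_key are illustrative.

```r
# Data and decision table as printed in the question
df <- data.frame(x = paste0("x", 1:6),
                 A = c(-1, 1, 0, 2, 2, -2),
                 B = c(1, 3, 1, 1, 3, 1),
                 stringsAsFactors = FALSE)
dt <- read.csv2(text = "A>0;Y;Y;N;N\nB>1;Y;N;Y;N\nC;1;2;2;1",
                header = FALSE, stringsAsFactors = FALSE)

cond_rows <- seq_len(nrow(dt) - 1)               # rows holding conditions
tab   <- as.matrix(dt[cond_rows, -1])            # Y/N flags, one column per rule
rules <- as.numeric(unlist(dt[nrow(dt), -1]))    # C value for each rule

# Parse and evaluate each condition once: one logical column per condition
hits <- sapply(dt[cond_rows, 1],
               function(cond) with(df, eval(parse(text = cond))))

# Encode every observation and every rule as a key such as "YN"
obs_key  <- apply(ifelse(hits, "Y", "N"), 1, paste, collapse = "")
rule_key <- apply(tab, 2, paste, collapse = "")

# Look each observation's key up among the rules; NA if no rule matches
df$C <- rules[match(obs_key, rule_key)]
print(df)
```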
