Given a data frame, for example:
a <- c(1:3,4:6)
b <- c(2:4,3,2,1)
c <- cbind(a,b)
I would like to subset the data frame by removing rows that hold the same values in a different order (e.g., row 3: 3,4 is the same as row 4: 4,3) and keep only one of them.
Assuming d is your matrix, not c:
e <- unique(apply(d,1,function(x) paste(sort(x),collapse="~")))
> t(sapply(strsplit(e,"~"),as.numeric))
     [,1] [,2]
[1,]    1    2
[2,]    2    3
[3,]    3    4
[4,]    2    5
[5,]    1    6
Breaking it down:
First line
apply(d,1,function(x) ... ) takes each row of d and passes it as a vector x to the anonymous function whose body I've called ... here.
The function body is paste(sort(x),collapse="~"), which sorts the vector and then collapses it into a single string (a length-one character vector) with the elements separated by ~.
So the apply call overall is going to return a character vector where each element used to be a row of the matrix.
Then unique keeps only the unique elements. Because each row was sorted first, 3,4 and 4,3 both become "3~4", so unique treats them as the same row and keeps one.
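To make the first line concrete, here is a small sketch that rebuilds the matrix from the question under the name d (as assumed above) and prints the intermediate values. The name keys is introduced here only for illustration and is not part of the original answer.

```r
# Rebuild the example matrix from the question, named d as assumed above
a <- c(1:3, 4:6)
b <- c(2:4, 3, 2, 1)
d <- cbind(a, b)

# One string per row, built from the row's values in sorted order
# (keys is a hypothetical helper name)
keys <- apply(d, 1, function(x) paste(sort(x), collapse = "~"))
print(keys)
# "1~2" "2~3" "3~4" "3~4" "2~5" "1~6"

# Rows 3 and 4 share the key "3~4", so unique() keeps only one of them
e <- unique(keys)
print(e)
# "1~2" "2~3" "3~4" "2~5" "1~6"
```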
Second line
strsplit(e,"~") splits our character vector back into a separated form. In this case, it's a list where each element is a character vector of the numbers that comprise each row.
sapply(...,as.numeric) applies as.numeric() to each element of the list. So we convert the character vector back to a numeric vector. Since the s in sapply stands for "simplify," it will create a matrix from this.
But it's the wrong direction (2x5 instead of 5x2)! t() transposes the matrix to the original form.
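The second line can be unpacked the same way. This sketch starts from the value of e that the first line produces for the example data; parts, m and res are names made up for illustration.

```r
# e as produced by the first line for the example in the question
e <- c("1~2", "2~3", "3~4", "2~5", "1~6")

parts <- strsplit(e, "~")        # list of 5 character vectors, each of length 2
m <- sapply(parts, as.numeric)   # simplified into a numeric matrix, one column per key
print(dim(m))                    # 2 5

res <- t(m)                      # transpose so each key is a row again
print(dim(res))                  # 5 2
```

Note that the rows of the result are the sorted versions (2 5 rather than the original 5 2), and the column names a and b are not carried over.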
I would use !duplicated instead of unique (because then you can use the logical vector to select rows out of the original matrix). I believe this is what @dayne's solution now does.
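A minimal sketch of that idea, assuming the same example matrix named d. This is one way to write it and not necessarily identical to @dayne's code; keys and keep are illustrative names.

```r
# Example matrix from the question, named d
a <- c(1:3, 4:6)
b <- c(2:4, 3, 2, 1)
d <- cbind(a, b)

# Order-insensitive key for each row (hypothetical helper name)
keys <- apply(d, 1, function(x) paste(sort(x), collapse = "~"))

# TRUE for the first occurrence of each key, FALSE for later repeats
keep <- !duplicated(keys)

# Index the original matrix, so values and column names are left untouched
result <- d[keep, ]
print(result)
```

Because the rows are selected from d itself, the surviving rows keep their original order of values (5,2 stays 5,2) and there is no need to convert strings back to numbers.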
c is a function in R and should never be used as a variable name.
df is also a function name, as is data and many other names that are commonly used as variables. To my knowledge, all the answers here would work whether the object is named "c" or your other favorite letter of the alphabet. That said, Ram should heed your warning if only because by using "c", he will also be making his code less readable.
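A quick way to check this, as a sketch using the question's own data: when a name is used in call position, R looks for a function of that name and skips non-function objects, so a matrix named c does not stop c() from working. The name v is made up for illustration.

```r
a <- c(1:3, 4:6)
b <- c(2:4, 3, 2, 1)
c <- cbind(a, b)       # a matrix now named c, as in the question

v <- c(10, 20)         # still calls the base function c, not the matrix
print(v)
print(is.matrix(c))    # the variable c is the matrix
```

It runs, but a reader has to stop and work out which c is meant, which is the readability cost mentioned above.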