Suppose I have a data.frame such as:

df = data.frame(id = c("a","b","c","d","e"), tid = rep("t",5), gid = c("A","B","C","D","E"), V1 = c("11","11","11","00","11"), V2 = c("11","01","11","01","01"), V3 = c("11","11","11","10","11"))

and I would like to aggregate rows that are identical in columns 4-6 (all columns but the first three). For each group of aggregated rows, the first three columns should hold the comma-separated concatenation of their original values.

So for my example this would be the resulting data.frame:

> df
   id tid gid V1 V2 V3
1 a,c   t A,C 11 11 11
2 b,e   t B,E 11 01 11
3   d   t   D 00 01 10

What's the simplest/fastest way to achieve this?

2 Answers

If you want to collapse a vector of values into a comma-separated string, the best function for the job is paste() with its collapse argument. Combine that with the base aggregate() function and you get

aggregate(id~., df, paste,collapse=",")

which returned the desired output for the original version of your question.

With the edited version of your question, you can use

aggregate(as.matrix(cbind.data.frame(id,tid,gid))~., df, paste,collapse=",")

If the columns you wanted to aggregate were character rather than factor, you could have just done

aggregate(cbind(id,tid,gid)~., df, paste,collapse=",")
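
For completeness, here is a minimal sketch of the same idea using the non-formula interface of aggregate(), which sidesteps the factor/character issue. Note that a plain paste() would give "t,t" for tid in the collapsed rows, so this version wraps it in unique() to match the output shown in the question. The helper name collapse_unique is just an illustrative name, and stringsAsFactors = FALSE is set explicitly because it is only the default from R 4.0.0 onwards.

```r
# Example data from the question, with the columns kept as character
df <- data.frame(id  = c("a", "b", "c", "d", "e"),
                 tid = rep("t", 5),
                 gid = c("A", "B", "C", "D", "E"),
                 V1  = c("11", "11", "11", "00", "11"),
                 V2  = c("11", "01", "11", "01", "01"),
                 V3  = c("11", "11", "11", "10", "11"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: drop duplicates within a group, then join with commas
collapse_unique <- function(x) paste(unique(x), collapse = ",")

# Collapse id/tid/gid within each distinct combination of V1, V2, V3
res <- aggregate(df[c("id", "tid", "gid")],
                 by  = df[c("V1", "V2", "V3")],
                 FUN = collapse_unique)

# aggregate() puts the grouping columns first; restore the original column order
res <- res[c("id", "tid", "gid", "V1", "V2", "V3")]
print(res)
```

The row order of the result follows aggregate()'s own ordering of the groups, so it may differ from the order in the question.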

3 Comments

aggregate(id ~., df, as.character) would also work.
I edited my question to a slightly more complicated case, if that's ok?
Normally it's not very polite to change your question after you ask it (clarifications are OK). Sometimes changing the question may lead to a completely different strategy and then all the time spent arriving at the first solution was wasted. So make sure your example reflects your actual needs as best as possible from the start.

You mentioned "efficient" in your question, so I would suggest looking at data.table. It's also not clear whether you want only the unique values within each group, so I've written my answer with unique() since that matches your desired output:

library(data.table)
setDT(df)[, lapply(.SD, function(x) paste(unique(x), collapse = ",")), 
          by = list(V1, V2, V3)]
#    V1 V2 V3  id tid gid
# 1: 11 11 11 a,c   t A,C
# 2: 11 01 11 b,e   t B,E
# 3: 00 01 10   d   t   D

Note that the result is a data.table and that your original data.frame has also been converted to a data.table.
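
If you would rather leave the original data.frame untouched, one option is to work on a copy made with as.data.table() instead of converting in place with setDT(). The following is a sketch under that assumption; the names dt, res and res_df are only illustrative.

```r
library(data.table)

# Example data from the question
df <- data.frame(id  = c("a", "b", "c", "d", "e"),
                 tid = rep("t", 5),
                 gid = c("A", "B", "C", "D", "E"),
                 V1  = c("11", "11", "11", "00", "11"),
                 V2  = c("11", "01", "11", "01", "01"),
                 V3  = c("11", "11", "11", "10", "11"),
                 stringsAsFactors = FALSE)

# as.data.table() returns a copy, so df itself stays a plain data.frame
dt <- as.data.table(df)

# Same grouping as above: collapse the unique values of every non-grouping column
res <- dt[, lapply(.SD, function(x) paste(unique(x), collapse = ",")),
          by = .(V1, V2, V3)]

# Convert back if downstream code expects a plain data.frame
res_df <- as.data.frame(res)
```

The copy costs extra memory, so for large data the in-place setDT() shown above is likely the better choice when modifying df is acceptable.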
