
I have the following character vectors:

  group1<-c("A", "B", "D")
  group2<-c("C", "E")
  group3<-c("F")

and a dataframe with values and corresponding names:

  df <- data.frame (name=c("A","B","C","D","E","F"),value=c(1,2,3,4,5,6))
  df
    name value
  1    A     1
  2    B     2
  3    C     3
  4    D     4
  5    E     5
  6    F     6

I'd like to assign each row to a group based on these vectors, using the name column:

  df
    name value    group
  1    A     1   group1
  2    B     2   group1
  3    C     3   group2
  4    D     4   group1
  5    E     5   group2
  6    F     6   group3

and sum the values for each group.

  df
       group sum
  1   group1   7
  2   group2   8
  3   group3   6

I've searched for similar posts, but couldn't apply them to my problem.

3 Answers


Here's an approach. First, use ifelse to assign groups to each name, then use aggregate to get the sum for each group.

> df$group <- with(df, ifelse(name %in% group1, "group1",
                              ifelse(name %in% group2, "group2", "group3" )))
> aggregate(value ~ group, sum, data=df)
   group value
1 group1     7
2 group2     8
3 group3     6
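The nested ifelse needs one more level for every additional group. Below is a minimal sketch of a lookup-based alternative. It assumes the group vectors are first collected into a named list (the name `groups` is hypothetical, not from the question). `stack()` turns that list into a two-column table of names and group labels, and `match()` looks up each row's group.

```r
# Hypothetical named list holding the question's group vectors
groups <- list(group1 = c("A", "B", "D"),
               group2 = c("C", "E"),
               group3 = "F")

df <- data.frame(name = c("A", "B", "C", "D", "E", "F"),
                 value = c(1, 2, 3, 4, 5, 6))

# stack() returns one row per element: column `values` holds the name,
# column `ind` is a factor holding the list element's name (the group)
lookup <- stack(groups)

# Look up each row's group; a name found in no group becomes NA
df$group <- as.character(lookup$ind[match(df$name, lookup$values)])

# Sum the values per group
res <- aggregate(value ~ group, data = df, FUN = sum)
res
```

Unlike the nested ifelse, a name that belongs to no group gets NA instead of silently falling through to "group3", and the formula method of aggregate drops such rows by default.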

2 Comments

One more question: say the initial df contains multiple value columns (value1, value2, value3). What's the best way to apply aggregate to all of them?
Try: aggregate(cbind(value1, value2, ..., valueN) ~ group, sum, data=df)
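A small sketch of the cbind suggestion, using a hypothetical data frame `df3` whose columns value1 to value3 stand in for the ones mentioned in the comment (the numbers are made up for illustration). It also shows the `. ~ group` form, which sums every non-grouping column; that requires dropping the name column first, otherwise it would be summed too.

```r
# Hypothetical data frame with three value columns
df3 <- data.frame(name   = c("A", "B", "C", "D", "E", "F"),
                  value1 = 1:6,
                  value2 = 7:12,
                  value3 = 13:18)

group1 <- c("A", "B", "D")
group2 <- c("C", "E")

# Same nested ifelse as in the answer above
df3$group <- with(df3, ifelse(name %in% group1, "group1",
                              ifelse(name %in% group2, "group2", "group3")))

# cbind() on the left-hand side sums each listed column per group
res3 <- aggregate(cbind(value1, value2, value3) ~ group, data = df3, FUN = sum)
res3

# Dot notation: every column except `group` (name is removed first)
res_dot <- aggregate(. ~ group, data = df3[, -1], FUN = sum)
res_dot
```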

I would suggest keeping your grouping in a data.frame, something along these lines:

grouping <- data.frame(name=c("A","B","C","D","E","F"),groupno=c(1,1,2,1,2,3))
df2 <- merge(df,grouping, by = 'name')
aggregate(value ~ groupno, sum, data=df2)
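Typing groupno by hand is easy to get out of sync with the group vectors. A minimal sketch that builds the same grouping table from the question's vectors instead, assuming they are collected into an unnamed list (the name `groups` is hypothetical): each group's index is repeated once per member with `rep()` and `lengths()`.

```r
group1 <- c("A", "B", "D")
group2 <- c("C", "E")
group3 <- "F"

# Hypothetical list of the group vectors, in group-number order
groups <- list(group1, group2, group3)

# One row per name; groupno repeats each group's index once per member
grouping <- data.frame(name    = unlist(groups),
                       groupno = rep(seq_along(groups), lengths(groups)))

df <- data.frame(name = c("A", "B", "C", "D", "E", "F"),
                 value = c(1, 2, 3, 4, 5, 6))

df2 <- merge(df, grouping, by = "name")
res2 <- aggregate(value ~ groupno, data = df2, FUN = sum)
res2
```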



Another idea:

df$X <- factor(df$name)
levels(df$X) <- list(group1 = group1, group2 = group2, group3 = group3)
aggregate(df$value, list(group = df$X), sum)
#   group x
#1 group1 7
#2 group2 8
#3 group3 6
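For reference, a short sketch of the same factor approach with `tapply()` in place of `aggregate()`; this is a swapped-in base R alternative, not part of the answer. Passing a named list to `levels<-` merges the old levels in each element into one new level named after that element, and `tapply()` then returns a named result instead of a data frame.

```r
group1 <- c("A", "B", "D"); group2 <- c("C", "E"); group3 <- "F"
df <- data.frame(name = c("A", "B", "C", "D", "E", "F"),
                 value = c(1, 2, 3, 4, 5, 6))

df$X <- factor(df$name)
# Each list element names a new level and lists the old levels it absorbs
levels(df$X) <- list(group1 = group1, group2 = group2, group3 = group3)

# Per-group sums as a named 1-d array rather than a data frame
sums <- tapply(df$value, df$X, sum)
sums
```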

EDIT

As noted by @thelatemail in the comments below, you can use mget to collect into a list all the objects in your workspace named "group" followed by a number, like this:

mget(ls(pattern="group\\d+"))

If, however, you have also loaded, say, a function called "group4", ls() will select it too. One way to avoid this is something like:

.ls <- ls(pattern="group\\d+")
# mget only non-functions. You can, of course, exclude any
# other `mode` besides "function" as well.
mget(.ls[!.ls %in% apropos("group", mode = "function")])

The list returned by mget can then be assigned to levels(df$X).
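A minimal end-to-end sketch of that workaround, with two substitutions that are assumptions rather than the answer's code: the pattern is anchored (`^group\\d+$`) so only exact "group" plus digits names match, and functions are filtered out with `is.function()` instead of `apropos()`. The function `group4` is hypothetical and only there to show that it gets dropped.

```r
group1 <- c("A", "B", "D"); group2 <- c("C", "E"); group3 <- "F"
# Hypothetical function whose name also matches the pattern
group4 <- function(x) x

df <- data.frame(name = c("A", "B", "C", "D", "E", "F"),
                 value = c(1, 2, 3, 4, 5, 6))

# Names that are exactly "group" followed by one or more digits
.ls <- ls(pattern = "^group\\d+$")
# Keep only the objects that are not functions
.ls <- .ls[!vapply(mget(.ls), is.function, logical(1))]

df$X <- factor(df$name)
levels(df$X) <- mget(.ls)
res <- aggregate(df$value, list(group = df$X), sum)
res
```

Because ls() returns names sorted alphabetically, with ten or more groups "group10" would sort before "group2", and the factor levels follow that order.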

2 Comments

I think this is the most R-ish way to approach the problem and is very concise. The requirement to construct the named list could be avoided with something like: levels(df$X) <- mget(ls(pattern="group\\d+"))
@thelatemail: Ha, exactly what I was thinking! But when I used pattern = "group", a loaded function of mine called groupby was returned too, and I didn't spend more time on it. Your idea of matching "group" followed by a digit is the better option, though. I'll edit my answer with a workaround I made, but feel free to edit if you have something better to add. Thanks!
