Group variables in an R data frame using a specific list

Question

I have the following character vectors, each listing the names that belong to one group:

  group1<-c("A", "B", "D")
  group2<-c("C", "E")
  group3<-c("F")

and a data frame with values and their corresponding names:

  df <- data.frame (name=c("A","B","C","D","E","F"),value=c(1,2,3,4,5,6))
  df
    name value
  1    A     1
  2    B     2
  3    C     3
  4    D     4
  5    E     5
  6    F     6

I'd like to group the data based on these vectors, using the name column:

  df
    name value    group
  1    A     1   group1
  2    B     2   group1
  3    C     3   group2
  4    D     4   group1
  5    E     5   group2
  6    F     6   group3

and sum the values for each group.

  df
       group sum
  1   group1   7
  2   group2   8
  3   group3   6

I've searched for similar posts, but couldn't adapt them to my problem.

Jilber Urbina · Accepted Answer · 2013-11-24 16:10:51Z

3

Here's an approach. First, use ifelse to assign a group to each name, then use aggregate to sum the values within each group.

> df$group <- with(df, ifelse(name %in% group1, "group1",
                              ifelse(name %in% group2, "group2", "group3" )))
> aggregate(value ~ group, sum, data=df)
   group value
1 group1     7
2 group2     8
3 group3     6
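The nested ifelse above needs one extra level of nesting per group, and any name that is not in group1 or group2 is silently labelled "group3". As a minimal sketch of a swapped-in alternative (a named lookup vector, not the answerer's code; the names groups, lookup and res are illustrative), the group labels can be looked up directly, which scales to any number of groups and leaves unmatched names as NA:

```r
group1 <- c("A", "B", "D")
group2 <- c("C", "E")
group3 <- c("F")
df <- data.frame(name = c("A", "B", "C", "D", "E", "F"), value = c(1, 2, 3, 4, 5, 6))

# Collect the group vectors in a named list; the list names become the labels.
groups <- list(group1 = group1, group2 = group2, group3 = group3)

# Build c(A = "group1", B = "group1", D = "group1", C = "group2", ...):
# element names are the data names, element values are the group labels.
lookup <- setNames(rep(names(groups), lengths(groups)), unlist(groups))

# Index the lookup by name; a name that is in no group yields NA instead
# of falling through to a default group as in the nested ifelse().
df$group <- unname(lookup[as.character(df$name)])

res <- aggregate(value ~ group, data = df, FUN = sum)
print(res)
```

Adding a fourth group only means adding one more element to the groups list; the lookup and aggregate lines stay unchanged.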


2 Comments

user2904120

One more question, then: say the initial df contains multiple value columns (value1, value2, value3). What's the best way to apply aggregate to all of them?

Jilber Urbina

Try aggregate(cbind(value1,value2,...,valueN) ~ group, sum, data=df)
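A minimal sketch of that suggestion with two value columns, assuming hypothetical data where value1 is the question's value column and value2 is made up for illustration:

```r
group1 <- c("A", "B", "D")
group2 <- c("C", "E")
group3 <- c("F")
# Hypothetical data: value1 matches the question, value2 is invented.
df <- data.frame(name = c("A", "B", "C", "D", "E", "F"),
                 value1 = c(1, 2, 3, 4, 5, 6),
                 value2 = c(10, 20, 30, 40, 50, 60))

# Assign groups with the nested ifelse() from the answer above.
df$group <- with(df, ifelse(name %in% group1, "group1",
                     ifelse(name %in% group2, "group2", "group3")))

# cbind() on the left-hand side makes aggregate sum each column per group.
res <- aggregate(cbind(value1, value2) ~ group, data = df, FUN = sum)
print(res)  # value1 sums: 7, 8, 6; value2 sums: 70, 80, 60
```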

TheComeOnMan · Accepted Answer · 2013-11-24 16:15:39Z

1

I would suggest keeping your grouping as a data.frame, something along these lines:

# groupno must follow the name order A..F: A, B, D -> 1; C, E -> 2; F -> 3
grouping <- data.frame(name=c("A","B","C","D","E","F"),groupno=c(1,1,2,1,2,3))
df2 <- merge(df,grouping, by = 'name')
aggregate(value ~ groupno, sum, data=df2)
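Typing the group number for every name by hand is easy to get out of sync with the group vectors. One way to build the same kind of grouping data frame directly from group1, group2 and group3 is utils::stack(); this is a swapped-in construction rather than the answerer's code, and grouping, df2 and res are illustrative names:

```r
group1 <- c("A", "B", "D")
group2 <- c("C", "E")
group3 <- c("F")
df <- data.frame(name = c("A", "B", "C", "D", "E", "F"), value = c(1, 2, 3, 4, 5, 6))

# stack() turns a named list of vectors into a data frame with columns
# 'values' (the elements) and 'ind' (a factor holding the list names).
grouping <- stack(list(group1 = group1, group2 = group2, group3 = group3))
names(grouping) <- c("name", "group")

# Attach each row's group by matching on name, then sum per group.
df2 <- merge(df, grouping, by = "name")
res <- aggregate(value ~ group, data = df2, FUN = sum)
print(res)
```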


alexis_laz · Accepted Answer · 2013-11-25 10:44:44Z

Another idea:

df$X <- factor(df$name)
levels(df$X) <- list(group1 = group1, group2 = group2, group3 = group3)
aggregate(df$value, list(group = df$X), sum)
#   group x
#1 group1 7
#2 group2 8
#3 group3 6

EDIT

As noted by @thelatemail in the comments below, you can use mget to collect, in a named list, all objects in your workspace whose names contain "group" followed by one or more digits, like this:

mget(ls(pattern="group\\d+"))

However, if you have also loaded, say, a function called "group4", ls() will return it as well. One way to avoid this is something like:

.ls <- ls(pattern="group\\d+")
# `mget` only the non-functions. You can, of course, exclude any
# other `mode` besides "function" in the same way.
mget(.ls[!.ls %in% apropos("group", mode = "function")])

The list returned by mget can then be assigned to levels(df$X).
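Putting the pieces of this answer together, a minimal end-to-end sketch, assuming the group vectors live in the current (global) workspace; groups and res are illustrative names:

```r
group1 <- c("A", "B", "D")
group2 <- c("C", "E")
group3 <- c("F")
df <- data.frame(name = c("A", "B", "C", "D", "E", "F"), value = c(1, 2, 3, 4, 5, 6))

# Collect the group vectors by name, skipping any function whose name matches.
.ls <- ls(pattern = "group\\d+")
groups <- mget(.ls[!.ls %in% apropos("group", mode = "function")])

# Assigning a named list to levels() merges the old levels (A..F)
# into the new ones (group1..group3).
df$X <- factor(df$name)
levels(df$X) <- groups
res <- aggregate(df$value, list(group = df$X), sum)
print(res)
```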

Comments

thelatemail

I think this is the most R-ish way to approach the problem, and it is very concise. The need to build the named list by hand could be avoided with something like: levels(df$X) <- mget(ls(pattern="group\\d+"))

alexis_laz

@thelatemail: Ha, exactly what I was thinking! But when I tried pattern = "group", it also returned a function of mine called groupby that I had loaded, so I didn't spend more time on it. Your idea of matching "group" followed by a digit is the best option, though. I'll edit my answer with a workaround I made, but feel free to edit it if you have something better to add. Thanks!
