Using aggregate() on data.frame objects

Question

Why doesn't aggregate() work here?

> aggregate(cbind(var1 = 1:10, var2 = 101:110), 
      by=list(range=cut(1:10, breaks=c(2,4,8,10))), 
      FUN = function(x) 
        { 
        c(obs=length(x[, "var2"]), avg=mean(x[, "var2"]), sd=sd(x[, "var2"])) 
        })

Error in x[, "var2"] (from #1) : incorrect number of dimensions

> cbind(var1 = 1:10, var2 = 101:110)[, "var2"]
 [1] 101 102 103 104 105 106 107 108 109 110

UPDATE

Values returned by aggregate() after running the corrected version:

> r = aggregate(data.frame(var1 = 1:10, var2 = 101:110), by=list(range=cut(1:10, breaks=c(2,4,8,10))), FUN = function(x) { c(obs=length(x), avg=mean(x), sd=sd(x)) })
> class(r)
[1] "data.frame"
> dim(r)
[1] 3 3
> r[,1]
[1] (2,4]  (4,8]  (8,10]
Levels: (2,4] (4,8] (8,10]
> r[,2]
     obs avg       sd
[1,]   2 3.5 0.707107
[2,]   4 6.5 1.290994
[3,]   2 9.5 0.707107
> r[,3]
     obs   avg       sd
[1,]   2 103.5 0.707107
[2,]   4 106.5 1.290994
[3,]   2 109.5 0.707107
> class(r[,2])
[1] "matrix"
> class(r[,3])
[1] "matrix"

cbind with numeric arguments returns a matrix, not a data.frame. And you would not expect to specify column names inside the anonymous function supplied to FUN.
– IRTFM, Commented Apr 27, 2015 at 19:13
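
To make the comment above concrete, here is a minimal sketch of where the "incorrect number of dimensions" error comes from. The names m, v, and msg are only illustrative. Column-name indexing works on the matrix, but it fails on a single column pulled out of it, because that column is a plain vector with no dim attribute:

```r
m <- cbind(var1 = 1:10, var2 = 101:110)  # cbind of numeric vectors gives a matrix
dim(m)                                   # has two dimensions, so m[, "var2"] works

v <- m[, "var2"]                         # one column: a plain vector
dim(v)                                   # NULL, so there is no column dimension

# Two-index subsetting on a vector raises an error; capture its message
# instead of stopping the script.
msg <- tryCatch(v[, "var2"], error = function(e) conditionMessage(e))
msg
```

A vector like v is the kind of object FUN receives inside aggregate(), which is why the column name has no meaning there.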

IRTFM · Accepted Answer · 2015-04-27 19:19:01Z

3

Supply a data.frame, and note that aggregate passes FUN only column vectors, so x[ , "colname"] is doomed to fail because "x" is not a data.frame:

 aggregate(data.frame(var1 = 1:10, var2 = 101:110), 
       by=list(range=cut(1:10, breaks=c(2,4,8,10))), 
       FUN = function(x) 
         { 
         c(obs=length(x), avg=mean(x), sd=sd(x)) 
         })
#------------
   range  var1.obs  var1.avg   var1.sd    var2.obs    var2.avg     var2.sd
1  (2,4] 2.0000000 3.5000000 0.7071068   2.0000000 103.5000000   0.7071068
2  (4,8] 4.0000000 6.5000000 1.2909944   4.0000000 106.5000000   1.2909944
3 (8,10] 2.0000000 9.5000000 0.7071068   2.0000000 109.5000000   0.7071068
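
As a rough illustration of "passes only column vectors", the var2 part of that table can be rebuilt by hand with split() and sapply(). This is a sketch of the idea, not aggregate's actual implementation. The names dfrm, grp, pieces, summarise_vec, and manual are made up for the example:

```r
dfrm <- data.frame(var1 = 1:10, var2 = 101:110)
grp  <- cut(1:10, breaks = c(2, 4, 8, 10))   # values 1 and 2 fall outside and become NA

# One plain vector per group; split() drops the rows whose group is NA.
pieces <- split(dfrm$var2, grp)

# The same summary as in the answer; x is a vector, so no column indexing.
summarise_vec <- function(x) c(obs = length(x), avg = mean(x), sd = sd(x))

# sapply() returns one column per group; transpose to get one row per group.
manual <- t(sapply(pieces, summarise_vec))
manual
```

Repeating this for var1 and binding the results side by side gives the same numbers as the aggregate() call.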

6 Comments

Robert Kubrick Over a year ago

Is "x" a matrix? I wouldn't know how to break into that part of the code to inspect the objects.

IRTFM Over a year ago

"x" would have been a (possibly named) numeric (atomic) vector at the point it/they were being passed to FUN. It would not have had a dimension so it would be neither a matrix nor a dataframe.

Robert Kubrick Over a year ago

Interesting, so how do we end up with a length/mean/sd for each column in the original data.frame object (var1/var2)? If "x" is a simple vector, does it mean FUN is called once for each data.frame column?

IRTFM Over a year ago

FUN is called once per category in the INDEX (or rather, by) argument, for each column. That is the entire reason for aggregate's existence. So FUN is called length(dfrm) * length(unique(by-vector)) times.
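
One way to check that call count is to wrap the summary function in a counter. This is a small sketch. The names n_calls, counting_stats, and expected are hypothetical, and the expected count uses the non-NA groups only, since rows with an NA group are dropped:

```r
dfrm <- data.frame(var1 = 1:10, var2 = 101:110)
grp  <- cut(1:10, breaks = c(2, 4, 8, 10))

n_calls <- 0
counting_stats <- function(x) {
  n_calls <<- n_calls + 1                      # count every invocation of FUN
  c(obs = length(x), avg = mean(x), sd = sd(x))
}

r <- aggregate(dfrm, by = list(range = grp), FUN = counting_stats)

# Number of columns times number of groups that actually occur.
expected <- length(dfrm) * length(unique(grp[!is.na(grp)]))
c(n_calls = n_calls, expected = expected)
```

With this data there are 2 columns and 3 non-empty groups, so both numbers should agree.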

IRTFM Over a year ago

aggregate returns a first label column as the vector of unique (sorted) values in the by-argument and then basically rbinds the values from the multiple calls to FUN for each column. The function doing the actual "rbinding" is sapply.
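
Because each call to FUN returns three values, the result holds one matrix per original column, which is why dim(r) is 3 by 3 in the question's update. If separate columns are preferred, a commonly used idiom is do.call(data.frame, r). A sketch, with flat as an illustrative name:

```r
r <- aggregate(data.frame(var1 = 1:10, var2 = 101:110),
               by = list(range = cut(1:10, breaks = c(2, 4, 8, 10))),
               FUN = function(x) c(obs = length(x), avg = mean(x), sd = sd(x)))

dim(r)            # three columns: range, var1, var2
is.matrix(r$var1) # var1 and var2 are matrix columns

# Rebuild the data.frame so each matrix column is expanded into
# ordinary columns named var1.obs, var1.avg, and so on.
flat <- do.call(data.frame, r)
names(flat)
```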

MrFlick · Accepted Answer · 2015-04-27 19:14:04Z

3

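That's because aggregate doesn't pass data.frames to its FUN= argument; it passes each column as a vector of observations. [, "name"] indexing fails on such a vector because it has no dimensions (it does work on a matrix with column names, as your cbind example shows). Make sure you pass in a data.frame and not a matrix as in your example. Perhaps you want the by function instead, which passes each group's rows to FUN as a data.frame: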

by(data.frame(var1 = 1:10, var2 = 101:110), 
    list(range=cut(1:10, breaks=c(2,4,8,10))), 
    FUN = function(x) { c(obs=length(x[, "var2"]), avg=mean(x[, "var2"]), sd=sd(x[, "var2"])) })
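
The by() call returns an object of class "by" with one element per group. If a single table is more convenient, the pieces can be stacked with do.call(rbind, ...). A sketch, assuming every group returns a vector of the same length. The names dfrm, res, and tab are illustrative:

```r
dfrm <- data.frame(var1 = 1:10, var2 = 101:110)

res <- by(dfrm,
          list(range = cut(1:10, breaks = c(2, 4, 8, 10))),
          FUN = function(x) {
            # here x is a data.frame holding the rows of one group
            c(obs = nrow(x), avg = mean(x$var2), sd = sd(x$var2))
          })

# One row per group, one column per statistic.
tab <- do.call(rbind, res)
tab
```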

2 Comments

Robert Kubrick Over a year ago

I checked the aggregate code: it converts a matrix argument to a data.frame if it's not a time series object. Which "vector of observations" does FUN receive, exactly?

MrFlick Over a year ago

It passes in columns as vectors. It only ever operates on one column at a time.