I want to summarise a data frame by group using multiple functions. Replication data:
library(dplyr)
df1 <- data.frame(a=c('a', 'a', 'b', 'b', 'c', 'c'), b=c(1,NA,3,2,2,1), c=c(1,3,5,5,2,4))
One of these is a custom function that returns, within each group (df1$a), the value of df1$b in the row where df1$c is at its maximum. When that value is NA, it should instead return the value of df1$b in the row with the second-highest value of df1$c. The following works:
# x: values to maximise over, y: values to return
namax <- function(x, y) ifelse(is.na(y[x == max(x)]) & length(x) > 1,
                               y[x == sort(x, partial = length(x) - 1)[length(x) - 1]],
                               y[x == max(x)])
I then try to summarise df1 using:
df2 <- df1 %>%
  dplyr::group_by(a) %>%
  summarise(meanc = mean(c),
            maxc = namax(c, b))
This returns the following error, because for df1$a == 'b' the maximum value of df1$c occurs twice, with different values of df1$b:
Error: Column 'maxc' must be length 1 (a summary value), not 2
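To make the cause of the error concrete, here is a minimal sketch that calls the function on each group separately, using base R only. The name per_group is just for illustration, and namax is restated so the snippet runs on its own; it is meant to behave the same as the definition above.

```r
# Replication data from the question
df1 <- data.frame(a = c('a', 'a', 'b', 'b', 'c', 'c'),
                  b = c(1, NA, 3, 2, 2, 1),
                  c = c(1, 3, 5, 5, 2, 4))

# Same logic as namax above, restated so this snippet is self-contained
namax <- function(x, y) ifelse(is.na(y[x == max(x)]) & length(x) > 1,
                               y[x == sort(x, partial = length(x) - 1)[length(x) - 1]],
                               y[x == max(x)])

# Apply namax to each group on its own (per_group is an illustrative name)
per_group <- lapply(split(df1, df1$a), function(g) namax(g$c, g$b))

# Number of values namax returns for each group
lengths(per_group)

# Group 'b' has two rows tied at max(c) = 5, so both b values come back
per_group$b
```

Groups 'a' and 'c' each give a single value, but group 'b' gives two (3 and 2), which is what summarise() rejects.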
Is there an elegant solution through which dplyr returns both values, while simultaneously executing the other call to summarise() (e.g. by adding do() to the call to group_by)? In my applied case I am trying to run several different calls to summarise, aside from the one using the namax function.