group_by() in list of data frames

Question

I want to:

1. Merge each data frame in the list out with the data frame df.
2. Estimate an lm() model within each quarter.

id <- c(1,2,3,4,5,1,2,3,4,5)
quarter <- c("1","2","1","1","2", "3","1","1","3","3")
month <- c(3,4,2,1,5,7,3,1,8,9)
pred_dif <- c(0.5,0.1,0.15,0.23,0.75,0.6,0.49,0.81,0.37,0.14)

list_1 <- data.frame(id, pred_dif, month)

pred_dif <- c(0.45,0.18,0.35,0.63,0.25,0.63,0.29,0.11,0.17,0.24)

list_2 <- data.frame(id, pred_dif, month)

pred_dif <- c(0.58,0.13,0.55,0.13,0.76,0.3,0.29,0.81,0.27,0.04)

list_3 <- data.frame(id, pred_dif, month)

pred_dif <- c(0.3,0.61,0.18,0.29,0.85,0.76,0.56,0.91,0.48,0.91)

list_4 <- data.frame(id, pred_dif, month)

out <- list(list_1, list_2, list_3, list_4)


pred_second <- c(0.4,0.71,0.28,0.39,0.95,0.86,0.66,0.81,0.58,0.81)
df <- data.frame(id, quarter, pred_second, month)



library(purrr)
library(dplyr)
library(broom)
library(tidyr)
lmout_lst <- map(out, 
                 ~ left_join(.x, df, by = c('id', 'month')) %>%
                   group_by(quarter) %>%
                   summarise(new = list(lm(pred_dif ~ as.factor(month) - 1) %>% 
                                          broom::tidy(.))) %>%
                   unnest(new))

The problem happens in ols_list_reg, in particular with the group_by() call.

Any idea why this is happening and possible solutions?

Thank you @Ronak. The code doesn't really fail if you now try out[[1]] %>% filter(quarter == '1') %>% {lm(pred_dif ~ as.factor(month) - 1, data = .)}
– vog, Commented Jun 15, 2021 at 0:19

Ronak Shah · Accepted Answer · 2021-06-16 03:56:19Z


Perhaps you can try this:

library(tidyverse)

map(out, 
    ~ left_join(.x, df, by = c('id', 'month')) %>%
      group_by(quarter) %>%
      summarise(new = list(
        # Return a one-column placeholder when lm() fails for a group
        tryCatch(lm(pred_dif ~ as.factor(month) - 1) %>% broom::tidy(), 
                 error = function(e) tibble(estimate = NA))
      )) %>%
      unnest(new)
)

If you want to combine all the results together use map_df instead of map.
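As a rough sketch of that suggestion, the snippet below uses the data from the question (only the first two list elements, to keep it short) and passes `.id = "element"` so each row records which list element it came from. The column name `element` is an arbitrary choice for illustration. In purrr 1.0.0 and later, `map_df()` is superseded, and `map()` followed by `list_rbind()` is the recommended equivalent.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(broom)

# Data copied from the question (first two list elements only)
id      <- c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
quarter <- c("1", "2", "1", "1", "2", "3", "1", "1", "3", "3")
month   <- c(3, 4, 2, 1, 5, 7, 3, 1, 8, 9)
df      <- data.frame(id, quarter, month)

out <- list(
  data.frame(id, pred_dif = c(0.5, 0.1, 0.15, 0.23, 0.75, 0.6, 0.49, 0.81, 0.37, 0.14), month),
  data.frame(id, pred_dif = c(0.45, 0.18, 0.35, 0.63, 0.25, 0.63, 0.29, 0.11, 0.17, 0.24), month)
)

# map_df() row-binds the per-element results into one data frame;
# .id adds a column holding the index of the originating list element
combined <- map_df(
  out,
  ~ left_join(.x, df, by = c("id", "month")) %>%
    group_by(quarter) %>%
    summarise(new = list(
      tryCatch(lm(pred_dif ~ as.factor(month) - 1) %>% broom::tidy(),
               error = function(e) tibble(estimate = NA_real_))
    )) %>%
    unnest(new),
  .id = "element"
)
```

The result has one row per coefficient, identified by element, quarter and term, which makes it easy to filter or compare estimates across list elements.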

edited Jun 16, 2021 at 3:56

answered Jun 15, 2021 at 3:02

Ronak Shah


4 Comments

vog Over a year ago

Thanks @Ronak! It was my mistake to include quarter in the list out. The variable quarter belongs only to the data frame df. Only after merging the list and the data frame do we end up with the variable "quarter" in the list out. Therefore, merging by quarter is not possible in my original task.

Ronak Shah Over a year ago

So you can remove it from by in left_join.

vog Over a year ago

For some reason this works in the example I designed but does not work with my real data:

Error: Problem with `summarise()` input `new`. x contrasts can be applied only to factors with 2 or more levels ℹ Input `new` is `list(lm(TE_indiv ~ as.factor(size) - 1) %>% broom::tidy(.))`. ℹ The error occurred in group 5: quarter = NA.

In my real case, size is a character variable. If I group by month the whole process runs correctly; if I group by quarter it fails with the error above. The character vector quarter doesn't have any NAs. Any idea?

Ronak Shah Over a year ago

In that case you have to use tryCatch to catch those errors. See if my updated answer helps in your real data.
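Besides catching the error, it can help to look at where the `quarter = NA` group comes from. A plausible explanation (an assumption, since the real data is not shown) is that some rows have no match in df during `left_join()`, so they receive `quarter = NA`, and that this group contains only one distinct value of the factor. The sketch below uses small made-up data frames `x` and `df` to show the error for a one-level factor, then uses `anti_join()` and a per-group count to locate the problem.

```r
library(dplyr)

# A factor with a single level triggers the error, even with "- 1"
one_level <- data.frame(y = c(0.4, 0.6), g = c("a", "a"))
res <- tryCatch(lm(y ~ as.factor(g) - 1, data = one_level),
                error = function(e) conditionMessage(e))

# Hypothetical toy data: the last row of `x` has no partner in `df`
x  <- data.frame(id = c(1, 2, 3, 4), month = c(1, 2, 1, 9),
                 pred_dif = c(0.5, 0.1, 0.2, 0.4))
df <- data.frame(id = c(1, 2, 3), month = c(1, 2, 1),
                 quarter = c("1", "1", "1"))

joined <- left_join(x, df, by = c("id", "month"))

# Rows of `x` without a match in `df`; these get quarter = NA after the join
unmatched <- anti_join(x, df, by = c("id", "month"))

# Observations and distinct factor levels per group;
# groups with n_levels < 2 are the ones lm() will reject
group_sizes <- joined %>%
  group_by(quarter) %>%
  summarise(n_obs = n(), n_levels = n_distinct(month))
```

If `unmatched` is non-empty, fixing the join keys or filtering out `is.na(quarter)` before fitting may be preferable to silently replacing the failed fits with NA.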

Limey · Accepted Answer · 2021-06-14 13:29:46Z

As @RonakShah says, your code fails for an individual element of the list. It's not at all clear what you're trying to achieve, but

out %>% 
  bind_rows(.id="element") %>% 
  left_join(df, by=c("id", "period")) %>% 
  mutate(period=as.factor(period)) %>% 
  group_by(element) %>% 
  group_map(function(.x, .y) lm(pred_dif ~ period-1, data=.x))

at least runs without warning or error and gives possibly sensible output:

[[1]]

Call:
lm(formula = pred_dif ~ period - 1, data = .x)

Coefficients:
period01  period02  period08  period09  period11  period12  
   0.365     0.600     0.620     0.100     0.370     0.412  


[[2]]

Call:
lm(formula = pred_dif ~ period - 1, data = .x)

Coefficients:
period01  period02  period08  period09  period11  period12  
   0.540     0.630     0.270     0.180     0.170     0.232  


[[3]]

Call:
lm(formula = pred_dif ~ period - 1, data = .x)

Coefficients:
period01  period02  period08  period09  period11  period12  
   0.355     0.300     0.525     0.130     0.270     0.552  


[[4]]

Call:
lm(formula = pred_dif ~ period - 1, data = .x)

Coefficients:
period01  period02  period08  period09  period11  period12  
   0.295     0.760     0.705     0.610     0.480     0.618

Thank you @Limey. I think I did not manage to explain the point. The purpose is to explain pred_dif using the month variable within each quarter.
I still have no idea what you're trying to achieve. I suggest you provide your expected output and define the process you wish to implement to get to the output, for a single element of the out list. That may give us a chance to implement it. (And, possibly, it may show you how to achieve your desired result yourself.)
I expect to have the same output as the following, but "estimated" 4 times (one for each quarter) instead of once for each "element":

lmout_lst <- map(out,
                 ~ left_join(.x, df, by = c('id', 'month')) %>%
                   #group_by(quarter) %>%
                   summarise(new = list(lm(pred_dif ~ as.factor(month) - 1) %>%
                                          broom::tidy(.))) %>%
                   unnest(new))
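One possible reading of this request is to pool all list elements and fit a single model per quarter. This is only a sketch under that assumption, and it is not confirmed anywhere in the thread. It uses the question's data (first two list elements) and `group_modify()`, which passes each group's rows to the function as `.x`, so `lm()` can receive an explicit `data` argument.

```r
library(dplyr)
library(broom)

# Data copied from the question (first two list elements only)
id      <- c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
quarter <- c("1", "2", "1", "1", "2", "3", "1", "1", "3", "3")
month   <- c(3, 4, 2, 1, 5, 7, 3, 1, 8, 9)
df      <- data.frame(id, quarter, month)

out <- list(
  data.frame(id, pred_dif = c(0.5, 0.1, 0.15, 0.23, 0.75, 0.6, 0.49, 0.81, 0.37, 0.14), month),
  data.frame(id, pred_dif = c(0.45, 0.18, 0.35, 0.63, 0.25, 0.63, 0.29, 0.11, 0.17, 0.24), month)
)

per_quarter <- out %>%
  bind_rows(.id = "element") %>%            # stack the list, keeping its index
  left_join(df, by = c("id", "month")) %>%  # attach quarter from df
  group_by(quarter) %>%
  # Fit one model per quarter on the pooled rows and tidy the coefficients
  group_modify(function(.x, .y) {
    broom::tidy(lm(pred_dif ~ as.factor(month) - 1, data = .x))
  })
```

Because the model has no intercept and a single factor, each estimate is the mean of pred_dif for that month within the quarter, pooled across list elements.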
