
I have a single large list with 13,500 elements. Each element is a dataframe with 1 row and 12 columns, and every dataframe has the same structure (same column names and similar data in each column). I want to combine all elements of this list into one dataframe with 13,500 rows and 12 columns, so that I can work with the data as a dataframe and plot it with ggplot. Can someone suggest the best way to do this? Thanks for the help.

I tried using the purrr::merge() function and was not successful; the process did not finish after more than 10 minutes and I had to terminate RStudio.

Here are some data from the list:

list(structure(list(n1 = 10, n2 = 10, mean_1 = 0, mean_2 = 0, var_1 = 1, var_2 = 1, tpooled = 2.93152220266846, pvalue_pooled = 0.00891647393074033, result_pooled = 1, t_unpooled = 2.93152220266846, pvalue_unpooled = 0.00931815204271521, result_unpooled = 1), class = "data.frame", row.names = "n1"), structure(list(n1 = 30, n2 = 10, mean_1 = 0, mean_2 = 0, var_1 = 1, var_2 = 1, tpooled = -0.312649684961248, pvalue_pooled = 0.756256229272491, result_pooled = 0, t_unpooled = -0.248766791009062, pvalue_unpooled = 0.808124700588531, result_unpooled = 0), class = "data.frame", row.names = "n2"))
  • Your dput is not complete. merge is a base R function and not from purrr. Commented Apr 27, 2022 at 16:03
  • "was not successful" ... how so? We don't have complete data, the code attempted, the actual output, or what's wrong with it. This makes it a little difficult to help you well. Commented Apr 27, 2022 at 16:08
  • Use rbindlist() from the data.table package; this will "stack" all 13,500 list elements as one data.frame. Commented Apr 27, 2022 at 16:10
  • FYI, in questions, code fences (triple backticks ```) need to be alone on their own lines, not shared with any code or text. See stackoverflow.com/editing-help and meta.stackexchange.com/a/22189 Commented Apr 27, 2022 at 16:10

2 Answers

You can use bind_rows from dplyr, which will create one dataframe from the list of dataframes, and is a fairly efficient option.

library(dplyr)

bind_rows(ll)

Results

       n1 n2 mean_1 mean_2 var_1 var_2    tpooled pvalue_pooled result_pooled t_unpooled pvalue_unpooled result_unpooled
n1...1 10 10      0      0     1     1  2.9315222   0.008916474             1  2.9315222     0.009318152               1
n1...2 30 10      0      0     1     1 -0.3126497   0.756256229             0 -0.2487668     0.808124701               0

However, as @nicola mentioned, rbindlist from data.table will likely be the fastest option.

data.table::rbindlist(ll)

Then, you can always turn the data.table back into a dataframe, if you do not want to work with a data.table:

data.table::rbindlist(ll) %>% 
  as.data.frame()
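
As a minimal sketch of that round trip, the small list below is a hypothetical stand-in for the real 13,500-element list (the column values are made up for illustration). It assumes data.table is installed and shows that the converted result is a plain data.frame with one row per list element:

```r
library(data.table)

# Hypothetical stand-in for the real list: three one-row data frames
# that share the same column names
ll <- lapply(1:3, function(i) {
  data.frame(n1 = 10 * i, n2 = 10, tpooled = i / 2)
})

dt <- rbindlist(ll)       # stacked result as a data.table
df <- as.data.frame(dt)   # converted back to a plain data.frame

str(df)                   # inspect the combined structure
```

If the column order might differ between elements, rbindlist(ll, use.names = TRUE) matches columns by name rather than by position.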

Your dput() code was not completed, so I am creating an example list based on how you described it:

# Pre-allocate a list of length 100
ll <- vector(mode = "list", length = 100)
# Fill each position with a 1-row, 12-column data frame of random numbers
for (i in seq_along(ll)) {
  ll[[i]] <- data.frame(matrix(runif(12), nrow = 1))
}

This creates a list of length 100, with each position containing a data frame of 1 row and 12 columns of random numbers. To make it into one large data frame (100 rows and 12 columns), try:

ll_df <- do.call(rbind, ll)

Output:

# > ll_df
#     X1          X2         X3          X4         X5          X6         X7         X8          X9         X10        X11         X12
# 1  0.231912927 0.270163433 0.82299350 0.025836254 0.40592551 0.596034614 0.52873965 0.68257091 0.507812908 0.554371795 0.84124010 0.312510160
# 2  0.035948120 0.815994061 0.77857679 0.859379491 0.06571936 0.008806119 0.59168088 0.86961538 0.446291886 0.037575005 0.41029058 0.365216211
# 3  0.476584831 0.133677756 0.47945626 0.264312692 0.48993294 0.906061205 0.50099734 0.70350681 0.057910028 0.689310918 0.79879528 0.018855033
# 4  0.036814572 0.577822232 0.79003586 0.735261033 0.26853772 0.805366424 0.42493288 0.16521519 0.604047569 0.825760356 0.78095093 0.081476899
# 5  0.070758368 0.958960018 0.09029276 0.212251252 0.43920359 0.777871489 0.85140796 0.62472390 0.388040910 0.143754851 0.88167280 0.873741813
# 6  0.338623692 0.513312964 0.49393542 0.793437806 0.91841512 0.586360269 0.82348039 0.80743891 0.281572984 0.508648599 0.29522944 0.867623769
#...
# continues
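
To illustrate why this works, here is a small base R sketch on a hypothetical three-element list (the column names id and value are invented for the example). do.call() passes every list element as a separate argument to rbind(), so it should match writing the rbind() call out by hand:

```r
# Hypothetical three-element list of one-row data frames
small <- lapply(1:3, function(i) data.frame(id = i, value = i^2))

# do.call() expands the list into rbind(small[[1]], small[[2]], small[[3]])
stacked <- do.call(rbind, small)
by_hand <- rbind(small[[1]], small[[2]], small[[3]])

isTRUE(all.equal(stacked, by_hand))  # compare the two results
dim(stacked)                         # rows and columns of the combined frame
```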

4 Comments

Upvoted. I'd mention also data.table::rbindlist as an alternative.
Thanks for the help, your code works. The dput() output was really big, and that's the reason I just copied the first few lines.
Great, glad it helped! In the future, there are several ways to export a truncated dput(). For instance, to output just the first 10 positions of your list data, use dput(ll[1:10])
@jpsmith Although this is a couple of years old, I corrected the dput so that it works now.
