Merge a list of dataframes into a single dataframe in R

Question

I have a very large list (just one list) with 13,500 elements in it. Each element is a dataframe with 1 row and 12 columns, and every dataframe is structured the same way (same column names and similar data in each column). I want to combine all elements of this list into one dataframe, so the new dataframe will have 13,500 rows and 12 columns. I need everything in one dataframe so I can use it with ggplot and work with the data as a dataframe. Can someone suggest the best way to do this? Thanks for the help.

I tried using the merge() function (which I took to be from purrr) and was not successful. At least, the process had not finished after more than 10 minutes, and I had to terminate RStudio.

Here are some data from the list:

ll <- list(structure(list(n1 = 10, n2 = 10, mean_1 = 0, mean_2 = 0, var_1 = 1, var_2 = 1, tpooled = 2.93152220266846, pvalue_pooled = 0.00891647393074033, result_pooled = 1, t_unpooled = 2.93152220266846, pvalue_unpooled = 0.00931815204271521, result_unpooled = 1), class = "data.frame", row.names = "n1"), structure(list(n1 = 30, n2 = 10, mean_1 = 0, mean_2 = 0, var_1 = 1, var_2 = 1, tpooled = -0.312649684961248, pvalue_pooled = 0.756256229272491, result_pooled = 0, t_unpooled = -0.248766791009062, pvalue_unpooled = 0.808124700588531, result_unpooled = 0), class = "data.frame", row.names = "n2"))

Your dput is not complete. merge is a base R function, not from purrr. – akrun, Commented Apr 27, 2022 at 16:03
"Was not successful" .. how so? We don't have complete data, the code you attempted, the actual output, or what's wrong with it. This makes it a little difficult to help you well. – r2evans, Commented Apr 27, 2022 at 16:08
Use rbindlist() from the data.table package; this will "stack" all 13,500 list elements as one data.frame. – DanY, Commented Apr 27, 2022 at 16:10
FYI, in questions, code fences (triple backticks ```) need to be alone on their own lines, not shared with any code or text. See stackoverflow.com/editing-help and meta.stackexchange.com/a/22189 – r2evans, Commented Apr 27, 2022 at 16:10
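To expand on akrun's comment: merge() is base R's join function, so it matches rows on shared key columns instead of stacking them. The minimal sketch below uses two hypothetical one-row data frames (made-up values, trimmed to three columns for brevity). Because every column is shared and the values differ, merge() finds no matching rows, whereas rbind() stacks both rows:

```r
# Two hypothetical one-row data frames with identical column names
a <- data.frame(n1 = 10, n2 = 10, pvalue_pooled = 0.009)
b <- data.frame(n1 = 30, n2 = 10, pvalue_pooled = 0.756)

# merge() is an inner join on all shared columns by default,
# so a row only survives if every shared column matches
joined <- merge(a, b)
nrow(joined)   # 0

# rbind() stacks the rows, which is what the question asks for
stacked <- rbind(a, b)
nrow(stacked)  # 2
```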

AndrewGB · Accepted Answer · 2024-09-04 12:33:11Z

You can use bind_rows from dplyr, which will create one dataframe from the list of dataframes, and is a fairly efficient option.

library(dplyr)

bind_rows(ll)

Results

       n1 n2 mean_1 mean_2 var_1 var_2    tpooled pvalue_pooled result_pooled t_unpooled pvalue_unpooled result_unpooled
n1...1 10 10      0      0     1     1  2.9315222   0.008916474             1  2.9315222     0.009318152               1
n1...2 30 10      0      0     1     1 -0.3126497   0.756256229             0 -0.2487668     0.808124701               0
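At the scale described in the question, the same call works on the whole list at once. The sketch below builds a hypothetical stand-in list of 13,500 one-row data frames with the question's 12 column names (the values are random placeholders, not the asker's data) and binds them:

```r
library(dplyr)

# Hypothetical stand-in for the question's list: 13,500 one-row data frames,
# each with the same 12 column names (values are random placeholders)
cols <- c("n1", "n2", "mean_1", "mean_2", "var_1", "var_2",
          "tpooled", "pvalue_pooled", "result_pooled",
          "t_unpooled", "pvalue_unpooled", "result_unpooled")
ll <- lapply(seq_len(13500), function(i) {
  as.data.frame(as.list(setNames(runif(length(cols)), cols)))
})

# bind_rows() accepts the whole list and stacks the elements by column name
combined <- bind_rows(ll)
dim(combined)  # 13500 12
```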

However, as @nicola mentioned, rbindlist from data.table will likely be the fastest option.

data.table::rbindlist(ll)

Then, if you do not want to work with a data.table, you can convert the result to a plain dataframe (a data.table already inherits from data.frame, so ggplot accepts it either way):

data.table::rbindlist(ll) %>% 
  as.data.frame()
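A sketch of the data.table route on a list of the size described in the question, again using a hypothetical stand-in list of random values. It passes use.names = TRUE so columns are matched by name, and uses data.table's setDF(), which converts the result to a data.frame in place as an alternative to as.data.frame():

```r
library(data.table)

# Hypothetical stand-in list: 13,500 one-row data frames with the
# question's 12 column names (values are random placeholders)
cols <- c("n1", "n2", "mean_1", "mean_2", "var_1", "var_2",
          "tpooled", "pvalue_pooled", "result_pooled",
          "t_unpooled", "pvalue_unpooled", "result_unpooled")
ll <- lapply(seq_len(13500), function(i) {
  as.data.frame(as.list(setNames(runif(length(cols)), cols)))
})

# Stack the whole list in a single call, matching columns by name
dt <- rbindlist(ll, use.names = TRUE)

# setDF() turns the data.table into a plain data.frame by reference
setDF(dt)
class(dt)  # "data.frame"
```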

jpsmith · Accepted Answer · 2022-04-27 17:18:46Z

Your dput() code was not complete, so I am creating an example list based on how you described it:

ll <- vector(mode = "list", length = 100)
# Fill each element with a 1-row, 12-column data frame of random numbers
for (i in seq_along(ll)) {
  ll[[i]] <- data.frame(matrix(runif(12), nrow = 1))
}

This is a list of length 100 in which each element is a data frame with 1 row and 12 columns of random numbers. To combine it into one large data frame (100 rows and 12 columns), try:

ll_df <- do.call(rbind, ll)

Output:

# > ll_df
#     X1          X2         X3          X4         X5          X6         X7         X8          X9         X10        X11         X12
# 1  0.231912927 0.270163433 0.82299350 0.025836254 0.40592551 0.596034614 0.52873965 0.68257091 0.507812908 0.554371795 0.84124010 0.312510160
# 2  0.035948120 0.815994061 0.77857679 0.859379491 0.06571936 0.008806119 0.59168088 0.86961538 0.446291886 0.037575005 0.41029058 0.365216211
# 3  0.476584831 0.133677756 0.47945626 0.264312692 0.48993294 0.906061205 0.50099734 0.70350681 0.057910028 0.689310918 0.79879528 0.018855033
# 4  0.036814572 0.577822232 0.79003586 0.735261033 0.26853772 0.805366424 0.42493288 0.16521519 0.604047569 0.825760356 0.78095093 0.081476899
# 5  0.070758368 0.958960018 0.09029276 0.212251252 0.43920359 0.777871489 0.85140796 0.62472390 0.388040910 0.143754851 0.88167280 0.873741813
# 6  0.338623692 0.513312964 0.49393542 0.793437806 0.91841512 0.586360269 0.82348039 0.80743891 0.281572984 0.508648599 0.29522944 0.867623769
#...
# continues
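The reason do.call(rbind, ll) works is that do.call() spreads the list into separate arguments, so it makes one rbind(ll[[1]], ll[[2]], ...) call. The sketch below checks that equivalence on a small hypothetical list, and contrasts it with growing the result inside a loop, which also works but copies the accumulated rows on every iteration and so tends to get slow for long lists:

```r
set.seed(1)
# Small hypothetical list of one-row data frames, as in the answer above
ll <- lapply(1:3, function(i) data.frame(matrix(runif(12), nrow = 1)))

# do.call() passes each list element as its own argument,
# so these two lines perform the same single rbind() call
via_do_call <- do.call(rbind, ll)
by_hand     <- rbind(ll[[1]], ll[[2]], ll[[3]])
identical(via_do_call, by_hand)  # TRUE

# Growing a data frame in a loop gives the same rows, but each
# rbind() re-copies everything accumulated so far
grown <- ll[[1]]
for (i in seq_along(ll)[-1]) {
  grown <- rbind(grown, ll[[i]])
}
```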

Upvoted. I'd also mention data.table::rbindlist as an alternative.
Thanks for the help, your code works. The dput() output was really big, and that's the reason I just copied the first few lines.
Great - Glad it helped! In the future, there are several ways to export a truncated dput() - for instance, to just output the first 10 positions of your list data, use dput(ll[1:10])
@jpsmith Although this is a couple of years old, I corrected the dput so that it works now.
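A minimal sketch of the truncated-dput() tip, using a hypothetical long list: only the first two elements are printed, and re-evaluating the printed text rebuilds that subset (compared with all.equal() because dput() prints doubles to 15 significant digits by default):

```r
# Hypothetical long list of one-row data frames
ll <- lapply(1:1000, function(i) data.frame(n1 = i, pvalue = runif(1)))

# Print only the first two elements; this text is what goes in the question
dput(ll[1:2])

# Round-trip check: the printed text rebuilds the same subset
txt <- paste(capture.output(dput(ll[1:2])), collapse = "\n")
rebuilt <- eval(parse(text = txt))
isTRUE(all.equal(rebuilt, ll[1:2]))
```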
