I have two long lists of large dataframes, and the lists are equal in length. I want to merge the first dataframe in list1 with the first dataframe in list2, the second dataframe in list1 with the second dataframe in list2, and so on.

Below is a minimal reproducible example and some attempts.

#### EXAMPLE
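#Create Dataframes (rbind makes each vector one row, giving 2 rows x 5 columns
#so that the column and row names set below fit)
df_1 <- data.frame(rbind(c("Bah",NA,2,3,4),c("Bug",NA,5,6,NA)))
df_2 <- data.frame(rbind(c("Blu",7,8,9,10),c(NA,NA,NA,12,13)))
df_3 <- data.frame(rbind(c("Bah",NA,21,32,43),c("Rgh",NA,51,63,NA)))
df_4 <- data.frame(rbind(c("Gar",7,8,9,10),c("Ghh",NA,NA,121,131)))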

#Create Lists
list1 <- list(df_1,df_2)
list2 <- list(df_3,df_4)

#Set column and row names for each dataframe
colnames(list1[[1]]) <-  c("SampleID","Measure1","Measure2","Measure3","Measure4")
colnames(list1[[2]]) <-  c("SampleID","Measure1","Measure2","Measure3","Measure4")
colnames(list2[[1]]) <-  c("SampleID","Measure1","Measure2","Measure3","Measure4")
colnames(list2[[2]]) <-  c("SampleID","Measure1","Measure2","Measure3","Measure4")

rownames(list1[[1]]) <-  c("1","2")
rownames(list1[[2]]) <-  c("1","2")
rownames(list2[[1]]) <-  c("1","2")
rownames(list2[[2]]) <-  c("1","2")

My desired output is a list of the same length as the input lists, in which the dataframes at each position have been merged into a single dataframe. The following yields my desired output for the example, but it does not scale because every pair has to be written out by hand.

#### DESIRED OUTPUT
DesiredOutput_DF1_Format <- merge(list1[[1]],list2[[1]], all = TRUE, by = "SampleID")
DesiredOutput_DF2_Format <- merge(list1[[2]],list2[[2]], all = TRUE, by = "SampleID")
DesiredOutput_List <- list(DesiredOutput_DF1_Format, DesiredOutput_DF2_Format)

How can I generate an output list in my desired format in a high-throughput way using an apply-like approach?

#### ATTEMPTS
#Attempt1: runs, but cbind only places columns side by side; it does not match rows on SampleID
attempt1 <- mapply(cbind, list1, list2, SIMPLIFY = FALSE)

#Attempt2: 
My instinct is to use `lapply`, but I can't figure out how to make it iterate through two lists simultaneously.

#Attempt3: Works but the order of the output list appears inverted. This is not intuitive, though it is easily corrected... There has to be a cleaner way.
output_list <- list()
dataset_iterator <- seq_along(list1)

for (x in dataset_iterator) {
    df1 <- data.frame(list1[[x]])
    df2 <- data.frame(list2[[x]])
    #Merge on the key column used in the example above
    df_merged <- data.frame(merge(df1, df2, by = "SampleID", all=TRUE))
    output_list <- append(output_list, list(df_merged), 0)
}

1 Answer

Based on the code shown, `Map` (or `mapply` with `SIMPLIFY = FALSE`) should do this:

out <- Map(merge, list1, list2, MoreArgs = list(all = TRUE, by = "SampleID"))

Checking against the expected output:

> identical(DesiredOutput_List, out)
[1] TRUE
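
For reference, the `mapply` form mentioned above could look like the sketch below. It is self-contained and uses two small hypothetical lists (`a` and `b`, which are not from the question) so that it can be run on its own. Note that the argument is spelled `SIMPLIFY` in upper case; a lower-case `simplify` is not recognised by `mapply` and gets passed on to the function being applied.

```r
# Hypothetical toy data: two lists of data frames sharing a "SampleID" key
a <- list(data.frame(SampleID = c("s1", "s2"), x = 1:2),
          data.frame(SampleID = c("s3", "s4"), x = 3:4))
b <- list(data.frame(SampleID = c("s2", "s5"), y = c(10, 20)),
          data.frame(SampleID = c("s3", "s4"), y = c(30, 40)))

# mapply pairs a[[i]] with b[[i]]; MoreArgs holds arguments shared by every
# call, and SIMPLIFY = FALSE keeps the result as a plain list
via_mapply <- mapply(merge, a, b,
                     MoreArgs = list(all = TRUE, by = "SampleID"),
                     SIMPLIFY = FALSE)

# Map is a wrapper around mapply that never simplifies
via_map <- Map(merge, a, b, MoreArgs = list(all = TRUE, by = "SampleID"))

# Compare the two results
identical(via_mapply, via_map)
```

Either form returns one merged data frame per position, in the same order as the input lists.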

Or using the tidyverse:

library(purrr)
library(dplyr)
map2(list1, list2, full_join, by = "SampleID")
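
If `lapply` feels more natural, one option is to iterate over the positions rather than over either list. The sketch below uses hypothetical toy lists (`a` and `b`, not from the question). It also touches on why the `for` loop in the question comes out reversed: `append(x, values, after = 0)` inserts each new result at the front of the list. Assigning by index into a preallocated list keeps the original order.

```r
# Hypothetical toy data: two lists of data frames sharing a "SampleID" key
a <- list(data.frame(SampleID = c("s1", "s2"), x = 1:2),
          data.frame(SampleID = c("s3", "s4"), x = 3:4))
b <- list(data.frame(SampleID = c("s2", "s5"), y = c(10, 20)),
          data.frame(SampleID = c("s3", "s4"), y = c(30, 40)))

# lapply over the indices, so both lists can be subset with the same i
via_lapply <- lapply(seq_along(a), function(i) {
  merge(a[[i]], b[[i]], by = "SampleID", all = TRUE)
})

# Equivalent for loop: preallocate, then fill position i directly
via_loop <- vector("list", length(a))
for (i in seq_along(a)) {
  via_loop[[i]] <- merge(a[[i]], b[[i]], by = "SampleID", all = TRUE)
}

# Compare the two results
identical(via_lapply, via_loop)
```

Both variants assume the two lists have the same length; `Map` and `map2` express the same pairing without the explicit index.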