I am trying to subset multiple dataframes that are contained in a list based on strings that are contained in another dataframe.

list.df <- list(
 df.1 = data.frame(LM = c(1:10), LS = c(1:10), PL = c(1:10)), 
 df.2 = data.frame(XY = c(1:10), FE = c(4:13), OI = c(1:10)), 
 df.3 = data.frame(IL = c(1:10), KU = c(9:18), TS = c(1:10)))

df.4 <- data.frame(df.1 = c("LM", "PL", NA), df.2 = c("FE", NA, NA), 
 df.3 = c("IL", "KU", "TS"))

I want all my dataframes to look like this in the end:

df.1_sub <- subset(list.df[["df.1"]], select = 
   colnames(list.df[["df.1"]]) %in% df.4$df.1)

I will have to do this for around 50 datasets and was wondering whether there was a way of writing a loop to do this for all the datasets at once.

I have tried using lapply and for loops but have so far been unsuccessful. I am new to using lists in R and would appreciate any help! This is my first time posting on Stack Overflow, so please let me know if my post isn't appropriate.

  • Just to clarify, if you created df.2_sub it would just be the FE column, correct? And df.3_sub would be a 10x3 dataframe consisting of columns IL, KU, and TS? Commented Jun 5, 2019 at 23:20
  • Yes, that's correct! Commented Jun 5, 2019 at 23:22

2 Answers


One way, using Map, is to remove the NA values from each column of df.4 and use what is left to select the columns of the data frame at the same position in list.df:

Map(function(x, y) x[as.character(na.omit(y))], list.df, df.4)

#$df.1
#   LM PL
#1   1  1
#2   2  2
#3   3  3
#4   4  4
#5   5  5
#6   6  6
#7   7  7
#8   8  8
#9   9  9
#10 10 10

#$df.2
#   FE
#1   4
#2   5
#3   6
#4   7
#5   8
#6   9
#7  10
#8  11
#9  12
#10 13

#$df.3
#   IL KU TS
#1   1  9  1
#2   2 10  2
#3   3 11  3
#.....

The same can be achieved using purrr::map2

purrr::map2(list.df, df.4, ~.x[na.omit(as.character(.y))])
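
Since the question asks about writing a loop, here is a minimal sketch of the same positional pairing written as an explicit for loop. The name list.sub is hypothetical, and stringsAsFactors = FALSE is an assumption added here so that the column names in df.4 are plain character strings on any R version.

```r
# Example data from the question. stringsAsFactors = FALSE (an assumption
# added here) keeps the names in df.4 as character strings, not factors.
list.df <- list(
  df.1 = data.frame(LM = 1:10, LS = 1:10, PL = 1:10),
  df.2 = data.frame(XY = 1:10, FE = 4:13, OI = 1:10),
  df.3 = data.frame(IL = 1:10, KU = 9:18, TS = 1:10))

df.4 <- data.frame(df.1 = c("LM", "PL", NA), df.2 = c("FE", NA, NA),
                   df.3 = c("IL", "KU", "TS"), stringsAsFactors = FALSE)

# Pre-allocate a named result list (list.sub is a hypothetical name).
list.sub <- vector("list", length(list.df))
names(list.sub) <- names(list.df)

for (i in seq_along(list.df)) {
  keep <- df.4[[i]]              # i-th column of df.4
  keep <- keep[!is.na(keep)]     # drop the NA padding
  # drop = FALSE keeps a one-column result as a data frame
  list.sub[[i]] <- list.df[[i]][, keep, drop = FALSE]
}

names(list.sub$df.1)
names(list.sub$df.2)
```

Like the Map call, this pairs the i-th data frame with the i-th column of df.4, so it relies on both being in the same order.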

6 Comments

Too fast for me. Map instead of mapply probably makes more sense since you aren't simplifying the result though.
it's a pity df.4 has factor columns or you could collapse significantly - Map(`[`, list.df, lapply(df.4, na.omit)) - which unfortunately gives the wrong answer currently.
Thank you so much for your replies! I have tried the above and it works fine on the example, but when I try it on my actual data I get this error: Error: Can't find columns AD, AB, AW, AC, AL, ... (and 32 more) in `.data`. I have checked manually, and these columns are definitely in one of the dataframes in the list. Any ideas?
@Ricarda This does not search all the dataframes in the list. It subsets list.df[[1]] with the first column of df.4, list.df[[2]] with the second column of df.4, and so on. Are you trying to subset from the entire list.df?
@Ronak, thanks! I just realised that the order of the columns in df.4 isn't the same as the order of the dfs in list.df. Basically, each column in df.4 corresponds to one of the dfs in the list. That column and the dataframe have the same name.
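
If, as the last comment describes, each column of df.4 shares its name with one data frame in list.df but the two are in a different order, one possible fix is to reorder the columns of df.4 by name before pairing. The sketch below assumes every name in names(list.df) exists as a column of df.4 (otherwise the indexing step stops with an "undefined columns selected" error); the shuffled df.4 and the names list.sub and missing.cols are hypothetical.

```r
list.df <- list(
  df.1 = data.frame(LM = 1:10, LS = 1:10, PL = 1:10),
  df.2 = data.frame(XY = 1:10, FE = 4:13, OI = 1:10),
  df.3 = data.frame(IL = 1:10, KU = 9:18, TS = 1:10))

# Columns deliberately in a different order than list.df (hypothetical).
df.4 <- data.frame(df.3 = c("IL", "KU", "TS"), df.1 = c("LM", "PL", NA),
                   df.2 = c("FE", NA, NA), stringsAsFactors = FALSE)

# df.4[names(list.df)] reorders the columns so that each data frame is
# paired with the column of the same name rather than the same position.
list.sub <- Map(function(x, y) x[as.character(na.omit(y))],
                list.df, df.4[names(list.df)])

# Optional check: requested names that are absent from their data frame.
missing.cols <- Map(function(x, y) setdiff(as.character(na.omit(y)), names(x)),
                    list.df, df.4[names(list.df)])

names(list.sub$df.1)
all(lengths(missing.cols) == 0)
```

Running the missing.cols check on the real data may help locate which data frame the "Can't find columns" error refers to.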

We can use complete.cases with Map

Map(function(x, y) x[as.character(y[complete.cases(y)])], list.df, df.4)
#$df.1
#   LM PL
#1   1  1
#2   2  2
#3   3  3
#4   4  4
#5   5  5
#6   6  6
#7   7  7
#8   8  8
#9   9  9
#10 10 10

#$df.2
#   FE
#1   4
#2   5
#3   6
#4   7
#5   8
#6   9
#7  10
#8  11
#9  12
#10 13

#$df.3
#   IL KU TS
#1   1  9  1
#2   2 10  2
#3   3 11  3
#4   4 12  4
#5   5 13  5
#6   6 14  6
#7   7 15  7
#8   8 16  8
#9   9 17  9
#10 10 18 10

Or using pmap

library(purrr)  
pmap(list(list.df, df.4), ~ .x[as.character(.y[complete.cases(.y)])])
