I have a list of data frames, each with some columns that overlap with the others. The number of data frames in the list is not known in advance. How can I efficiently, in base R, rbind the data frames together and fill the non-overlapping columns with zeros?

Example data:

x <- data.frame(a=1:2, b=1:2, c=1:2)
y <- data.frame(a=1:2, r=1:2, f=1:2)
z <- data.frame(b=1:3, c=1:3, v=1:3, t=c("A", "A", "D"))

L1 <- list(x, y, z)

Desired output:

  a b c f r t v
1 1 1 1 0 0 0 0
2 2 2 2 0 0 0 0
3 1 0 0 1 1 0 0
4 2 0 0 2 2 0 0
5 0 1 1 0 0 A 1
6 0 2 2 0 0 A 2
7 0 3 3 0 0 D 3

Look at plyr::rbind.fill. (Commented Nov 6, 2013 at 5:36)

2 Answers

Pad out each data frame with the missing columns, then rbind them:

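# Union of all column names across the list
allnames <- unique(unlist(lapply(L1, names)))
do.call(rbind, lapply(L1, function(df) {
    # Columns this data frame lacks
    not <- allnames[!allnames %in% names(df)]
    # Add them, filled with zeros; rbind then matches columns by name
    df[, not] <- 0
    df
}))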
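
The same pad-then-rbind idea can be wrapped in a small function. This is only a sketch: the name rbind_fill and its fill argument are made up for illustration, and it assumes every element of the list is a data frame with unique column names.

```r
# Hypothetical helper (not part of base R): pad each data frame with the
# columns it lacks, put the columns in a common order, then rbind.
rbind_fill <- function(dfs, fill = 0) {
  all_names <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    missing_cols <- setdiff(all_names, names(df))
    # Add each missing column, filled with the constant
    for (col in missing_cols) df[[col]] <- rep(fill, nrow(df))
    # Same column order in every piece
    df[, all_names, drop = FALSE]
  })
  do.call(rbind, padded)
}

x <- data.frame(a = 1:2, b = 1:2, c = 1:2)
y <- data.frame(a = 1:2, r = 1:2, f = 1:2)
z <- data.frame(b = 1:3, c = 1:3, v = 1:3, t = c("A", "A", "D"))

res <- rbind_fill(list(x, y, z))
res
```

Note that a column such as t, which mixes the zero fill with strings, cannot stay numeric. In recent R versions, where strings are not converted to factors by default, it should come out as a character column containing "0" and the letters.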

2 Comments

Much nicer than my monster function :-)
@Ananda I would encourage you to leave your function up, as it's a learning experience for others. Perhaps a future searcher will have a related problem and your function would fill the bill.
I have an old (and probably inefficient) function that does this. I've made one modification here to allow the fill to be specified.

RBIND <- function(datalist, keep.rownames = TRUE, fill = NA) {
  Len <- sapply(datalist, ncol)
  if (all(diff(Len) == 0)) {
    temp <- names(datalist[[1]])
    if (all(sapply(datalist, function(x) names(x) %in% temp))) tryme <- "basic"
    else tryme <- "complex"
  } 
  else tryme <- "complex"
  almost <- switch(
    tryme,
    basic = { do.call("rbind", datalist) },
    complex = {
      Names <- unique(unlist(lapply(datalist, names)))
      NROWS <- c(0, cumsum(sapply(datalist, nrow)))
      NROWS <- paste(NROWS[-length(NROWS)]+1, NROWS[-1], sep=":")
      out <- lapply(1:length(datalist), function(x) {
        emptyMat <- matrix(fill, nrow = nrow(datalist[[x]]), ncol = length(Names))
        colnames(emptyMat) <- Names
        emptyMat[, match(names(datalist[[x]]), 
                         colnames(emptyMat))] <- as.matrix(datalist[[x]])
        emptyMat
      })
      do.call("rbind", out)
    })
  Final <- as.data.frame(almost, row.names = 1:nrow(almost))
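  # as.is = TRUE keeps strings as character; newer R versions warn when as.is is left unspecified
  Final <- data.frame(lapply(Final, function(x) type.convert(as.character(x), as.is = TRUE)))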
  if (isTRUE(keep.rownames)) {
    row.names(Final) <- make.unique(unlist(lapply(datalist, row.names)))
  } 
  Final
}

Here it is on your sample data.

RBIND(L1, fill = 0)
#     a b c r f v t
# 1   1 1 1 0 0 0 0
# 2   2 2 2 0 0 0 0
# 1.1 1 0 0 1 1 0 0
# 2.1 2 0 0 2 2 0 0
# 1.2 0 1 1 0 0 1 A
# 2.2 0 2 2 0 0 2 A
# 3   0 3 3 0 0 3 D
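
The "complex" branch goes through a matrix, which is why the function needs type.convert at the end. A minimal sketch of that round trip on a two-column data frame (the names df, m and back are just for illustration):

```r
df <- data.frame(b = 1:3, t = c("A", "A", "D"))

# A matrix holds a single type, so mixed columns all become character
m <- as.matrix(df)
is.character(m)

# type.convert() guesses a type for each column again
back <- data.frame(lapply(as.data.frame(m), function(col) {
  type.convert(as.character(col), as.is = TRUE)
}))
str(back)
```

This is also a reason the function may be slow on large inputs: every value is converted to text and parsed back.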
