What I would like to do is combine two data frames, keeping all columns (which is not done in the example below), and insert zeros wherever a variable that exists in only one of the data frames leaves a gap.

This seems like a plyr or dplyr theme. However, a full join in plyr does not keep all of the columns, whilst a left or a right join does not keep all the rows I desire. Looking at the dplyr cheatsheet (http://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf), full_join seems to be the function I need, but R does not recognise this function after successfully loading the package.

As an example:

col1 <- c("ab","bc","cd","de")
col2 <- c(1,2,3,4)
df1 <- as.data.frame(cbind(col1,col2))
col1 <- c("ab","ef","fg","gh")
col3 <- c(5,6,7,8)
df2 <- as.data.frame(cbind(col1,col3))
library(plyr)
Example <- join(df1,df2,by = "col1", type = "full") #Does not keep col3
library(dplyr)
Example <- full_join(df1,df2,by = "col1") #Function not recognised

I would like the output...

col1 col2 col3
ab    1    5
bc    2    0
cd    3    0
de    4    0
ef    0    6
fg    0    7
gh    0    8
  • full_join works fine for me. As well as merge(df1, df2, by = "col1", all = TRUE). Though your desired output is strange Commented Jun 24, 2015 at 11:16
  • I think that line 6 of your code should read df2 <- as.data.frame(cbind(col1,col3)). Then Example <- join(df1,df2,by = "col1", type = "full") works fine; you may just need to replace the NAs with 0s. Commented Jun 24, 2015 at 11:20
  • akrun I have now edited the code. This was a simplified version of my actual data and after the edit my predicament was the same. David perhaps I have an older version, in any case your merge solution worked perfectly thank you! Commented Jun 24, 2015 at 11:22

2 Answers

The solutions

Example <- merge(df1, df2, by = "col1", all = TRUE)

and

Example <- join(df1,df2,by = "col1", type = "full")

give the same result, both with a number of NA's:

#> Example
#  col1 col2 col3
#1   ab    1    5
#2   bc    2 <NA>
#3   cd    3 <NA>
#4   de    4 <NA>
#5   ef <NA>    6
#6   fg <NA>    7
#7   gh <NA>    8

One possibility to replace those entries with zeros is to convert the data frame into a matrix, change the entries, and convert back to a data frame:

Example <- as.matrix(Example)
Example[is.na(Example)] <- 0
Example <- as.data.frame(Example)
#> Example
#  col1 col2 col3
#1   ab    1    5
#2   bc    2    0
#3   cd    3    0
#4   de    4    0
#5   ef    0    6
#6   fg    0    7
#7   gh    0    8
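Note that the round trip through as.matrix turns every column into character (or factor) data, so the zeros above are text rather than numbers. A minimal sketch of an alternative, assuming the inputs are rebuilt with data.frame() instead of as.data.frame(cbind(...)) so that col2 and col3 stay numeric (that rebuild is my assumption, not part of the original question):

```r
# Assumption: build the inputs with data.frame() so col2/col3 are numeric,
# instead of as.data.frame(cbind(...)), which coerces everything to text.
df1 <- data.frame(col1 = c("ab", "bc", "cd", "de"),
                  col2 = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c("ab", "ef", "fg", "gh"),
                  col3 = c(5, 6, 7, 8),
                  stringsAsFactors = FALSE)

# Full outer join: keep every row and every column from both data frames.
Example <- merge(df1, df2, by = "col1", all = TRUE)

# Replace the NAs in place; columns without NAs (col1) are left untouched,
# and col2/col3 remain numeric.
Example[is.na(Example)] <- 0

str(Example)
```

With numeric columns the result can be summed or plotted directly, which the matrix-based version would not allow without converting back.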

PS: I'm almost certain that @akrun knows another way to achieve this in a single line ;)

6 Comments

As the OP created 'factor' columns with as.data.frame(cbind(...)), one possible option is library(car); Example[] <- lapply(Example, recode, 'NA=0')
Not sure what you added to the merge solution already posted in the comments, or to the full_join mentioned by the OP, which also works. Using plyr instead of dplyr isn't an improvement.
It was just a minor change, replacing the NAs with zeros, according to the OP's requested output.
Thank you for this answer. Yes the plyr option does work on this small example, but not on my actual dataset for some reason, I am not sure why as yet. The merge option worked perfectly though.
@James The merge function is already in your answer though. Also, update your dplyr version and full_join should also work.
Following David Arenberg's comment above...

Example <- merge(df1, df2, by = "col1", all = TRUE)
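
For completeness, the dplyr route from the question can be sketched the same way. This assumes an installed dplyr version that exports full_join and coalesce, and it assumes the inputs are built with data.frame() so the value columns are numeric; neither assumption comes from the original post.

```r
library(dplyr)

# Assumption: numeric value columns, built with data.frame() rather than cbind().
df1 <- data.frame(col1 = c("ab", "bc", "cd", "de"),
                  col2 = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c("ab", "ef", "fg", "gh"),
                  col3 = c(5, 6, 7, 8),
                  stringsAsFactors = FALSE)

# Full join keeps all rows from both inputs; coalesce() then swaps each NA for 0.
Example <- full_join(df1, df2, by = "col1") %>%
  mutate(col2 = coalesce(col2, 0),
         col3 = coalesce(col3, 0))

print(Example)
```

If full_join is still not found after library(dplyr), the installed dplyr is probably too old, as the comments above suggest.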
