Combining two dataframes keeping all columns [duplicate]

Question

I would like to combine two data frames, keeping all columns (which the example below fails to do), and fill in zeros where there are gaps caused by variables that the two data frames do not share.

This seems like a job for plyr or dplyr. However, a full join in plyr does not keep all of the columns, whilst a left or a right join does not keep all the rows I need. According to the dplyr cheatsheet (http://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf), full_join seems to be the function I need, but R does not recognise this function after successfully loading the package.

As an example:

col1 <- c("ab","bc","cd","de")
col2 <- c(1,2,3,4)
df1 <- as.data.frame(cbind(col1,col2))
col1 <- c("ab","ef","fg","gh")
col3 <- c(5,6,7,8)
df2 <- as.data.frame(cbind(col1,col3))
library(plyr)
Example <- join(df1,df2,by = "col1", type = "full") #Does not keep col3
library(dplyr)
Example <- full_join(df1,df2,by = "col1") #Function not recognised

I would like the output...

col1 col2 col3
ab    1    5
bc    2    0
cd    3    0
de    4    0
ef    0    6
fg    0    7
gh    0    8

full_join works fine for me, as does merge(df1, df2, by = "col1", all = TRUE). Your desired output is strange, though.
– David Arenburg, Commented Jun 24, 2015 at 11:16
I think that line 6 of your code should read df2 <- as.data.frame(cbind(col1,col3)). Then Example <- join(df1,df2,by = "col1", type = "full") works fine; you may just need to replace the NAs with 0s.
– RHertel, Commented Jun 24, 2015 at 11:20
akrun, I have now edited the code. This was a simplified version of my actual data, and after the edit my predicament was the same. David, perhaps I have an older version. In any case, your merge solution worked perfectly, thank you!
– James White, Commented Jun 24, 2015 at 11:22

RHertel · Accepted Answer · 2015-06-24 11:46:22Z

4

The solutions

Example <- merge(df1, df2, by = "col1", all = TRUE)

and

Example <- join(df1,df2,by = "col1", type = "full")

give the same result, both with a number of NAs:

#> Example
#  col1 col2 col3
#1   ab    1    5
#2   bc    2 <NA>
#3   cd    3 <NA>
#4   de    4 <NA>
#5   ef <NA>    6
#6   fg <NA>    7
#7   gh <NA>    8

One way to replace those entries with zeros is to convert the data frame into a matrix, change the entries, and convert back to a data frame:

Example <- as.matrix(Example)
Example[is.na(Example)] <- 0
Example <- as.data.frame(Example)
#> Example
#  col1 col2 col3
#1   ab    1    5
#2   bc    2    0
#3   cd    3    0
#4   de    4    0
#5   ef    0    6
#6   fg    0    7
#7   gh    0    8
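A note on types: the matrix round trip above leaves every column as character (or factor in older R versions), because as.data.frame(cbind(col1, col2)) already coerced the numbers to text. The following is a minimal sketch of an alternative, assuming the data frames are built with data.frame() instead so that col2 and col3 stay numeric. It is not the method used in the answer above, only one possible variant in base R.

```r
# Build the inputs with data.frame() so the numeric columns stay numeric
# (assumption: this differs from the cbind() construction in the question).
df1 <- data.frame(col1 = c("ab", "bc", "cd", "de"),
                  col2 = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c("ab", "ef", "fg", "gh"),
                  col3 = c(5, 6, 7, 8),
                  stringsAsFactors = FALSE)

# Full outer join: all = TRUE keeps the unmatched rows from both sides.
Example <- merge(df1, df2, by = "col1", all = TRUE)

# Replace the NAs introduced by the join with zeros, directly on the data frame.
Example[is.na(Example)] <- 0

str(Example)
```

Because the replacement is done on the data frame itself, col2 and col3 remain numeric and can be used in arithmetic without further conversion.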

PS: I'm almost certain that @akrun knows another way to achieve this in a single line ;)

6 Comments

akrun Over a year ago

As the OP created 'factor' columns with as.data.frame(cbind(...)), one possible option is library(car); Example[] <- lapply(Example, recode, 'NA=0')

David Arenburg Over a year ago

Not sure what you added to the merge solution already posted in the comments, or to the full_join mentioned by the OP, which also works. Using plyr instead of dplyr isn't an improvement.

RHertel Over a year ago

It was just a minor change, replacing the NAs with zeros, according to the OP's requested output.

James White Over a year ago

Thank you for this answer. Yes, the plyr option does work on this small example, but not on my actual dataset for some reason; I am not sure why yet. The merge option worked perfectly though.

David Arenburg Over a year ago

@James The merge function is already in your answer though. Also, update your dplyr version and full_join should also work.

James White · Accepted Answer · 2015-06-24 11:24:23Z

1

Following David Arenburg's comment above...

Example <- merge(df1, df2, by = "col1", all = TRUE)
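For completeness, here is a hedged sketch of the dplyr route discussed in the comments, assuming a dplyr version recent enough to provide full_join. The numeric construction of df1 and df2 with data.frame() and the name joined are illustrative choices, not part of the original question.

```r
library(dplyr)

# Inputs built with data.frame() so col2 and col3 are numeric
# (assumption: the question used cbind(), which yields text columns).
df1 <- data.frame(col1 = c("ab", "bc", "cd", "de"),
                  col2 = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c("ab", "ef", "fg", "gh"),
                  col3 = c(5, 6, 7, 8),
                  stringsAsFactors = FALSE)

# full_join keeps every row from both data frames and every column.
joined <- full_join(df1, df2, by = "col1")

# Fill the gaps left by the join with zeros.
joined[is.na(joined)] <- 0

print(joined)
```

Like merge with all = TRUE, this keeps all rows and all columns; the zero filling is a separate step in both cases.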
