1

Apologies, if this is a duplicate please let me know, I'll gladly delete.

I am merging two datasets in R using merge.

age1 = c(5, 6, 7, 8, 10, 11) 
fname1 = c('david','alan','ben', 'ben', 'richard', 'edd') 
sname1 = c('albert','raymond','albert','pete','raymond', 'alan')
area1 = c('r','t','n','x','z','w')

df1 <- data.frame(age1, fname1, sname1, area1)

age2 = c(5, 9, 10, 3, 4, 0) 
fname2 = c('david','alan','david', 'ben', 'richard', 'edd') 
sname2 = c('albert','edd','albert','pete','raymond', 'alan')
area2 = c('w','z','x','n','t','r')

df2 = data.frame(age2, fname2, sname2, area2)

Dataset1:

df1
  age1  fname1  sname1 area1
1    5   david  albert     r
2    6    alan raymond     t
3    7     ben  albert     n
4    8     ben    pete     x
5   10 richard raymond     z
6   11     edd    alan     w

Dataset 2

df2
  age2  fname2  sname2 area2
1    5   david  albert     w
2    9    alan     edd     z
3   10   david  albert     x
4    3     ben    pete     n
5    4 richard raymond     t
6    0     edd    alan     r

I merge on fname and sname with:

matchkey <- merge(df1, df2, by.x = c("fname1", "sname1"), by.y = c("fname2", "sname2"))
View(matchkey)

Output:

> matchkey
   fname1  sname1 age1 area1 age2 area2
1     ben    pete    8     x    3     n
2   david  albert    5     r    5     w
3   david  albert    5     r   10     x
4     edd    alan   11     w    0     r
5 richard raymond   10     z    4     t

However, I want to keep the columns I have merged on. How can I do this? Should I use something other than merge?

Expected Output:

   fname1  sname1 age1 area1  fname2   sname 2age2  area2
1     ben    pete    8     x  ben    pete      3       n
2   david  albert    5     r  david  albert    5       w
3   david  albert    5     r  david  albert    10      x
4     edd    alan   11     w  edd    alan      0       r
5 richard raymond   10     z  richard raymond  4       t

I tried looking at, but to no success:

How do I combine two data-frames based on two columns?

Combining two dataframes keeping all columns

Merge two dataframes with repeated columns

Many thanks.

3
  • 1
    It would be easy enough to tack these columns on afterwards. like matchkey[c("fname2", "sname2")] <- matchkey[c("fname1", "sname1")], though it seems redundant to me. Commented Jul 19, 2017 at 15:21
  • 2
    Aren't the columns you want to add by definition exactly the same as fname1 and sname1? A quick solution is to just do: matchkey$fname2 <- matchkey$fname1 and matchkey$sname2 <- matchkey$sname1 Commented Jul 19, 2017 at 15:21
  • Cheers to you both. I was way overthinking this. Thanks for the tips. Commented Jul 19, 2017 at 16:22

1 Answer 1

2

Since merged columns would be exact same on an inner join or complete match between dataframes, simply assign new columns to the columns that remained. And you can do so with transform(). Below adds an outer() combination with paste0 to retrieve needed column order:

matchkey <- transform(merge(df1, df2, by.x = c("fname1", "sname1"), 
                                      by.y = c("fname2", "sname2")),
                      fname2 = fname1, sname2 = sname1)

ordercols <- c(outer(c("fname", "sname", "age", "area"), c(1:2), paste0))
matchkey <- matchkey[ordercols]

matchkey    
#    fname1  sname1 age1 area1  fname2  sname2 age2 area2
# 1     ben    pete    8     x     ben    pete    3     n
# 2   david  albert    5     r   david  albert    5     w
# 3   david  albert    5     r   david  albert   10     x
# 4     edd    alan   11     w     edd    alan    0     r
# 5 richard raymond   10     z richard raymond    4     t
Sign up to request clarification or add additional context in comments.

2 Comments

Cheers @Parfait. Nice way to do it and can be scaled to my larger dataset. Thanks.
Great! Glad to help.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.