OK, this seems like it should be easy to do with merge or concatenate operations, but I can't crack it. I'm working in pandas.

I have two dataframes that share some rows, and I want to combine them so that no rows or columns are duplicated. It would work like this:

df1:

A B 
a 1
b 2
c 3

df2:

A B 
b 2
c 3
d 4

df3 = df1 combined with df2

A B 
a 1
b 2
c 3
d 4

One method I've tried is to select the rows that are in one dataframe but not the other (an XOR) and then append them, but I can't figure out how to do the selection. The other idea I have is to append them and then delete the duplicate rows, but I don't know how to do the latter.

2 Answers

You want an outer merge:

In [103]:
df1.merge(df2, how='outer')

Out[103]:
   A  B
0  a  1
1  b  2
2  c  3
3  d  4

The above works because, when no join keys are given, merge joins on every column the two dfs have in common (here both A and B), and how='outer' keeps the union of the rows from both sides, so a row that appears in both dfs shows up only once in the result.
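For completeness, here is a self-contained sketch of the same outer merge, with the two frames from the question built explicitly. The construction of `df1` and `df2` is an assumption about how the question's data would be created; only the `merge` call comes from the answer.

```python
import pandas as pd

# Sample frames mirroring the question (construction is assumed).
df1 = pd.DataFrame({"A": ["a", "b", "c"], "B": [1, 2, 3]})
df2 = pd.DataFrame({"A": ["b", "c", "d"], "B": [2, 3, 4]})

# No `on=` is passed, so merge joins on all shared columns (A and B).
# how="outer" keeps rows found in either frame; matching rows are paired
# up rather than repeated.
df3 = df1.merge(df2, how="outer")
print(df3)
```

One caveat: this relies on each row being unique within its own frame. If the same row occurs several times inside df1 or df2, the merge pairs every copy on one side with every copy on the other, so duplicates can survive or multiply.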

2 Comments

What if some rows are duplicates and some aren't, and, based on the index, you want to keep the instances in df1 and drop the repeated indexes in df2? (Should this be a new question?)
What if you want to merge so that values from df2 overwrite the matching values in df1?
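Neither comment was answered in the thread. One possible approach, assuming rows are identified by their index labels, is to stack the frames and filter on `Index.duplicated`. The sample frames and the conflicting values 20 and 30 below are made up for illustration.

```python
import pandas as pd

# Hypothetical frames: labels "b" and "c" occur in both, with different values.
df1 = pd.DataFrame({"B": [1, 2, 3]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"B": [20, 30, 4]}, index=["b", "c", "d"])

# Stack df1 on top of df2, so df1's rows come first for any shared label.
stacked = pd.concat([df1, df2])

# Keep the first occurrence of each label: df1 wins on conflicts.
prefer_df1 = stacked[~stacked.index.duplicated(keep="first")]

# Keep the last occurrence of each label: df2 overwrites df1.
# sort_index() is only there to give a predictable row order.
prefer_df2 = stacked[~stacked.index.duplicated(keep="last")].sort_index()

print(prefer_df1)
print(prefer_df2)
```

If the key lives in a column rather than the index, the same idea should work with `drop_duplicates(subset=..., keep=...)` on the stacked frame.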

You can use the following to drop the duplicates:

pd.concat([df1, df2]).drop_duplicates() 
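As a runnable sketch of the one-liner above, using the question's sample data (the frame construction and the final `reset_index` are additions, not part of the original answer):

```python
import pandas as pd

# Sample frames mirroring the question (construction is assumed).
df1 = pd.DataFrame({"A": ["a", "b", "c"], "B": [1, 2, 3]})
df2 = pd.DataFrame({"A": ["b", "c", "d"], "B": [2, 3, 4]})

# concat stacks the rows of df2 under df1; drop_duplicates then removes
# any row whose values in all columns repeat an earlier row.
df3 = pd.concat([df1, df2]).drop_duplicates()

# The surviving rows keep their original index labels (0, 1, 2, 2 here),
# so reset the index to get a clean 0..n-1 range.
df3 = df3.reset_index(drop=True)
print(df3)
```

Unlike the outer merge, this also removes rows that are duplicated within a single frame, and it keeps rows in their original order of appearance.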
