How to update a dataframe in Pandas Python

Question

I have the following two dataframes in pandas:

DF1:
AuthorID1  AuthorID2  Co-Authored
A1         A2         0
A1         A3         0
A1         A4         0
A2         A3         0

DF2:
AuthorID1  AuthorID2  Co-Authored
A1         A2         5
A2         A3         6
A6         A7         9

I would like (without looping and comparing) to find the matching AuthorID1 and AuthorID2 pairing in DF2 that exist in DF1 and update the column values accordingly. So the result for the above two tables would be the following:

Resulting Updated DF1:
AuthorID1  AuthorID2  Co-Authored
A1         A2         5
A1         A3         0
A1         A4         0
A2         A3         6

Is there a fast way to do this? I have 7 million rows in DF1, and looping and comparing would take forever.

Update: note that rows in DF2 whose author pair does not exist in DF1 (such as A6, A7) should not be part of the update.

jezrael · Accepted Answer · 2016-08-08 13:50:38Z

You can use update:

df1.update(df2)
print (df1)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2          5.0
1        A2        A3          6.0
2        A1        A4          0.0
3        A2        A3          0.0

Sample:

import pandas as pd

df1 = pd.DataFrame({'new': {0: 7, 1: 8, 2: 1, 3: 3},
                    'AuthorID2': {0: 'A2', 1: 'A3', 2: 'A4', 3: 'A3'}, 
                    'AuthorID1': {0: 'A1', 1: 'A1', 2: 'A1', 3: 'A2'}, 
                    'Co-Authored': {0: 0, 1: 0, 2: 0, 3: 0}})

df2 = pd.DataFrame({'AuthorID2': {0: 'A2', 1: 'A3'},
                    'AuthorID1': {0: 'A1', 1: 'A2'}, 
                    'Co-Authored': {0: 5, 1: 6}})

print (df1)
  AuthorID1 AuthorID2  Co-Authored  new
0        A1        A2            0    7
1        A1        A3            0    8
2        A1        A4            0    1
3        A2        A3            0    3

print (df2)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2            5
1        A2        A3            6

df1.update(df2)
print (df1)
  AuthorID1 AuthorID2  Co-Authored  new
0        A1        A2          5.0    7
1        A2        A3          6.0    8
2        A1        A4          0.0    1
3        A2        A3          0.0    3
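
A caveat on the sample above: update aligns the two frames on their index labels, not on the AuthorID columns, and overwrites every shared column. That is why row 1 ends up as A2/A3 instead of A1/A3. A minimal sketch of one way to align on the author pair instead is to move both key columns into the index before calling update. The names keys, left, right and result are illustrative, and the sketch assumes each pair appears at most once in each frame:

```python
import pandas as pd

df1 = pd.DataFrame({
    "AuthorID1": ["A1", "A1", "A1", "A2"],
    "AuthorID2": ["A2", "A3", "A4", "A3"],
    "Co-Authored": [0, 0, 0, 0],
    "new": [7, 8, 1, 3],
})
df2 = pd.DataFrame({
    "AuthorID1": ["A1", "A2", "A6"],
    "AuthorID2": ["A2", "A3", "A7"],
    "Co-Authored": [5, 6, 9],
})

keys = ["AuthorID1", "AuthorID2"]  # illustrative name for the pair columns

# Index both frames by the author pair so update() aligns on the pair.
left = df1.set_index(keys)
# Keep only the column that should be overwritten.
right = df2.set_index(keys)[["Co-Authored"]]

# update() reindexes `right` to the index of `left`, so pairs that exist
# only in df2 (here A6/A7) are ignored.
left.update(right)

result = left.reset_index()
print(result)
```

Depending on the pandas version, Co-Authored may come back as float; an explicit astype(int) restores integers if that matters.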

Edit (in response to a comment):

I think you first need to filter df2 by df1 with isin:

df2 = df2[df2[['AuthorID1','AuthorID2']].isin(df1[['AuthorID1','AuthorID2']]).any(axis=1)]
print (df2)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2            5
1        A2        A3            6

df1.update(df2)
print (df1)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2          5.0
1        A2        A3          6.0
2        A1        A4          0.0
3        A2        A3          0.0
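
As an alternative that matches on the pair values rather than on row position, a left merge is one option. This is a sketch, not part of the original answer: the "_new" suffix and the names keys, merged and result are illustrative, and it assumes each pair occurs at most once in df2 (duplicates would multiply rows in the result).

```python
import pandas as pd

df1 = pd.DataFrame({
    "AuthorID1": ["A1", "A1", "A1", "A2"],
    "AuthorID2": ["A2", "A3", "A4", "A3"],
    "Co-Authored": [0, 0, 0, 0],
})
df2 = pd.DataFrame({
    "AuthorID1": ["A1", "A2", "A6"],
    "AuthorID2": ["A2", "A3", "A7"],
    "Co-Authored": [5, 6, 9],
})

keys = ["AuthorID1", "AuthorID2"]

# A left join keeps every df1 row in its original order; pairs that exist
# only in df2 (A6/A7) never enter the result.
merged = df1.merge(df2, on=keys, how="left", suffixes=("", "_new"))

# Use the df2 value where a pair matched, otherwise keep the df1 value.
merged["Co-Authored"] = (
    merged["Co-Authored_new"].fillna(merged["Co-Authored"]).astype(int)
)
result = merged.drop(columns="Co-Authored_new")
print(result)
```

Unlike update, this builds a new frame instead of modifying df1 in place, which costs some extra memory with millions of rows.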

Would this still work if I have more columns in DF1 but only want to update the "Co-Authored" column based on the values in DF2?
For me it works after adding a new column to DF1. Give me a moment and I'll add a sample.
I'm having another problem. What if DF2 has values for AuthorID1 and AuthorID2 that aren't in DF1? In that case it should just ignore them and not update DF1. How do I specify the criteria by which to update? I'll edit the question accordingly; it seems 'update' doesn't work in this case.

Mad Physicist · Accepted Answer · 2017-05-04 15:05:07Z

You can use the filter_func parameter of update, documented as follows:

filter_func : callable(1d-array) -> 1d-array<boolean>, default None

Can choose to replace values other than NA. Return True for values that should be updated
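
A minimal sketch of how filter_func behaves, using made-up single-column frames. The callable receives the current values of each column in the calling frame and returns True where they may be overwritten. Alignment is still by index label, so on its own this does not match rows by author pair.

```python
import pandas as pd

# Illustrative data, not the frames from the question.
df1 = pd.DataFrame({"Co-Authored": [0, 3, 0, 0]})
df2 = pd.DataFrame({"Co-Authored": [5, 6, 7, 8]})

# Only entries of df1 that are currently 0 are replaced by the df2 value
# at the same index label; the existing 3 is left alone.
df1.update(df2, filter_func=lambda values: values == 0)
print(df1)
```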

edited May 4, 2017 at 15:05 by Mad Physicist

answered May 4, 2017 at 14:45 by wuxiliang3322335
