I have the following two dataframes in pandas:

DF1:
AuthorID1  AuthorID2  Co-Authored
A1         A2         0
A1         A3         0
A1         A4         0
A2         A3         0

DF2:
AuthorID1  AuthorID2  Co-Authored
A1         A2         5
A2         A3         6
A6         A7         9

I would like (without looping and comparing) to find the AuthorID1/AuthorID2 pairs in DF2 that also exist in DF1 and update the Co-Authored values in DF1 accordingly. For the two tables above, the result would be:

Resulting Updated DF1:
AuthorID1  AuthorID2  Co-Authored
A1         A2         5
A1         A3         0
A1         A4         0
A2         A3         6

Is there a fast way to do this? I have 7 million rows in DF1, and looping and comparing would take forever.

Update: note that the last row in DF2 (A6, A7) should not be part of the update, since that pair doesn't exist in DF1.

2 Answers

You can use update, which aligns the two frames on their index and column labels:

df1.update(df2)
print (df1)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2          5.0
1        A2        A3          6.0
2        A1        A4          0.0
3        A2        A3          0.0

Sample:

import pandas as pd

# df1 has an extra column 'new' that df2 lacks
df1 = pd.DataFrame({'AuthorID1': {0: 'A1', 1: 'A1', 2: 'A1', 3: 'A2'},
                    'AuthorID2': {0: 'A2', 1: 'A3', 2: 'A4', 3: 'A3'},
                    'Co-Authored': {0: 0, 1: 0, 2: 0, 3: 0},
                    'new': {0: 7, 1: 8, 2: 1, 3: 3}})

df2 = pd.DataFrame({'AuthorID1': {0: 'A1', 1: 'A2'},
                    'AuthorID2': {0: 'A2', 1: 'A3'},
                    'Co-Authored': {0: 5, 1: 6}})

print (df1)
  AuthorID1 AuthorID2  Co-Authored  new
0        A1        A2            0    7
1        A1        A3            0    8
2        A1        A4            0    1
3        A2        A3            0    3

print (df2)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2            5
1        A2        A3            6

df1.update(df2)
print (df1)
  AuthorID1 AuthorID2  Co-Authored  new
0        A1        A2          5.0    7
1        A2        A3          6.0    8
2        A1        A4          0.0    1
3        A2        A3          0.0    3
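
One thing to keep in mind: update aligns rows by index label, not by the author columns, which is why row 1 above ends up with A2/A3 copied over it. A minimal sketch of one way around that, assuming each (AuthorID1, AuthorID2) pair is unique within each frame, is to move the pair into the index before updating. The names keys, left, right and result are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    "AuthorID1": ["A1", "A1", "A1", "A2"],
    "AuthorID2": ["A2", "A3", "A4", "A3"],
    "Co-Authored": [0, 0, 0, 0],
})
df2 = pd.DataFrame({
    "AuthorID1": ["A1", "A2", "A6"],
    "AuthorID2": ["A2", "A3", "A7"],
    "Co-Authored": [5, 6, 9],
})

keys = ["AuthorID1", "AuthorID2"]

# Use the author pair as the index so update matches rows by pair
# (assumption: pairs are unique within each frame)
left = df1.set_index(keys)
right = df2.set_index(keys)

# In-place update; only index labels already present in left are touched
left.update(right)

# Restore the author columns as ordinary columns
result = left.reset_index()
print(result)
```

Pairs that appear only in df2, such as A6/A7, are left out because update never adds rows.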

EDIT by comment:

I think you first need to filter df2 by df1 with isin:

df2 = df2[df2[['AuthorID1','AuthorID2']].isin(df1[['AuthorID1','AuthorID2']]).any(axis=1)]
print (df2)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2            5
1        A2        A3            6

df1.update(df2)
print (df1)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2          5.0
1        A2        A3          6.0
2        A1        A4          0.0
3        A2        A3          0.0
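
For the case where df2 contains pairs that are not in df1, an alternative to isin is a left merge on the two author columns. This is a sketch of a different technique, not the method used above, and it assumes each pair occurs at most once in df2. The names keys, merged and result are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    "AuthorID1": ["A1", "A1", "A1", "A2"],
    "AuthorID2": ["A2", "A3", "A4", "A3"],
    "Co-Authored": [0, 0, 0, 0],
})
df2 = pd.DataFrame({
    "AuthorID1": ["A1", "A2", "A6"],
    "AuthorID2": ["A2", "A3", "A7"],
    "Co-Authored": [5, 6, 9],
})

keys = ["AuthorID1", "AuthorID2"]

# Left merge keeps every df1 row in its original order; pairs that are
# missing from df2 get NaN in the "_new" column, and pairs that exist
# only in df2 (A6/A7) are dropped
merged = df1.merge(df2, on=keys, how="left", suffixes=("", "_new"))

result = df1.copy()
# Take the df2 value where one was found, otherwise keep the df1 value
result["Co-Authored"] = (
    merged["Co-Authored_new"]
    .fillna(merged["Co-Authored"])
    .astype(int)
    .to_numpy()
)
print(result)
```

Because the merge is vectorized, this avoids a Python-level loop over the rows.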

5 Comments

Would this still work if I have more columns in DF1, but only want to update the "Co-Authored" column in DF1 based on the updated values in DF2?
For me it works after adding a new column to DF1. Give me a moment and I'll add a sample.
I'm having another problem. What if DF2 has values for AuthorID1 and AuthorID2 that aren't in DF1? In that case it should just ignore them and not update DF1. How do I specify the criteria for the update? I'll edit the question accordingly; it seems 'update' doesn't work in this case.
It looks more complicated, give me a sec.
I added a solution, please check it.

You can use the filter_func parameter of update, documented as follows:

filter_func : callable(1d-array) -> 1d-array<boolean>, default None

Can choose to replace values other than NA. Return True for values that should be updated
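
As a rough illustration of how filter_func behaves (a sketch with made-up values; current and incoming are hypothetical names): the callable receives the current values of a column in the frame being updated and returns a boolean mask marking the ones that may be overwritten. Rows are still matched by index label, so this alone does not match on the author pair.

```python
import pandas as pd

current = pd.DataFrame({"Co-Authored": [0, 3, 0]})
incoming = pd.DataFrame({"Co-Authored": [5, 6, 7]})

# Only overwrite cells whose current value is 0; the existing 3 is kept
current.update(incoming, filter_func=lambda values: values == 0)
print(current)
```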
