Replace rows in one dataframe with another

Question

I am trying to replace the values of rows in one dataframe with the values from another.

The following is the sample code:

import pandas as pd
import numpy as np
from pprint import pprint

raceA = ['r1','r3','r4','r5','r6','r7','r8', 'r9']
qualifierA = ['last','first','first','first','last','last','first','first']
participantA = ['rat','rat','cat','cat','rat','dog','dog','dog']
dfA = pd.DataFrame(
    {'race':raceA,
     'qualifier':qualifierA,
     'participant':participantA

    }
)
pprint(dfA)

raceB = ['r1','r2','r3','r4','r5','r6','r7','r8', 'r9','r10']
qualifierB = ['last',np.nan,np.nan,'first','first','last','last','first','first',np.nan]
participantB = ['rat','rat',np.nan,'cat','cat','rat','dog','dog',np.nan,np.nan]
dfB = pd.DataFrame(
    {'race':raceB,
     'qualifier':qualifierB,
     'participant':participantB

    }
)
pprint(dfB)
dfB.loc[dfB.race.isin(dfA.race), ['qualifier','participant']] = dfA[['qualifier','participant']]
pprint(dfB)

For instance in dfA,

r9     first         dog

dfB contains,

 r9     first         NaN

Desired output: dfB

r9     first         dog

Output obtained:

r9       NaN         NaN

Could someone look into this?

Space Impact · Accepted Answer · 2019-06-28 11:22:50Z


Use DataFrame.fillna with the other dataframe as the fill value, after setting race as the index on both so that the rows align:

df = dfB.set_index('race').fillna(dfA.set_index('race')).reset_index()

print(df)
  race qualifier participant
0   r1      last         rat
1   r2       NaN         rat
2   r3     first         rat
3   r4     first         cat
4   r5     first         cat
5   r6      last         rat
6   r7      last         dog
7   r8     first         dog
8   r9     first         dog
9  r10       NaN         NaN
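A likely reason the original .loc assignment produced NaN is index alignment: when the right-hand side is a dataframe, pandas matches rows by index label, and here both frames carry the default integer index rather than race. The sketch below reproduces that with the question's data and contrasts it with the race-indexed fillna. The names cols, broken and fixed are illustrative and do not appear in the original post.

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame({
    'race': ['r1', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9'],
    'qualifier': ['last', 'first', 'first', 'first', 'last', 'last', 'first', 'first'],
    'participant': ['rat', 'rat', 'cat', 'cat', 'rat', 'dog', 'dog', 'dog'],
})
dfB = pd.DataFrame({
    'race': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9', 'r10'],
    'qualifier': ['last', np.nan, np.nan, 'first', 'first', 'last', 'last', 'first', 'first', np.nan],
    'participant': ['rat', 'rat', np.nan, 'cat', 'cat', 'rat', 'dog', 'dog', np.nan, np.nan],
})

cols = ['qualifier', 'participant']

# Original attempt: dfA[cols] is aligned on the default integer index,
# so dfB row 8 (r9) looks for dfA row 8, which does not exist.
broken = dfB.copy()
broken.loc[broken.race.isin(dfA.race), cols] = dfA[cols]

# Index both frames by race so that rows are matched by race instead.
fixed = dfB.set_index('race').fillna(dfA.set_index('race')).reset_index()

print(broken[broken.race == 'r9'])
print(fixed[fixed.race == 'r9'])
```

With the race index in place, r9 is matched to r9 regardless of where each row sits in its frame.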

Or use DataFrame.update, which modifies dfB in place:

dfB = dfB.set_index('race')
dfA = dfA.set_index('race')

dfB.update(dfA)

print(dfB.reset_index())
  race qualifier participant
0   r1      last         rat
1   r2       NaN         rat
2   r3     first         rat
3   r4     first         cat
4   r5     first         cat
5   r6      last         rat
6   r7      last         dog
7   r8     first         dog
8   r9     first         dog
9  r10       NaN         NaN

edited Jun 28, 2019 at 11:22

answered Jun 28, 2019 at 10:58


2 Comments

crazyGamer Over a year ago

Just a suggestion: you can also use df.update() instead of df.fillna() if the replacement should be unconditional. Also, I think the OP is interested in the second code snippet, so perhaps it could be moved to the top.

Space Impact Over a year ago

@crazyGamer Added, but it is a little tricky, since update modifies the original dataframe in place.
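To make the difference raised in these comments concrete, here is a minimal sketch on made-up data (the frames old and new and their values are hypothetical, chosen so the two methods disagree). It shows that fillna only fills gaps, that update also overwrites existing values, and that working on a copy keeps the original untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical frames, both indexed by race.
old = pd.DataFrame(
    {'race': ['r1', 'r2', 'r3'], 'participant': ['rat', np.nan, 'cat']}
).set_index('race')
new = pd.DataFrame(
    {'race': ['r1', 'r2'], 'participant': ['dog', 'dog']}
).set_index('race')

# fillna returns a new frame and only fills the NaN at r2.
filled = old.fillna(new)

# update overwrites every matching non-NaN value from `new`, in place,
# and returns None, so it is applied to a copy here.
updated = old.copy()
returned = updated.update(new)

print(filled)
print(updated)
print(returned)
```

Rows that are absent from new (r3 here) are left alone by both methods.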

Student of the Digital World · Accepted Answer · 2019-06-28 11:22:02Z

I would do something like this in multiple steps.

First I will merge the two dataframes -

dfB_PreProcessing = dfB.merge(dfA,left_on='race',right_on='race',how="left")

Then clean the participant column, keeping the value from dfB and falling back to dfA where dfB has NaN -

dfB_PreProcessing['participant'] = dfB_PreProcessing['participant_x'].fillna(dfB_PreProcessing['participant_y'])

Then clean the qualifier column in the same way (if needed) -

dfB_PreProcessing['qualifier'] = dfB_PreProcessing['qualifier_x'].fillna(dfB_PreProcessing['qualifier_y'])

Then select only the required columns as the output dataframe -

dfB = dfB_PreProcessing.loc[:,['race','qualifier','participant']]

Let me know whether it works.

Note the left merge: with an inner merge, r2 and r10 would be missing from the result because they do not appear in dfA.
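The merge steps can also be sketched end to end on a reduced version of the question's data. This is only one possible arrangement: the '_A' suffix, the loop over columns and the names merged and result are assumptions made for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Reduced version of the question's data.
dfA = pd.DataFrame({
    'race': ['r1', 'r3', 'r9'],
    'qualifier': ['last', 'first', 'first'],
    'participant': ['rat', 'rat', 'dog'],
})
dfB = pd.DataFrame({
    'race': ['r1', 'r2', 'r3', 'r9', 'r10'],
    'qualifier': ['last', np.nan, np.nan, 'first', np.nan],
    'participant': ['rat', 'rat', np.nan, np.nan, np.nan],
})

# Left merge keeps every row of dfB; dfA's columns get the '_A' suffix.
merged = dfB.merge(dfA, on='race', how='left', suffixes=('', '_A'))

# Fill each gap in dfB's column from the matching dfA column.
for col in ['qualifier', 'participant']:
    merged[col] = merged[col].fillna(merged[col + '_A'])

result = merged[['race', 'qualifier', 'participant']]
print(result)
```

Races that exist only in dfB (r2 and r10) keep whatever values they already had, including NaN.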

Akash Kumar · Accepted Answer · 2019-06-28 11:20:41Z


Correct me if I have misunderstood. If you want to update the rows of one or more columns, you can update the values at the matching index of those columns. For example, to update all rows in column B:

df = pd.DataFrame({'A':[1,2,3],'B': [4,5,6]})
df1 = pd.DataFrame({'B':[7,8,9]})
df.update(df1)
pprint(df)
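As a rough illustration of what this does, the sketch below runs the same update and then a second one that touches a single row. update matches on index labels and column names, so for the question's data race would need to be the index first. The frame df2 and its value 70 are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df1 = pd.DataFrame({'B': [7, 8, 9]})

# Same index (0, 1, 2) and column name, so every row of B is replaced.
df.update(df1)
after_full = df['B'].tolist()

# Hypothetical partial update: only index label 1 is present in df2.
df2 = pd.DataFrame({'B': [70]}, index=[1])
df.update(df2)
after_partial = df['B'].tolist()

print(after_full)
print(after_partial)
print(df['A'].tolist())
```

Column A is never touched because neither df1 nor df2 contains it.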

