0

I am trying to replace the values of rows in one DataFrame (dfB) with the values from the matching rows of another (dfA), where rows are matched on the race column.

The following is the sample code:

import pandas as pd
import numpy as np
from pprint import pprint

raceA = ['r1','r3','r4','r5','r6','r7','r8', 'r9']
qualifierA = ['last','first','first','first','last','last','first','first']
participantA = ['rat','rat','cat','cat','rat','dog','dog','dog']
dfA = pd.DataFrame(
    {'race':raceA,
     'qualifier':qualifierA,
     'participant':participantA

    }
)
pprint(dfA)

raceB = ['r1','r2','r3','r4','r5','r6','r7','r8', 'r9','r10']
qualifierB = ['last',np.nan,np.nan,'first','first','last','last','first','first',np.nan]
participantB = ['rat','rat',np.nan,'cat','cat','rat','dog','dog',np.nan,np.nan]
dfB = pd.DataFrame(
    {'race':raceB,
     'qualifier':qualifierB,
     'participant':participantB

    }
)
pprint(dfB)
dfB.loc[dfB.race.isin(dfA.race), ['qualifier','participant']] = dfA[['qualifier','participant']]
pprint(dfB)

For instance, dfA contains:

r9     first         dog

dfB contains:

r9     first         NaN

Desired output: dfB

r9     first         dog

Output obtained:

r9       NaN         NaN

Could someone look into this?

3 Answers

2

Use DataFrame.fillna with a DataFrame argument, after setting race as the index on both frames so that the fill is aligned by race:

df = dfB.set_index('race').fillna(dfA.set_index('race')).reset_index()

print(df)
  race qualifier participant
0   r1      last         rat
1   r2       NaN         rat
2   r3     first         rat
3   r4     first         cat
4   r5     first         cat
5   r6      last         rat
6   r7      last         dog
7   r8     first         dog
8   r9     first         dog
9  r10       NaN         NaN
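For context, here is a minimal sketch of why the assignment in the question yields NaN for r9. It assumes pandas aligns a DataFrame on the right-hand side of a .loc assignment by index label rather than by position. The names cols, mask, seen_by_loc, and by_race are illustrative and not from the question.

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame({
    'race': ['r1', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9'],
    'qualifier': ['last', 'first', 'first', 'first', 'last', 'last', 'first', 'first'],
    'participant': ['rat', 'rat', 'cat', 'cat', 'rat', 'dog', 'dog', 'dog'],
})
dfB = pd.DataFrame({
    'race': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9', 'r10'],
    'qualifier': ['last', np.nan, np.nan, 'first', 'first', 'last', 'last',
                  'first', 'first', np.nan],
    'participant': ['rat', 'rat', np.nan, 'cat', 'cat', 'rat', 'dog', 'dog',
                    np.nan, np.nan],
})

cols = ['qualifier', 'participant']
mask = dfB['race'].isin(dfA['race'])

# Roughly what the right-hand side looks like once it is matched to the
# selected rows of dfB by integer label. r9 sits at label 8 in dfB, but dfA
# only has labels 0..7, so that row comes out as NaN. Label 2 (r3 in dfB)
# picks up dfA's row 2, which belongs to r4.
seen_by_loc = dfA[cols].reindex(dfB.index[mask])
print(seen_by_loc)

# With race as the index on both sides, each race is paired with its own row.
by_race = dfB.set_index('race').fillna(dfA.set_index('race')).reset_index()
print(by_race[by_race['race'] == 'r9'])
```

This is why both approaches in this answer move race into the index before combining the frames.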

Or use DataFrame.update, which modifies dfB in place:

dfB = dfB.set_index('race')
dfA = dfA.set_index('race')

dfB.update(dfA)

print(dfB.reset_index())
  race qualifier participant
0   r1      last         rat
1   r2       NaN         rat
2   r3     first         rat
3   r4     first         cat
4   r5     first         cat
5   r6      last         rat
6   r7      last         dog
7   r8     first         dog
8   r9     first         dog
9  r10       NaN         NaN
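The two approaches give the same result on the question's data, but they are not interchangeable in general. The sketch below uses small hypothetical frames (the r1 row deliberately differs between them, which is not the case in the question) to show the difference, and it calls update on a copy so the original frame is left untouched. The names a, b, filled, and updated are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: r1 has different non-missing values in each frame.
dfA = pd.DataFrame({
    'race': ['r1', 'r9'],
    'qualifier': ['first', 'first'],
    'participant': ['cat', 'dog'],
})
dfB = pd.DataFrame({
    'race': ['r1', 'r9', 'r10'],
    'qualifier': ['last', 'first', np.nan],
    'participant': ['rat', np.nan, np.nan],
})

a = dfA.set_index('race')
b = dfB.set_index('race')

# fillna only touches cells that are NaN in b, so r1 keeps 'last' / 'rat'.
filled = b.fillna(a)

# update overwrites with every non-NaN value from a, so r1 becomes
# 'first' / 'cat'. It works in place, hence the copy.
updated = b.copy()
updated.update(a)

print(filled.reset_index())
print(updated.reset_index())
```

If existing values in dfB should be preserved, fillna is probably the safer choice; if dfA should always win, update fits better.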

2 Comments

Just a suggestion: you can also use df.update() instead of df.fillna() if the replacement should be unconditional. Also, I think the OP is interested in the second code snippet, so perhaps it could be moved to the top.
@crazyGamer Added, but it is a little tricky, since update modifies the original DataFrame in place.
1

I would do this in multiple steps.

First, merge the two DataFrames:

dfB_PreProcessing = dfB.merge(dfA,left_on='race',right_on='race',how="left")

Then clean the participant column:

# Take participant from dfB (participant_x), falling back to dfA (participant_y) where it is missing
dfB_PreProcessing['participant_x'] = dfB_PreProcessing['participant_x'].fillna('')
dfB_PreProcessing['participant'] = np.where(dfB_PreProcessing['participant_x'] == '', dfB_PreProcessing['participant_y'], dfB_PreProcessing['participant_x'])

Then clean the qualifier column (if needed):

# Take qualifier from dfB (qualifier_x), falling back to dfA (qualifier_y) where it is missing
dfB_PreProcessing['qualifier_x'] = dfB_PreProcessing['qualifier_x'].fillna('')
dfB_PreProcessing['qualifier'] = np.where(dfB_PreProcessing['qualifier_x'] == '', dfB_PreProcessing['qualifier_y'], dfB_PreProcessing['qualifier_x'])

Then select only the required columns as the output DataFrame:

dfB = dfB_PreProcessing.loc[:,['race','qualifier','participant']]
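The same merge-based idea can probably be written more compactly. The sketch below is one possible variant, not the code above: it assumes a left merge with a custom suffix for dfA's columns and uses Series.fillna instead of the empty-string comparison. The data is a subset of the question's rows, and the names merged, result, and the '_A' suffix are illustrative.

```python
import numpy as np
import pandas as pd

# A subset of the question's rows, kept small for readability.
dfA = pd.DataFrame({
    'race': ['r3', 'r9'],
    'qualifier': ['first', 'first'],
    'participant': ['rat', 'dog'],
})
dfB = pd.DataFrame({
    'race': ['r2', 'r3', 'r9', 'r10'],
    'qualifier': [np.nan, np.nan, 'first', np.nan],
    'participant': ['rat', np.nan, np.nan, np.nan],
})

# Left merge keeps every row of dfB; dfA's columns get the '_A' suffix.
merged = dfB.merge(dfA, on='race', how='left', suffixes=('', '_A'))

# Fill only the missing cells of dfB from the matching dfA column.
for col in ['qualifier', 'participant']:
    merged[col] = merged[col].fillna(merged[col + '_A'])

result = merged[['race', 'qualifier', 'participant']]
print(result)
```

Races that exist only in dfB (r2 and r10 here) keep whatever values they already had, including NaN.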


Let me know whether it works.

1 Comment

Consider a left merge instead of an inner merge; the inner merge is why r2 and r10 are missing from the first table printed.
0

Correct me if I have misunderstood. If you want to update rows in one or more columns, you can update the values at the matching index labels of those columns with DataFrame.update. For example, to update all rows in column B:

df = pd.DataFrame({'A':[1,2,3],'B': [4,5,6]})
df1 = pd.DataFrame({'B':[7,8,9]})
df.update(df1)
pprint(df)
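Because update matches rows by index label, it can also change a single row. A small sketch, assuming the same df as above and a hypothetical df1 that carries only index label 1:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Hypothetical replacement frame: only column B, only index label 1.
df1 = pd.DataFrame({'B': [8]}, index=[1])

# In place: only the cell at row label 1, column B is replaced.
df.update(df1)
print(df)
```

Rows and columns that do not appear in df1 are left as they were.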
