Replace values in a dataframe with values from another dataframe when common value is found in specific column

Question

I am trying to replace hours in df with hours from replacements for project IDs that exist in both dataframes:

import pandas as pd

df = pd.DataFrame({
    'project_ids': [1, 2, 3, 4, 5],
    'hours': [111, 222, 333, 444, 555],
    'else' :['a', 'b', 'c', 'd', 'e']
})

replacements = pd.DataFrame({
    'project_ids': [2, 5, 3],
    'hours': [666, 999, 1000],
})

for project in replacements['project_ids']:
    df.loc[df['project_ids'] == project, 'hours'] = replacements.loc[replacements['project_ids'] == project, 'hours']

print(df)

However, only the project ID 3 gets correct assignment (1000), but both 2 and 5 get NaN:

 projects   hours else
0         1   111.0    a
1         2     NaN    b
2         3  1000.0    c
3         4   444.0    d
4         5     NaN    e

How can I fix it?
Is there a better way to do this?

jezrael · Accepted Answer · 2019-04-29 10:29:52Z

1

Use Series.map with another Series created by replacements with DataFrame.set_index:

s = replacements.set_index('project_ids')['hours']
df['hours'] = df['project_ids'].map(s).fillna(df['hours'])
print(df)
   project_ids   hours else
0            1   111.0    a
1            2   666.0    b
2            3  1000.0    c
3            4   444.0    d
4            5   999.0    e

answered Apr 29, 2019 at 10:29

jezrael

868k102 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

barciewicz Over a year ago

Thanks! Am I correct that df['project_ids'].map(s) is looking for each project ID from df in the index of s, and returns hours value for this index when found?

jezrael Over a year ago

@barciewicz - exactly. And if no match, there are NaNs, which are replace by original by .fillna(df['hours'])

anky · Accepted Answer · 2019-04-29 10:30:22Z

1

Another way using df.update():

m=df.set_index('project_ids')
m.update(replacements.set_index('project_ids')['hours'])
print(m.reset_index())

   project_ids   hours else
0            1   111.0    a
1            2   666.0    b
2            3  1000.0    c
3            4   444.0    d
4            5   999.0    e

answered Apr 29, 2019 at 10:30

anky

75.3k11 gold badges46 silver badges76 bronze badges

Comments

Erfan · Accepted Answer · 2019-04-29 10:34:23Z

0

Another solution would be to use pandas.merge and then use fillna:

df_new = pd.merge(df, replacements, on='project_ids', how='left', suffixes=['_1', ''])
df_new['hours'].fillna(df_new['hours_1'], inplace=True)
df_new.drop('hours_1', axis=1, inplace=True)

print(df_new)
   project_ids else   hours
0            1    a   111.0
1            2    b   666.0
2            3    c  1000.0
3            4    d   444.0
4            5    e   999.0

answered Apr 29, 2019 at 10:34

Erfan

43.3k10 gold badges75 silver badges86 bronze badges

Collectives™ on Stack Overflow

Replace values in a dataframe with values from another dataframe when common value is found in specific column

3 Answers 3

2 Comments

Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

2 Comments

Comments

Comments

Related