0

I am trying to replace hours in df with hours from replacements for project IDs that exist in both dataframes:

import pandas as pd

df = pd.DataFrame({
    'project_ids': [1, 2, 3, 4, 5],
    'hours': [111, 222, 333, 444, 555],
    'else' :['a', 'b', 'c', 'd', 'e']
})

replacements = pd.DataFrame({
    'project_ids': [2, 5, 3],
    'hours': [666, 999, 1000],
})

for project in replacements['project_ids']:
    df.loc[df['project_ids'] == project, 'hours'] = replacements.loc[replacements['project_ids'] == project, 'hours']

print(df)

However, only the project ID 3 gets correct assignment (1000), but both 2 and 5 get NaN:

 projects   hours else
0         1   111.0    a
1         2     NaN    b
2         3  1000.0    c
3         4   444.0    d
4         5     NaN    e
  1. How can I fix it?
  2. Is there a better way to do this?

3 Answers 3

1

Use Series.map with another Series created by replacements with DataFrame.set_index:

s = replacements.set_index('project_ids')['hours']
df['hours'] = df['project_ids'].map(s).fillna(df['hours'])
print(df)
   project_ids   hours else
0            1   111.0    a
1            2   666.0    b
2            3  1000.0    c
3            4   444.0    d
4            5   999.0    e
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks! Am I correct that df['project_ids'].map(s) is looking for each project ID from df in the index of s, and returns hours value for this index when found?
@barciewicz - exactly. And if no match, there are NaNs, which are replace by original by .fillna(df['hours'])
1

Another way using df.update():

m=df.set_index('project_ids')
m.update(replacements.set_index('project_ids')['hours'])
print(m.reset_index())

   project_ids   hours else
0            1   111.0    a
1            2   666.0    b
2            3  1000.0    c
3            4   444.0    d
4            5   999.0    e

Comments

0

Another solution would be to use pandas.merge and then use fillna:

df_new = pd.merge(df, replacements, on='project_ids', how='left', suffixes=['_1', ''])
df_new['hours'].fillna(df_new['hours_1'], inplace=True)
df_new.drop('hours_1', axis=1, inplace=True)

print(df_new)
   project_ids else   hours
0            1    a   111.0
1            2    b   666.0
2            3    c  1000.0
3            4    d   444.0
4            5    e   999.0

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.