
I have two dataframes, and one of them holds values that should replace values in the other. What is the best way to do the replacement?

For instance, every row of df1 whose type is 'none' should take its type from the matching row (same pos) in df2. This is what I have so far, but I am not happy with this approach:

import pandas as pd

df1 = pd.DataFrame({"word": ['The', 'big', 'cat', 'house'], "type": ['article', 'none', 'noun', 'none'], "pos": [1, 2, 3, 4]})
df2 = pd.DataFrame({"word": ['big', 'house'], "type": ['adjective', 'noun'], "pos": [2, 4]})

df1.set_index('pos', inplace=True, drop=True)
df2.set_index('pos', inplace=True, drop=True)

# iterrows() yields copies, so write back through df1.at and look up df2 by its 'pos' label
for pos, row in df1.iterrows():
    if row['type'] == 'none':
        df1.at[pos, 'type'] = df2.at[pos, 'type']

The df1 dataframe should end up as:

    word       type  pos
0    The    article    1
1    big  adjective    2
2    cat       noun    3
3  house       noun    4

Thanks :)

  • Check out my updated answer; I think it might be what you're looking for. Commented Dec 5, 2019 at 0:30
  • Don’t use .iterrows(). Commented Dec 5, 2019 at 3:19

3 Answers


If df2 always indicates the positions at which values in df1 should be replaced (with both frames indexed by pos, as in the question), you can simply do:

df1.loc[df2.index,"type"] = df2["type"]

print(df1)

Output:
      word       type
pos                  
1      The    article
2      big  adjective
3      cat       noun
4    house       noun
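A self-contained sketch of this answer, reusing the question's sample data and assuming, as the answer does, that both frames are indexed by pos and that every pos in df2 also exists in df1:

```python
import pandas as pd

# Same sample data as the question, indexed by 'pos'
df1 = pd.DataFrame({"word": ['The', 'big', 'cat', 'house'],
                    "type": ['article', 'none', 'noun', 'none'],
                    "pos": [1, 2, 3, 4]}).set_index('pos')
df2 = pd.DataFrame({"word": ['big', 'house'],
                    "type": ['adjective', 'noun'],
                    "pos": [2, 4]}).set_index('pos')

# Overwrite 'type' only at the positions listed in df2; the right-hand
# Series is aligned on the 'pos' index before assignment
df1.loc[df2.index, "type"] = df2["type"]
print(df1)
```

Because the assigned Series is aligned on the index, the rows of df2 do not need to be in the same order as those of df1.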

1 Comment

Thanks Henry, this is what I was looking for: the simplest approach.

Solution

Without using the .apply() method:

condition = df1['type']=='none'
df1.loc[condition, 'type'] = df2.loc[condition]['type']
df1.reset_index(inplace=True)

Output:

   pos   word       type
0    1    The    article
1    2    big  adjective
2    3    cat       noun
3    4  house       noun
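A self-contained sketch of this mask-based answer, with the question's data and both frames indexed by pos. The mask limits the update to rows whose type is 'none'. Selecting df2 with that mask works because pandas aligns the boolean Series to df2's index, which only succeeds while every pos in df2 also appears in df1; a 'none' row with no counterpart in df2 would be set to NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"word": ['The', 'big', 'cat', 'house'],
                    "type": ['article', 'none', 'noun', 'none'],
                    "pos": [1, 2, 3, 4]}).set_index('pos')
df2 = pd.DataFrame({"word": ['big', 'house'],
                    "type": ['adjective', 'noun'],
                    "pos": [2, 4]}).set_index('pos')

# Boolean mask over df1's rows that still carry the placeholder 'none'
condition = df1['type'] == 'none'
# The mask (indexed 1..4) is aligned to df2's index (2, 4) when selecting from df2
df1.loc[condition, 'type'] = df2.loc[condition, 'type']
# Move 'pos' back into a regular column
df1.reset_index(inplace=True)
print(df1)
```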

2 Comments

@JuanPerez Please try it out and leave a comment if it worked.
Thanks @CypherX, it worked like a charm. Thank you :)

How about:

df = df2.set_index('word').combine_first(df1.set_index('word'))
df['pos'] = df['pos'].astype(int)

output:

            type  pos
word                 
The      article    1
big    adjective    2
cat         noun    3
house       noun    4

and after resetting the index:

In [970]: df.reset_index()
Out[970]: 
    word       type  pos
0    The    article    1
1    big  adjective    2
2    cat       noun    3
3  house       noun    4
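A self-contained sketch of the word-keyed version, using the question's sample data and assuming each word occurs only once in df1, since word becomes the index. combine_first keeps df2's values and fills the remaining rows from df1. Depending on the pandas version, pos may come back as float because aligning the two frames introduces NaN, so it is cast back to int.

```python
import pandas as pd

df1 = pd.DataFrame({"word": ['The', 'big', 'cat', 'house'],
                    "type": ['article', 'none', 'noun', 'none'],
                    "pos": [1, 2, 3, 4]})
df2 = pd.DataFrame({"word": ['big', 'house'],
                    "type": ['adjective', 'noun'],
                    "pos": [2, 4]})

# df2's values take priority; words missing from df2 are filled in from df1
df = df2.set_index('word').combine_first(df1.set_index('word'))
# Alignment may leave 'pos' as float (version-dependent), so cast it back
df['pos'] = df['pos'].astype(int)
df = df.reset_index()
print(df)
```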

Or, keyed by 'pos':

df = df2.set_index('pos').combine_first(df1.set_index('pos')).reset_index()
colidx = ['word', 'type', 'pos']
df.reindex(columns=colidx)

output:

Out[976]: 
    word       type  pos
0    The    article    1
1    big  adjective    2
2    cat       noun    3
3  house       noun    4
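A self-contained sketch of the pos-keyed version with the question's sample data, assuming pos is unique in each frame. Here the result of reindex is assigned back so the reordered columns are kept.

```python
import pandas as pd

df1 = pd.DataFrame({"word": ['The', 'big', 'cat', 'house'],
                    "type": ['article', 'none', 'noun', 'none'],
                    "pos": [1, 2, 3, 4]})
df2 = pd.DataFrame({"word": ['big', 'house'],
                    "type": ['adjective', 'noun'],
                    "pos": [2, 4]})

# Key on the unique 'pos': df2's values win, missing positions come from df1
df = df2.set_index('pos').combine_first(df1.set_index('pos')).reset_index()
# reset_index puts 'pos' first; restore the question's column order
df = df.reindex(columns=['word', 'type', 'pos'])
print(df)
```

Since pos is the index during combine_first rather than a data column, it never passes through NaN, so no astype(int) is needed here.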

2 Comments

I would prefer to set the index to the position, because a word can appear more than once, and the position distinguishes repeated occurrences of the same word in the dataframe.
@JuanPerez I changed it to that at the end of the answer
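To illustrate the point about repeated words, here is a small hypothetical example (the sentence and its types are made up) where "can" appears twice with different types. With word as the index the two rows would share the label 'can' and could not be told apart; keying on the unique pos keeps them separate.

```python
import pandas as pd

# Hypothetical data: "can" occurs at pos 2 (verb) and pos 5 (noun)
df1 = pd.DataFrame({"word": ['I', 'can', 'open', 'the', 'can'],
                    "type": ['pronoun', 'none', 'verb', 'article', 'none'],
                    "pos": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"word": ['can', 'can'],
                    "type": ['verb', 'noun'],
                    "pos": [2, 5]})

# 'pos' is unique, so each occurrence of "can" gets its own replacement
df = df2.set_index('pos').combine_first(df1.set_index('pos')).reset_index()
print(df[['pos', 'word', 'type']])
```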
