pandas how to insert value from another row

Question

I have a dataframe:

import pandas as pd

data = {
    'fruit': ['pear', 'pear', 'banana', 'pear', 'pear', 'apple', 'apple', 'cherry', 'cherry'],
    'fruit_type': ['unknown', 'pear', 'unknown', 'unknown', 'pear', 'unknown', 'apple', 'cherry', 'unknown'],
    'country': ['unknown', 'usa', 'unknown', 'unknown', 'ghana', 'unknown', 'russia', 'albania', 'unknown'],
    'id': ['011', '011', '011', '011', '011', '011', '011', '6', '6'],
    'month': ['unknown', 'march', 'unknown', 'unknown', 'january', 'unknown', 'march', 'january', 'unknown'],
}
df = pd.DataFrame(data, columns=['fruit', 'fruit_type', 'country', 'id', 'month'])

I want to fill the rows that contain 'unknown' with values from another row, within each group defined by id:

If the unknown value in the month column is in the first row of its id group, the unknown values should be filled from the next row.

If the unknown value in the month column is not in the first row of its id group, the unknown values should be filled from the previous row.

Can anyone see the problem?

Output dataframe:

what have you tried so far?

– alec_djinn, Commented Jul 26, 2021 at 9:27

Anurag Dabas · Accepted Answer · 2021-07-26 09:47:26Z

Use replace() to convert 'unknown' to NaN, then group by 'id', forward fill, then backward fill, and finally assign the result back to df:

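df = df.replace('unknown', float('nan'))
# If the replace above doesn't work, then use:
# df = df.replace('unknown', float('nan'), regex=True)
# group_keys=False avoids adding 'id' as an extra index level in newer pandas versions
df = df.groupby('id', group_keys=False).apply(lambda x: x.ffill().bfill())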

output of df:

   fruit    fruit_type  country     id      month
0   pear    pear        usa         011     march
1   pear    pear        usa         011     march
2   banana  pear        usa         011     march
3   pear    pear        usa         011     march
4   pear    pear        ghana       011     january
5   apple   pear        ghana       011     january
6   apple   apple       russia      011     march
7   cherry  cherry      albania     6       january
8   cherry  cherry      albania     6       january
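
To see why forward fill followed by backward fill matches the rule in the question, here is a minimal sketch on a single group. The Series below is just the month column of the id '011' group with 'unknown' already replaced by NaN; the names month, forward and filled are only for illustration.

```python
import numpy as np
import pandas as pd

# Month values of the id '011' group, with 'unknown' already turned into NaN
month = pd.Series([np.nan, 'march', np.nan, np.nan, 'january', np.nan, 'march'])

# Forward fill: every gap takes the value from the previous row.
# The leading NaN has no previous row, so it is still missing here.
forward = month.ffill()

# Backward fill: the remaining leading gap takes the value from the next row.
filled = forward.bfill()

print(filled.tolist())
```

Only a gap at the start of a group survives the forward fill, so the backward fill touches just that case.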

jezrael · Accepted Answer · 2021-07-26 09:38:34Z

Replace 'unknown' with missing values, then forward fill and backward fill the missing values per group:

import numpy as np

# Forward fill, then backward fill, within each group
f = lambda x: x.ffill().bfill()
df = df.replace('unknown', np.nan).groupby(df['id']).transform(f)
print(df)
    fruit fruit_type  country   id    month
0    pear       pear      usa  011    march
1    pear       pear      usa  011    march
2  banana       pear      usa  011    march
3    pear       pear      usa  011    march
4    pear       pear    ghana  011  january
5   apple       pear    ghana  011  january
6   apple      apple   russia  011    march
7  cherry     cherry  albania    6  january
8  cherry     cherry  albania    6  january
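
A possible variant of the same idea, as a sketch rather than a definitive solution: the groupby object has its own ffill() and bfill() methods, so the lambda can be dropped. The names cols and cleaned are assumptions made for this example, and the data is the frame from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'fruit': ['pear', 'pear', 'banana', 'pear', 'pear', 'apple', 'apple', 'cherry', 'cherry'],
    'fruit_type': ['unknown', 'pear', 'unknown', 'unknown', 'pear', 'unknown', 'apple', 'cherry', 'unknown'],
    'country': ['unknown', 'usa', 'unknown', 'unknown', 'ghana', 'unknown', 'russia', 'albania', 'unknown'],
    'id': ['011', '011', '011', '011', '011', '011', '011', '6', '6'],
    'month': ['unknown', 'march', 'unknown', 'unknown', 'january', 'unknown', 'march', 'january', 'unknown'],
})

# Columns that can contain 'unknown' (hypothetical helper name)
cols = ['fruit_type', 'country', 'month']

cleaned = df.replace('unknown', np.nan)
# Fill from the previous row within each id group...
cleaned[cols] = cleaned.groupby('id')[cols].ffill()
# ...then fill any leading gaps from the next row within the same group
cleaned[cols] = cleaned.groupby('id')[cols].bfill()

print(cleaned)
```

Selecting the columns explicitly keeps 'id' and 'fruit' untouched and leaves the original row index in place.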
