1

I have a dataframe:

import pandas as pd

data = {
    'fruit': ['pear', 'pear', 'banana', 'pear', 'pear', 'apple', 'apple', 'cherry', 'cherry'],
    'fruit_type': ['unknown', 'pear', 'unknown', 'unknown', 'pear', 'unknown', 'apple', 'cherry', 'unknown'],
    'country': ['unknown', 'usa', 'unknown', 'unknown', 'ghana', 'unknown', 'russia', 'albania', 'unknown'],
    'id': ['011', '011', '011', '011', '011', '011', '011', '6', '6'],
    'month': ['unknown', 'march', 'unknown', 'unknown', 'january', 'unknown', 'march', 'january', 'unknown'],
}
df = pd.DataFrame(data, columns=['fruit', 'fruit_type', 'country', 'id', 'month'])


I want to replace each 'unknown' value with a value from another row in the same id group:

If an unknown value (for example in the month column) is in the first row of its id group, it should be filled with the value from the next row.

If an unknown value is not in the first row of its id group, it should be filled with the value from the previous row.

Can anyone see the problem?

Output dataframe:

[image: output dataframe]

What have you tried so far? Commented Jul 26, 2021 at 9:27

2 Answers

2

Use replace() to convert 'unknown' to NaN, then group by 'id', forward fill, then backward fill, and finally assign the result back to df:

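df = df.replace('unknown', float('nan'))
# If the replace above doesn't work, try:
# df = df.replace('unknown', float('nan'), regex=True)
# group_keys=False keeps the original row index (newer pandas would otherwise add 'id' as an index level)
df = df.groupby('id', group_keys=False).apply(lambda x: x.ffill().bfill())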

output of df:

   fruit    fruit_type  country     id      month
0   pear    pear        usa         011     march
1   pear    pear        usa         011     march
2   banana  pear        usa         011     march
3   pear    pear        usa         011     march
4   pear    pear        ghana       011     january
5   apple   pear        ghana       011     january
6   apple   apple       russia      011     march
7   cherry  cherry      albania     6       january
8   cherry  cherry      albania     6       january
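
For comparison, here is a minimal sketch of the same idea without apply(), using the group-wise ffill() and bfill() methods directly. The shortened frame and the names cols and filled are only for illustration and are not part of the original question:

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical version of the question's data
df = pd.DataFrame({
    'fruit_type': ['unknown', 'pear', 'unknown', 'cherry', 'unknown'],
    'id': ['011', '011', '011', '6', '6'],
    'month': ['unknown', 'march', 'unknown', 'january', 'unknown'],
})

cols = ['fruit_type', 'month']  # columns to fill; 'id' is only the grouping key

# Mark 'unknown' as missing so the fill methods can see it
filled = df.replace('unknown', np.nan)

# Forward fill inside each id group, then backward fill whatever is still missing
filled[cols] = filled.groupby('id')[cols].ffill()
filled[cols] = filled.groupby('id')[cols].bfill()

print(filled)
```

Selecting the columns explicitly leaves the 'id' column untouched, so it stays in the result without any extra handling.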

2 Comments

@jezrael How so, sir? Please tell me!
seems OK, sorry.
2

Replace 'unknown' with missing values, then forward fill and backward fill the missing values per group:

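import numpy as np

# Forward fill first, then backward fill anything still missing at the start of a group
f = lambda x: x.ffill().bfill()
df = df.replace('unknown', np.nan).groupby(df['id']).transform(f)
print(df)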
    fruit fruit_type  country   id    month
0    pear       pear      usa  011    march
1    pear       pear      usa  011    march
2  banana       pear      usa  011    march
3    pear       pear      usa  011    march
4    pear       pear    ghana  011  january
5   apple       pear    ghana  011  january
6   apple      apple   russia  011    march
7  cherry     cherry  albania    6  january
8  cherry     cherry  albania    6  january
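
The order of the two fills matters for the rule in the question (take the previous row unless the unknown value is first in its group). A small sketch on a single made-up column, where the names forward_first and backward_first are only illustrative:

```python
import numpy as np
import pandas as pd

# One hypothetical group with a leading and a middle 'unknown'
s = pd.Series(['unknown', 'march', 'unknown', 'january']).replace('unknown', np.nan)

# Forward fill first: the middle gap takes the previous value,
# and only the leading gap falls back to the next value
forward_first = s.ffill().bfill()

# Backward fill first: every gap takes the next value instead
backward_first = s.bfill().ffill()

print(forward_first.tolist())   # ['march', 'march', 'march', 'january']
print(backward_first.tolist())  # ['march', 'march', 'january', 'january']
```

Only the ffill().bfill() order matches the behaviour asked for, which is presumably why both answers use it.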
