
I have a dataframe df as follows:

ID  IndentNo    PO_Ref_No
 1  10023       470089AB
 2  10023       470089DC
 3  10023   
 4  10024       674005TT
 5  10024       674005LP
 6  10024       674005TN

Objective: I want to drop all rows for IndentNo = 10024, because every one of its 3 rows already has a PO_Ref_No.

So the resultant df would look like:

ID  IndentNo    PO_Ref_No
 1  10023       470089AB
 2  10023       470089DC
 3  10023       

How can I do this efficiently? I tried the following:

import numpy as np
import pandas as pd

df['Flag'] = np.where(pd.isnull(df['PO_Ref_No']), 1, 0)
df = df.loc[df['Flag'] != 1]

But this also removes ID 3 of IndentNo 10023.

Any clue would be helpful.

  • df[df['IndentNo'].isin(df[df['PO_Ref_No'].isna()]['IndentNo'].unique())] should work. Commented Aug 25, 2021 at 10:14

1 Answer


The goal is to keep only the groups in which at least one row is missing a PO_Ref_No (i.e., discard a group once all of its rows have a PO_Ref_No):

You can flag the missing values with Series.isna and test whether each group contains at least one NaN using GroupBy.transform with 'any':

df = df[df['PO_Ref_No'].isna().groupby(df['IndentNo']).transform('any')]
print(df)
   ID  IndentNo PO_Ref_No
0   1     10023  470089AB
1   2     10023  470089DC
2   3     10023       NaN

Alternatively, collect the IndentNo values of all groups that contain a NaN, then filter the original IndentNo column with Series.isin for membership:

df = df[df['IndentNo'].isin(df.loc[df['PO_Ref_No'].isna(), 'IndentNo'])]
print(df)
   ID  IndentNo PO_Ref_No
0   1     10023  470089AB
1   2     10023  470089DC
2   3     10023       NaN

Slower, but also possible, is DataFrameGroupBy.filter:

df = df.groupby('IndentNo').filter(lambda x: x['PO_Ref_No'].isna().any())
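For reference, here is a minimal end-to-end sketch of the first approach, reconstructing the sample dataframe from the question (column values are taken from the example above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'IndentNo': [10023, 10023, 10023, 10024, 10024, 10024],
    'PO_Ref_No': ['470089AB', '470089DC', np.nan,
                  '674005TT', '674005LP', '674005TN'],
})

# True for every row whose IndentNo group contains at least one missing PO_Ref_No
mask = df['PO_Ref_No'].isna().groupby(df['IndentNo']).transform('any')

# keep only those groups: 10024 is dropped because all of its rows are filled
out = df[mask]
print(out)
```

Only the three IndentNo 10023 rows survive, including the row with the missing PO_Ref_No.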