
Hello: I have the following code that gives me a count of the null values in each column that contains any:

df_null = df.columns[df.isnull().any()]

df[df_null].isnull().sum()

The result is an index with the column name and number of null values:

col1 10  
col2 20  
col3 30  

What I want to do is drop all the rows that contain a null in any column that has fewer than 15 null values. So far I have gone through the columns manually and dropped those rows using the following:

df.dropna(subset=['col_name'], axis=0, inplace=True)

That works fine, but I would like to automate the process so I don't have to go through each column and drop the null rows by hand.
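For reference, the automation being asked for amounts to something like this sketch (the 15-null threshold is from the description above; the small DataFrame and lowered threshold are purely illustrative):

```python
import numpy as np
import pandas as pd

# Toy frame: col1 has 1 null, col2 has 3 nulls, col3 has none
df = pd.DataFrame({
    "col1": [1.0, np.nan, 3.0, 4.0],
    "col2": [np.nan, np.nan, np.nan, 4.0],
    "col3": [1.0, 2.0, 3.0, 4.0],
})

threshold = 2  # the question uses 15; 2 keeps this toy example visible
counts = df.isnull().sum()
# columns that have at least one null but fewer than `threshold` nulls
cols = counts[(counts > 0) & (counts < threshold)].index.tolist()
# drop the rows that are null in any of those columns, in one call
df_clean = df.dropna(subset=cols)
```

Here only col1 falls under the threshold, so only the row where col1 is null is dropped; col2's nulls are left alone.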

Thank you.

  • Thank you for sharing your code. Could you please post samples of your input and expected output in the question? That would give us more clarity. Cheers. Commented Sep 7, 2020 at 21:16

2 Answers


Check with

s = df.isnull().sum()
dfnew = df.loc[:, (s > 15) | (s == 0)]
# the first condition keeps columns with more than 15 nulls;
# the second keeps columns with no NaN at all
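As a quick illustration of this column selection on toy data (the threshold is lowered to 1 so the effect shows up in a three-row frame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],            # 1 null -> dropped by the filter
    "b": [np.nan, np.nan, np.nan],      # 3 nulls -> kept (more than the threshold)
    "c": [1.0, 2.0, 3.0],               # 0 nulls -> kept
})

s = df.isnull().sum()
# same pattern as above, with 1 in place of 15
dfnew = df.loc[:, (s > 1) | (s == 0)]
```

Only column "a" sits between the two conditions, so it is the only one removed.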


Another way

Keep only the columns with at least n non-NaN values:

n = len(df) - 15

df.dropna(thresh=n, axis=1)
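A small runnable version of this thresh approach (toy data; the allowance is 2 NaNs per column instead of the question's 15):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],           # 1 NaN  -> 3 non-NaN values
    "b": [np.nan, np.nan, np.nan, 4.0],     # 3 NaNs -> 1 non-NaN value
})

n = len(df) - 2  # keep columns with at most 2 NaNs (the question would use 15)
# thresh=n keeps only columns with at least n non-NaN values
result = df.dropna(thresh=n, axis=1)
```

With n = 2, column "a" (3 non-NaN values) survives and column "b" (1 non-NaN value) is dropped.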
