I am trying to remove rows from a DataFrame where the NumPy array stored in a column contains null (NaN) values.

DataFrame:

name    array   
A       [nan, nan, nan] 
B       [111.425818592, -743.060293425, -180.420675659] 

Expected output

name    array   
B       [111.425818592, -743.060293425, -180.420675659] 

My attempt:

df = df[df['array'].apply(lambda x: np.where(~np.isnan(x)))]

The error I am getting is:

TypeError: unhashable type: 'numpy.ndarray'

3 Answers

2

Using the sample data from jpp's answer:

df[~pd.DataFrame(df.array.tolist()).isnull().all(1)]
Out[391]: 
  name                                            array
1    B  [111.425818592, -743.060293425, -180.420675659]

0

Here is one way:

import pandas as pd, numpy as np

df = pd.DataFrame([['A', np.array([np.nan, np.nan, np.nan])],
                   ['B', np.array([111.425818592, -743.060293425, -180.420675659])]],
                  columns=['name', 'array'])

df = df[~np.all(list(map(np.isnan, df['array'])), axis=1)]

#   name                                            array
# 1    B  [111.425818592, -743.060293425, -180.420675659]

Or, if you want to remove rows where any values of the array are NaN:

df = df[~np.any(list(map(np.isnan, df['array'])), axis=1)]
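A row-by-row variant of the same idea, as a minimal sketch: instead of stacking all arrays into one 2-D array, it builds a boolean mask with `Series.apply`, so each array is checked on its own. The stacked version needs every array to have the same length; if they differ, that could be one cause of an axis error like the one reported in the comments on this answer, though that is only a guess. The third row (`C`) and the names `all_nan`, `any_nan`, `df_drop_all` and `df_drop_any` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from this answer, plus a hypothetical shorter array (row C)
# to show that the arrays do not need to share a length here.
df = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'array': [
        np.array([np.nan, np.nan, np.nan]),
        np.array([111.425818592, -743.060293425, -180.420675659]),
        np.array([1.0, np.nan]),
    ],
})

# One boolean per row: True when every / any element of that row's array is NaN.
all_nan = df['array'].apply(lambda x: bool(np.isnan(x).all()))
any_nan = df['array'].apply(lambda x: bool(np.isnan(x).any()))

df_drop_all = df[~all_nan]  # drop rows whose array is entirely NaN
df_drop_any = df[~any_nan]  # drop rows whose array has at least one NaN
```

The key difference from the attempt in the question is that the lambda returns a single `bool` per row, so the result is a boolean mask that `df[...]` can use for row selection.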

2 Comments

Hmm, I am getting this error: AxisError: axis 1 is out of bounds for array of dimension 1
OK, unfortunately I can't replicate your error. I can't see the types of your dataframe objects. The code above with the input data as I've created seems to work.
0

You really should consider dropping the use of numpy arrays within dataframe columns; every operation you do on the series is going to be a heartache. Instead, just convert the arrays into dataframe columns and then use pandas functionality:

dfnew = pd.DataFrame(np.concatenate([df.name.values.reshape(-1,1),
                     np.array(df.array.tolist())],axis=1),
                     columns=['name','array1','array2','array3'])

  name   array1  array2   array3
0    A      NaN     NaN      NaN
1    B  111.426 -743.06 -180.421

Now you can use dropna()

dfnew.dropna(axis=0)

  name   array1  array2   array3
1    B  111.426 -743.06 -180.421

You can then always extract a single array, if need be, with

dfnew.iloc[1,1:].values

array([111.425818592, -743.060293425, -180.420675659], dtype=object)
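As the `dtype=object` above shows, concatenating the names with the numbers turns every column into an object column. A possible alternative, sketched here under the assumption that all arrays have the same length, is to expand only the array column and join it back to the names, which keeps the numeric columns as floats. The names `expanded`, `wide`, `clean` and `filtered` are made up for this example, and the generated column labels (`array0`, `array1`, `array2`) differ from the ones used above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'B'],
    'array': [
        np.array([np.nan, np.nan, np.nan]),
        np.array([111.425818592, -743.060293425, -180.420675659]),
    ],
})

# Expand the array column into one float column per element,
# reusing the original index so rows stay aligned with df.
expanded = pd.DataFrame(df['array'].tolist(), index=df.index).add_prefix('array')

# Join the names back on and drop rows containing NaN.
wide = df[['name']].join(expanded)
clean = wide.dropna()

# The surviving index can also be used to filter the original frame,
# keeping the arrays intact in a single column.
filtered = df.loc[clean.index]
```

Filtering the original frame through `clean.index` may suit cases where the arrays need to stay whole and the wide frame is only wanted for the `dropna()` step.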

1 Comment

@DJK, thanks for the tip! I'm working with numpy poly1d objects, so that's why I'm not keen on breaking them up into individual columns. Yes, it's a pain to work with numpy arrays in DataFrames.
