Remove rows from DataFrame that contain null values within numpy array

Question

I am trying to remove rows from a DataFrame where a column holds numpy arrays that contain null values.

DataFrame:

name    array   
A       [nan, nan, nan] 
B       [111.425818592, -743.060293425, -180.420675659]

Expected output

name    array   
B       [111.425818592, -743.060293425, -180.420675659]

My attempt:

df = df[df['array'].apply(lambda x: np.where(~np.isnan(x)))]

The error I am getting is:

TypeError: unhashable type: 'numpy.ndarray'

BENY · Accepted Answer · 2018-02-28 17:15:01Z

2

Using the sample data from jpp's answer:

df[~pd.DataFrame(df['array'].tolist(), index=df.index).isnull().all(axis=1)]
Out[391]: 
  name                                            array
1    B  [111.425818592, -743.060293425, -180.420675659]
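
The one-liner above can be unpacked into steps. This is a minimal sketch, assuming every array in the column has the same length; the intermediate names `expanded`, `all_nan`, and `result` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B"],
    "array": [np.array([np.nan, np.nan, np.nan]),
              np.array([111.425818592, -743.060293425, -180.420675659])],
})

# Expand each array into one row of a temporary frame (one column per element).
# Reusing df.index keeps the mask aligned with df even if its index is not 0..n-1.
expanded = pd.DataFrame(df["array"].tolist(), index=df.index)

# True for rows in which every element is NaN.
all_nan = expanded.isnull().all(axis=1)

# Keep only the rows that are not entirely NaN.
result = df[~all_nan]
```

Replacing `.all(axis=1)` with `.any(axis=1)` would instead drop rows that contain at least one NaN.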


jpp · Accepted Answer · 2018-02-28 17:20:59Z

0

Here is one way:

import pandas as pd, numpy as np

df = pd.DataFrame([['A', np.array([np.nan, np.nan, np.nan])],
                   ['B', np.array([111.425818592, -743.060293425, -180.420675659])]],
                  columns=['name', 'array'])

df = df[~np.all(list(map(np.isnan, df['array'])), axis=1)]

#   name                                            array
# 1    B  [111.425818592, -743.060293425, -180.420675659]

Or, if you want to remove rows where any values of the array are NaN:

df = df[~np.any(list(map(np.isnan, df['array'])), axis=1)]

edited Feb 28, 2018 at 17:20

answered Feb 28, 2018 at 17:11


2 Comments

doyz Over a year ago

hmm.. i am getting this error: AxisError: axis 1 is out of bounds for array of dimension 1

jpp Over a year ago

OK, unfortunately I can't replicate your error. I can't see the types of your dataframe objects. The code above with the input data as I've created seems to work.
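
One possible cause of the `AxisError` (an assumption, since the asker's actual data is not shown) is that the cells do not stack into a 2-D array, for example when the arrays have different lengths or a cell holds a scalar. A row-wise check avoids the `axis` argument entirely. The sketch below uses hypothetical ragged data, and the names `all_nan`, `any_nan`, `df_all`, and `df_any` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: arrays of different lengths, which cannot be stacked into 2-D.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "array": [np.array([np.nan, np.nan, np.nan]),
              np.array([111.425818592, -743.060293425]),
              np.array([np.nan, 1.0])],
})

# Evaluate each cell on its own and return one boolean per row.
all_nan = df["array"].apply(lambda x: bool(np.all(np.isnan(x))))
df_all = df[~all_nan]   # drops rows whose array is entirely NaN

any_nan = df["array"].apply(lambda x: bool(np.any(np.isnan(x))))
df_any = df[~any_nan]   # drops rows whose array has at least one NaN
```

This is also the shape the original attempt needed: the function passed to `apply` has to return a single boolean per row, whereas `np.where` returns a tuple of index arrays.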

DJK · Accepted Answer · 2018-02-28 18:50:30Z

You really should consider dropping the use of numpy arrays within DataFrame columns; every operation you do on the Series is going to be a headache. Instead, expand the arrays into separate columns of a DataFrame and then use pandas functionality:

# Stack the names and the array elements side by side, one column per element.
dfnew = pd.DataFrame(np.concatenate([df['name'].values.reshape(-1, 1),
                                     np.array(df['array'].tolist())], axis=1),
                     columns=['name', 'array1', 'array2', 'array3'])

  name   array1  array2   array3
0    A      NaN     NaN      NaN
1    B  111.426 -743.06 -180.421

Now you can use dropna()

dfnew.dropna(axis=0)

  name   array1  array2   array3
1    B  111.426 -743.06 -180.421

You can then always extract a single array if need be:

dfnew.iloc[1,1:].values

array([111.425818592, -743.060293425, -180.420675659], dtype=object)
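
Concatenating the names and the numbers into one numpy array forces everything to `object` dtype, as the `dtype=object` in the output shows. A possible variant, sketched here under the assumption that all arrays have length 3, builds the numeric columns separately so they stay `float64`. The names `wide`, `cleaned`, and `values` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B"],
    "array": [np.array([np.nan, np.nan, np.nan]),
              np.array([111.425818592, -743.060293425, -180.420675659])],
})

cols = ["array1", "array2", "array3"]

# One float column per array element, aligned to the original index.
wide = pd.DataFrame(df["array"].tolist(), index=df.index, columns=cols)

# Put the name column back next to the numeric columns.
dfnew = pd.concat([df[["name"]], wide], axis=1)

# Drop every row that contains at least one NaN.
cleaned = dfnew.dropna(axis=0)

# Recover the array for the row labelled 1 as a float array.
values = cleaned.loc[1, cols].to_numpy(dtype=float)
```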

@DJK, thanks for the tip! I'm working with numpy poly1d objects, so that's why I'm not keen on breaking them up into individual columns. Yes, it's a pain to work with numpy arrays in DataFrames.
