How do I remove all zero elements from a NumPy array?

Question

I have a rank-1 numpy.array that I want to make a boxplot of. However, I want to exclude all values equal to zero. Currently, I do this by looping over the array and copying each value to a new array if it is not equal to zero. However, as the array consists of 86 000 000 values and I have to do this multiple times, this takes a lot of patience.

Is there a more intelligent way to do this?

Sven Marnach · Accepted Answer · 2011-05-08 11:50:02Z

143

For a NumPy array a, you can use

a[a != 0]

to extract the values not equal to zero.
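A minimal sketch of the boolean indexing above, using a small made-up array (the values are an assumption for illustration only). The comparison a != 0 builds a boolean mask, and indexing with that mask returns a new array holding only the selected elements:

```python
import numpy as np

# Small illustrative array; the data in the question has about 86 million values.
a = np.array([0.0, 1.5, 0.0, -2.0, 3.0, 0.0])

mask = a != 0      # boolean array: True where the element is non-zero
nonzero = a[mask]  # new array containing only the non-zero elements

print(mask)
print(nonzero)
```

The comparison and the selection both run in compiled NumPy code, which is why this tends to be much faster than an explicit Python loop.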

answered May 8, 2011 at 11:50

Sven Marnach

3 Comments

ruben baetens Over a year ago

Thank you very much, this is indeed much (!) faster. Can something similar be done on a higher-rank NumPy array or matrix? The problem there is that the dimensions will no longer match properly ...

Sven Marnach Over a year ago

@rubae: If a has higher dimension, the result will be a flattened (one dimensional) array. It would also be possible to remove columns or rows that are all zero.
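As a rough illustration of both points in this comment, the sketch below uses a hypothetical 2-D array. Boolean indexing flattens the result, while any() along an axis can be used to keep only the rows or columns that contain at least one non-zero value:

```python
import numpy as np

# Hypothetical 2-D example data: the middle row and the first column are all zero.
a = np.array([[0, 1, 0],
              [0, 0, 0],
              [0, 2, 3]])

flat = a[a != 0]                  # 1-D result: the 2-D shape is lost

keep_rows = (a != 0).any(axis=1)  # True for rows with at least one non-zero value
keep_cols = (a != 0).any(axis=0)  # True for columns with at least one non-zero value

no_zero_rows = a[keep_rows]       # drops the all-zero middle row
no_zero_cols = a[:, keep_cols]    # drops the all-zero first column

print(flat)
print(no_zero_rows)
print(no_zero_cols)
```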

noumenal Over a year ago

...where a is a NumPy array (np.ndarray). This will not work on built-in Python lists.
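To make the difference concrete, here is a small sketch with a plain Python list (the data is made up). On a list the comparison produces a single bool rather than a mask, so the list has to be converted to an array first:

```python
import numpy as np

values = [0, 1, 0, 3, 4, 5, 0]  # plain Python list (illustrative data)

# For a list, values != 0 is a single bool (True), and values[True] is
# simply values[1], so nothing is filtered and no error is raised.
not_filtered = values[values != 0]

arr = np.asarray(values)        # convert once to a NumPy array
nonzero = arr[arr != 0]         # now the element-wise mask works

print(not_filtered)
print(nonzero)
```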

nbro · Accepted Answer · 2019-08-10 14:21:11Z

41

This is a case where you want to use masked arrays: they keep the shape of your array and are recognized by many NumPy and Matplotlib functions.

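import numpy as np
import matplotlib.pyplot as plt

X = np.random.randn(1000, 5)  # dimensions must be integers, so 1000 rather than 1e3
X[np.abs(X) < .1] = 0  # some zeros
X = np.ma.masked_equal(X, 0)
# boxplot may not honour the mask in every Matplotlib version,
# so pass the unmasked values of each column explicitly
plt.boxplot([col.compressed() for col in X.T])

# other functionality of masked arrays
X.compressed()  # get a normal 1-D array with the masked values removed
X.mask  # get a boolean array of the mask
X.mean()  # automatically discards masked values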
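A small sketch of what masking changes in practice, on a hypothetical 3x2 array. The masked array keeps the original shape, reductions skip the masked zeros, and compressed() on each column gives the per-column data with zeros removed:

```python
import numpy as np

# Hypothetical data with one zero in each column.
X = np.array([[1.0, 0.0],
              [3.0, 4.0],
              [0.0, 8.0]])

M = np.ma.masked_equal(X, 0)

col_means = M.mean(axis=0)                   # zeros are ignored in the mean
columns = [col.compressed() for col in M.T]  # unmasked values per column

print(M.shape)
print(col_means)
print(columns)
```

The list of per-column arrays is one way to hand the data to a plotting function that does not understand masks.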

edited Aug 10, 2019 at 14:21

nbro

answered May 9, 2011 at 18:34

Andrea Zonca

1 Comment

Andrea Zonca Over a year ago

link to documentation: docs.scipy.org/doc/numpy/reference/routines.ma.html

MSeifert · Accepted Answer · 2019-08-10 14:44:56Z

I decided to compare the runtime of the different approaches mentioned here. I've used my library simple_benchmark for this.

The boolean indexing with array[array != 0] seems to be the fastest (and shortest) solution.

For small arrays the MaskedArray approach is very slow compared to the other approaches, but for large arrays it is about as fast as the boolean indexing approach. For moderately sized arrays there is not much difference between the approaches.

Here is the code I've used:

from simple_benchmark import BenchmarkBuilder

import numpy as np

bench = BenchmarkBuilder()

@bench.add_function()
def boolean_indexing(arr):
    return arr[arr != 0]

@bench.add_function()
def integer_indexing_nonzero(arr):
    return arr[np.nonzero(arr)]

@bench.add_function()
def integer_indexing_where(arr):
    return arr[np.where(arr != 0)]

@bench.add_function()
def masked_array(arr):
    return np.ma.masked_equal(arr, 0)

@bench.add_arguments('array size')
def argument_provider():
    for exp in range(3, 25):
        size = 2**exp
        arr = np.random.random(size)
        arr[arr < 0.1] = 0  # add some zeros
        yield size, arr

r = bench.run()
r.plot()

Note: the benchmark for masked_array is built incorrectly: np.ma.masked_equal(arr, 0) returns a masked array of the same length, not a filtered array. The function should instead be m = np.ma.masked_equal(arr, 0); return arr[~m.mask]
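A brief sketch of the point made in this note, on a made-up array. masked_equal only marks the zeros, so the result has the same length as the input; compressed() is used here as an alternative to indexing with the inverted mask, and is an assumption on my part rather than what the benchmark author wrote:

```python
import numpy as np

arr = np.array([0.0, 0.5, 0.0, 0.9])  # illustrative data

m = np.ma.masked_equal(arr, 0)  # same length as arr, zeros are only masked
filtered = m.compressed()       # plain array with the masked zeros removed

print(len(m), len(filtered))
print(filtered)
```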

jpp · Accepted Answer · 2019-01-02 00:30:01Z

5

You can index with a Boolean array. For a NumPy array A:

res = A[A != 0]

You can use Boolean array indexing as above, bool type conversion, np.nonzero, or np.where. Here's some performance benchmarking:

# Python 3.7, NumPy 1.14.3

np.random.seed(0)

A = np.random.randint(0, 5, 10**8)

%timeit A[A != 0]          # 768 ms
%timeit A[A.astype(bool)]  # 781 ms
%timeit A[np.nonzero(A)]   # 1.49 s
%timeit A[np.where(A)]     # 1.58 s
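The four expressions timed above select the same elements. The sketch below checks this on a much smaller array (the size 1000 is an assumption chosen to keep it quick); it does not reproduce the timings:

```python
import numpy as np

np.random.seed(0)
A = np.random.randint(0, 5, 1000)  # small stand-in for the 10**8-element array

results = [
    A[A != 0],           # boolean mask from a comparison
    A[A.astype(bool)],   # boolean mask from a type conversion
    A[np.nonzero(A)],    # integer indices of the non-zero elements
    A[np.where(A)],      # same indices, obtained via np.where
]

# Every approach should return exactly the same array.
all_equal = all(np.array_equal(results[0], r) for r in results[1:])
print(all_equal)
```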

answered Jan 2, 2019 at 0:30

jpp

eat · Accepted Answer · 2011-05-08 15:49:20Z

4

I would suggest simply using NaN for cases like this, where you would like to ignore some values but still keep the procedure as statistically meaningful as possible. So

In []: from pylab import *  # provides randn, nan, isnan and boxplot
In []: X = randn(1000, 5)
In []: X[abs(X) < .1] = nan
In []: isnan(X).sum(0)
Out[]: array([82, 84, 71, 81, 73])
In []: boxplot(X)
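The same idea written with explicit NumPy calls, as a hedged sketch. Depending on the Matplotlib version, boxplot may not skip NaN values on its own, so this example also builds a per-column list with the NaNs filtered out; the seed and the variable names are assumptions for illustration:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(1000, 5)
X[np.abs(X) < 0.1] = np.nan       # mark the values to ignore

n_missing = np.isnan(X).sum(axis=0)  # number of ignored values per column
col_means = np.nanmean(X, axis=0)    # NaN-aware mean, one value per column

# Per-column data without NaN; a list like this can be passed to a boxplot call.
columns = [col[~np.isnan(col)] for col in X.T]

print(n_missing)
print(col_means)
```

X keeps its original shape, so each remaining value stays at its original position in the array.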

answered May 8, 2011 at 15:49

eat

3 Comments

ruben baetens Over a year ago

Ah, the use of NaN indeed seems more appropriate here, thank you. This way I no longer need to copy my data to a new array of a different size; I can keep the original array, and with it each value's location in the array. Thank you!

ruben baetens Over a year ago

Do you perhaps know a way to loop this using a list comprehension? I have a dictionary a where a[k] is a NumPy array, so I wanted to do [a[k][abs(a[k])<.1]=float('NaN') for k in data], but this fails inside the comprehension, whereas executing the command by itself seems to work ...

eat Over a year ago

@rubae: I think you should ask a separate question about this list comprehension issue. Unfortunately, it is no longer so straightforward to figure out what you are actually aiming for :(. As far as I can guess, don't get sidetracked by the list comprehension; perhaps you are only looking for something simple like this: for k in data: a[k][abs(a[k])< .1]= NaN?
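The comprehension in the comment above fails because an assignment is a statement and cannot appear inside a list comprehension, which is a SyntaxError in Python. A plain for loop does the job; the dictionary below is hypothetical stand-in data:

```python
import numpy as np

# Hypothetical dictionary of arrays, standing in for the commenter's data.
data = {
    "a": np.array([0.05, 1.0, -0.02, 2.0]),
    "b": np.array([3.0, 0.0, -4.0]),
}

for k in data:
    arr = data[k]
    arr[np.abs(arr) < 0.1] = np.nan  # modifies the stored array in place

print(data["a"])
print(data["b"])
```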

David Guest · Accepted Answer · 2018-02-19 09:44:07Z

4

A simple line of code can get you an array that excludes all '0' values:

np.argwhere(*array*)

example:

import numpy as np
array = [0, 1, 0, 3, 4, 5, 0]
array2 = np.argwhere(array)
print(array2)

[[1]
 [3]
 [4]
 [5]]

answered Feb 19, 2018 at 9:44

David Guest

3 Comments

Jamil Hneini Over a year ago

np.argwhere returns the indexes of the nonzero elements only

mdhansen Over a year ago

So this array, by sheer luck, happens to appear to satisfy the question, but is misleading. From the result of argwhere you could reconstitute the non-zero array, but it's an additional step.

KASMI G. Over a year ago

Actually, np.argwhere() doesn't return the array with the zeros excluded; it returns the indices of the non-zero elements.
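To illustrate what the comments describe, the sketch below uses an array whose non-zero values differ from their positions (the data is made up). np.argwhere gives indices, and one extra indexing step is needed to get the values; np.flatnonzero is used here as a convenient 1-D variant:

```python
import numpy as np

array = np.array([0, 1, 0, 7, 4, 5, 0])  # illustrative data

idx = np.argwhere(array)          # 2-D array of indices, one row per non-zero element
flat_idx = np.flatnonzero(array)  # the same indices as a 1-D array
values = array[flat_idx]          # the non-zero values themselves

print(idx.tolist())
print(flat_idx)
print(values)
```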

Shrm · Accepted Answer · 2021-05-06 14:11:17Z

1

[i for i in Array if i != 0.0] if the numbers are float, or [i for i in Array if i != 0] if the numbers are int.

answered May 6, 2021 at 14:11

Shrm

1 Comment

Matt Over a year ago

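Your solution will likely be less efficient than NumPy's boolean indexing. To handle both types at once, [i for i in Array if i != 0] is enough, because 0 == 0.0 in Python; note that a test such as i > 0 would also drop negative values.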
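A short comparison of the list comprehension with boolean indexing, on made-up data that includes a negative value. It is only a sketch: it shows that the results agree for i != 0 and that a test with i > 0 selects something different, and it makes no timing claims:

```python
import numpy as np

Array = np.array([0.0, -1.5, 2.0, 0.0, 3.0])  # illustrative data

by_comprehension = [i for i in Array if i != 0]  # Python-level loop, returns a list
by_indexing = Array[Array != 0]                  # vectorized, returns an array
positives_only = [i for i in Array if i > 0]     # also drops the negative value

print(by_comprehension)
print(by_indexing)
print(positives_only)
```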
