Create numpy matrix filled with NaNs

Question

I have the following code:

r = numpy.zeros(shape = (width, height, 9))

It creates a width x height x 9 matrix filled with zeros. Is there a function or an easy way to initialize the entries to NaN instead?

One caveat is that NumPy doesn't have an integer NA value (unlike R). See the pandas list of gotchas. Hence np.nan goes wrong when converted to int.
– smci, Commented Jul 28, 2013 at 3:31
smci is right. NumPy has no NaN value for integer dtypes, so what ends up in the array in place of NaN depends on the dtype and on NumPy. If you are not aware of this, it will cause trouble.
– MasterControlProgram, Commented Nov 4, 2016 at 15:38
There seems to be scope for an np.nans function to mimic np.zeros and np.ones, but I suppose np.full is a generalization that precludes the need for all the specialized functions. Nice question.
– ClimateUnboxed, Commented Jan 21, 2022 at 11:25
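
To make the integer caveat from the comments concrete, here is a minimal sketch of a guard that refuses dtypes that cannot hold NaN. The helper name nan_array is hypothetical and not part of NumPy; it only assumes np.issubdtype and np.full.

```python
import numpy as np


def nan_array(shape, dtype=np.float64):
    """Return a NaN-filled array, refusing dtypes that cannot hold NaN.

    Hypothetical helper for illustration; it is not part of NumPy.
    """
    # np.inexact covers the floating-point and complex dtypes, which are
    # the built-in numeric kinds that can represent NaN.
    if not np.issubdtype(dtype, np.inexact):
        raise TypeError(f"dtype {np.dtype(dtype)} cannot represent NaN")
    return np.full(shape, np.nan, dtype=dtype)


r = nan_array((2, 3, 9))
print(r.dtype, np.isnan(r).all())

try:
    nan_array((2, 3), dtype=np.int64)
except TypeError as exc:
    print("refused:", exc)
```

Raising early is one design choice among several; keeping the data as float, or using a masked array, are other common ways to deal with missing integer values.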

Princy · Accepted Answer · 2020-06-10 08:18:39Z

415

You rarely need loops for vector operations in numpy. You can create an uninitialized array and assign to all entries at once:

>>> a = numpy.empty((3,3,))
>>> a[:] = numpy.nan
>>> a
array([[ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN]])

I have timed the alternatives a[:] = numpy.nan here and a.fill(numpy.nan) as posted by Blaenk:

$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a.fill(np.nan)"
10000 loops, best of 3: 54.3 usec per loop
$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a[:] = np.nan" 
10000 loops, best of 3: 88.8 usec per loop

The timings show a preference for ndarray.fill(..) as the faster alternative. OTOH, I like numpy's convenience implementation where you can assign values to whole slices at a time; the code's intention is very clear.

Note that ndarray.fill performs its operation in-place, so numpy.empty((3,3,)).fill(numpy.nan) will instead return None.
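
A short sketch of that pitfall, using only numpy.empty and ndarray.fill; the variable names are just for illustration:

```python
import numpy as np

# fill() works in place and returns None, so chaining it onto the
# constructor throws the array away.
lost = np.empty((3, 3)).fill(np.nan)
print(lost)  # None

# Keep a reference to the array first, then fill it.
a = np.empty((3, 3))
a.fill(np.nan)
print(np.isnan(a).all())  # True
```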

edited Jun 10, 2020 at 8:18

Princy

answered Nov 10, 2009 at 0:17

u0b34a0f6ae

9 Comments

Jorge Israel Peña Over a year ago

I agree that your code's intention is clearer. But thanks for the unbiased timings (or rather, the fact that you still posted them), I appreciate it :)

heltonbiker Over a year ago

I like this one: a = numpy.empty((3, 3,)) * numpy.nan. It timed faster than fill but slower than the assignment method, but it is a oneliner!!

Ivan Over a year ago

Please look at this answer: stackoverflow.com/questions/10871220/…

naught101 Over a year ago

I prefer the .fill() method, but the difference in speeds reduces to practically nothing as the arrays get larger.

flutefreak7 Over a year ago

... because np.empty([2, 5]) creates an array, then fill() modifies that array in-place, but does not return a copy or a reference. If you want to refer to np.empty([2, 5]) by a name ("assign it to a variable"), you have to do so before you do in-place operations on it. The same kind of thing happens if you do [1, 2, 3].insert(1, 4). The list is created and a 4 is inserted, but it is impossible to get a reference to the list (and thus it can be assumed to have been garbage collected). On immutable data like strings, a copy is returned, because you can't operate in-place. Pandas can do both.

datu-puti · Accepted Answer · 2018-01-11 03:32:14Z

312

Another option is to use numpy.full, available in NumPy 1.8+:

a = np.full([height, width, 9], np.nan)

This is pretty flexible and you can fill it with any other number that you want.
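
As a rough illustration of that flexibility, the sketch below fills arrays with a few different values and dtypes. The sizes are arbitrary example values, and np.full_like is a related NumPy function that is not mentioned in the answer itself.

```python
import numpy as np

height, width = 3, 2  # example sizes, chosen only for illustration

a = np.full((height, width, 9), np.nan)               # float64 NaNs
b = np.full((height, width), 7)                       # integer fill value, integer dtype
c = np.full((height, width), 1.5, dtype=np.float32)   # explicit dtype
d = np.full_like(a, np.nan)                           # same shape and dtype as `a`

print(a.shape, a.dtype)
print(b.dtype.kind)  # 'i' for a signed integer dtype
print(c.dtype)
print(d.shape == a.shape)
```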

edited Jan 11, 2018 at 3:32

datu-puti

answered Oct 9, 2014 at 23:50

Pietro Biroli

5 Comments

travc Over a year ago

I'd consider this the most correct answer, since it is exactly what full is meant for. np.empty((x,y))*np.nan is a good runner-up (and is compatible with old versions of numpy).

Farnabaz Over a year ago

This is slower than fill:

python -mtimeit "import numpy as np; a = np.empty((100,100));" "a.fill(np.nan)"
100000 loops, best of 3: 13.3 usec per loop
python -mtimeit "import numpy as np; a = np.full((100,100), np.nan);"
100000 loops, best of 3: 18.5 usec per loop

Scott Staniewicz Over a year ago

@Farnabaz If you put the equivalent code inside the timing loop, they are about the same. The two methods are basically equal; you've just got the "np.empty" outside the timer in the first one.

$ python -mtimeit "import numpy as np; a = np.empty((1000,1000)); a.fill(np.nan)"
1000 loops, best of 3: 381 usec per loop
$ python -mtimeit "import numpy as np; a = np.full((1000,1000), np.nan);"
1000 loops, best of 3: 383 usec per loop
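
For anyone who wants to repeat this comparison from a script, here is a minimal sketch using the standard timeit module. The allocation is deliberately inside every timed statement so the variants are compared like for like; the array size and repeat counts are arbitrary choices, and the numbers will differ between machines.

```python
import timeit

setup = "import numpy as np"
stmts = {
    "empty + fill": "a = np.empty((1000, 1000)); a.fill(np.nan)",
    "empty + a[:]": "a = np.empty((1000, 1000)); a[:] = np.nan",
    "full": "a = np.full((1000, 1000), np.nan)",
}

number = 100  # executions per timing run
results = {}
for label, stmt in stmts.items():
    # Take the best of several repeats, similar to what the timeit CLI reports.
    best = min(timeit.repeat(stmt, setup=setup, repeat=5, number=number))
    results[label] = best / number * 1e6  # microseconds per execution
    print(f"{label:>14}: {results[label]:8.1f} usec per loop")
```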

CivFan Over a year ago

... what's the 9 for in [height, width, 9]?

SpaceMonkey55 Over a year ago

I think the 9 is from the OP; the question had a 3D array of shape (width, height, 9).

Nico Schlömer · Accepted Answer · 2021-10-07 09:42:06Z

89

I compared the suggested alternatives for speed and found that, for large enough vectors/matrices to fill, all alternatives except val * ones and array(n * [val]) are equally fast.

Code to reproduce the plot:

import numpy
import perfplot

val = 42.0


def fill(n):
    a = numpy.empty(n)
    a.fill(val)
    return a


def colon(n):
    a = numpy.empty(n)
    a[:] = val
    return a


def full(n):
    return numpy.full(n, val)


def ones_times(n):
    return val * numpy.ones(n)


def from_list(n):  # renamed from `list` to avoid shadowing the built-in
    return numpy.array(n * [val])


b = perfplot.bench(
    setup=lambda n: n,
    kernels=[fill, colon, full, ones_times, from_list],
    n_range=[2 ** k for k in range(20)],
    xlabel="len(a)",
)
b.save("out.png")

edited Oct 7, 2021 at 9:42

answered Jul 10, 2017 at 8:01

Nico Schlömer

2 Comments

endolith Over a year ago

Strange that numpy.full(n, val) is slower than a = numpy.empty(n) .. a.fill(val) since it does the same thing internally

halfmoonhalf Over a year ago

I did the same experiment and got roughly the same result as Nico. I use a MacBook Pro (16-inch, 2019), 2.6 GHz 6-Core Intel Core i7. Strange that full is slower than fill.

Jorge Israel Peña · Accepted Answer · 2009-11-10 00:39:22Z

28

Are you familiar with numpy.nan?

You can create your own method such as:

def nans(shape, dtype=float):
    a = numpy.empty(shape, dtype)
    a.fill(numpy.nan)
    return a

Then

nans([3,4])

would output

array([[ NaN,  NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN,  NaN]])

I found this code in a mailing list thread.
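
On NumPy 1.8 or later, roughly the same helper can be sketched on top of numpy.full. This is an alternative formulation for illustration, not the code from the mailing list thread:

```python
import numpy as np


def nans(shape, dtype=float):
    # Same interface as the helper above, built on np.full (NumPy 1.8+).
    return np.full(shape, np.nan, dtype=dtype)


a = nans([3, 4])
print(a.shape, a.dtype)
```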

edited Nov 10, 2009 at 0:39

answered Nov 10, 2009 at 0:16

Jorge Israel Peña

3 Comments

Mad Physicist Over a year ago

Seems like overkill.

Xukrao Over a year ago

@MadPhysicist That depends entirely on your situation. If you have to initialize only one single NaN array, then yes, a custom function is probably overkill. However if you have to initialize a NaN array at dozens of places in your code, then having this function becomes quite convenient.

Mad Physicist Over a year ago

@Xukrao Not really, given that a more flexible and efficient version of such a function already exists and is mentioned in multiple other answers.

Community · Accepted Answer · 2017-05-23 12:10:31Z

You can always use multiplication if you don't immediately recall the .empty or .full methods:

>>> np.nan * np.ones(shape=(3,2))
array([[ nan,  nan],
       [ nan,  nan],
       [ nan,  nan]])

Of course it works with any other numerical value as well:

>>> 42 * np.ones(shape=(3,2))
array([[42., 42.],
       [42., 42.],
       [42., 42.]])

But @u0b34a0f6ae's accepted answer is about 3x faster (CPU cycles, not brain cycles to remember numpy syntax ;):

$ python -mtimeit "import numpy as np; X = np.empty((100,100));" "X[:] = np.nan;"
100000 loops, best of 3: 8.9 usec per loop
$ python -mtimeit "import numpy as np; X = np.ones((100,100));" "X *= np.nan;"
10000 loops, best of 3: 24.9 usec per loop

JHBonarius · Accepted Answer · 2023-03-28 09:57:54Z

10

Yet another possibility not yet mentioned here is to use NumPy tile:

a = numpy.tile(numpy.nan, (3, 3))

Also gives

array([[ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN]])

update: I did a speed comparison, and it's not very fast :/ It's slower than ones_times by about a factor of ten.

edited Mar 28, 2023 at 9:57

answered Dec 24, 2017 at 10:46

JHBonarius

Giancarlo Sportelli · Accepted Answer · 2018-11-29 09:15:49Z

8

Another alternative is numpy.broadcast_to(val,n) which returns in constant time regardless of the size and is also the most memory efficient (it returns a view of the repeated element). The caveat is that the returned value is read-only.
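
A small sketch of what that looks like in practice, including the read-only caveat. The shape is an arbitrary example, and the final copy() is only one way to get a writable array; it gives up the memory saving.

```python
import numpy as np

view = np.broadcast_to(np.nan, (1000, 1000))
print(view.shape)            # (1000, 1000)
print(view.strides)          # (0, 0): every element aliases the same value
print(view.flags.writeable)  # False

try:
    view[0, 0] = 1.0
except ValueError as exc:
    print("read-only:", exc)

# Copy when a writable array is needed; this allocates the full array.
writable = view.copy()
writable[0, 0] = 1.0
```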

Below is a comparison of the performances of all the other methods that have been proposed using the same benchmark as in Nico Schlömer's answer.

answered Nov 29, 2018 at 9:15

Giancarlo Sportelli

Mad Physicist · Accepted Answer · 2016-07-19 16:43:11Z

7

As said, numpy.empty() is the way to go. However, for objects, fill() might not do exactly what you think it does:

In[36]: a = numpy.empty(5,dtype=object)
In[37]: a.fill([])
In[38]: a
Out[38]: array([[], [], [], [], []], dtype=object)
In[39]: a[0].append(4)
In[40]: a
Out[40]: array([[4], [4], [4], [4], [4]], dtype=object)

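All five entries refer to the same list object, so appending to one appears to change them all. One way around this is to assign a separate list to each entry, e.g.:

In[41]: a = numpy.empty(5, dtype=object)
In[42]: a[:] = [[] for _ in range(5)]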
In[43]: a[0].append(4)
In[44]: a
Out[44]: array([[4], [], [], [], []], dtype=object)
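
For arrays with more than one dimension, the same idea can be sketched with an explicit loop over indices. The helper name empty_lists is hypothetical and not part of NumPy; it only assumes np.empty and np.ndindex.

```python
import numpy as np


def empty_lists(shape):
    """Object array where every slot holds its own empty list.

    Hypothetical helper for illustration; it is not part of NumPy.
    """
    a = np.empty(shape, dtype=object)
    for idx in np.ndindex(a.shape):
        a[idx] = []  # a fresh list per element, never shared
    return a


a = empty_lists((2, 3))
a[0, 0].append(4)
print(a[0, 0], a[0, 1])
```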

edited Jul 19, 2016 at 16:43

Mad Physicist

answered May 27, 2015 at 9:31

ntg

2 Comments

Mad Physicist Over a year ago

Aside from having virtually nothing to do with the original question, neat.

ntg Over a year ago

Well, It's about "Initializing numpy matrix to something other than zero or one", in the case "something other" is an object :) (More practically, google led me here for initializing with an empty list )

Novorodnas · Accepted Answer · 2022-01-17 17:38:10Z

Just a warning that initializing with np.empty() without subsequently editing the values can lead to (memory allocation?) problems:

arr1 = np.empty(96)
arr2 = np.empty(96)
print(arr1)
print(arr2)

# [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan  1.  1.
#   1.  1.  2.  2.  2.  2. nan nan nan nan nan nan nan nan  0.  0.  0.  0.
#   0.  0.  0.  0. nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan]
#
# [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan  1.  1.
#   1.  1.  2.  2.  2.  2. nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan]

The floats initialized in the array are used somewhere else in my script but are not associated with variables arr1 or arr2 at all. Spooky.

Answer from user @JHBonarius solved this problem:

arr = np.tile(np.nan, 96)
print(arr)

# [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan]

np.empty allocates memory for the array without overwriting the existing data in those bytes, so it will contain arbitrary (not random) data.
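
A minimal sketch of the safe pattern, assuming nothing beyond np.empty, ndarray.fill and np.full: either fill the buffer before reading from it, or allocate and fill in one call.

```python
import numpy as np

# np.empty only reserves memory; whatever bytes were there remain.
scratch = np.empty(96)

# Initialize explicitly before reading any value.
scratch.fill(np.nan)          # in place
fresh = np.full(96, np.nan)   # or allocate and fill in one call

print(np.isnan(scratch).all(), np.isnan(fresh).all())
```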

user19409802 · Accepted Answer · 2022-06-24 17:09:13Z

4

>>> import numpy as np
>>> width = 2
>>> height = 3
>>> r = np.full((width, height, 9), np.nan)
>>> r
array([[[nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan]],

       [[nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan]]])

>>> r.shape
(2, 3, 9)

answered Jun 24, 2022 at 17:09

user19409802

2 Comments

user19409802 Over a year ago

late to the show, but I will be looking for this again in 5 years...

Larry Panozzo · Accepted Answer · 2023-01-03 04:29:03Z

3

Pardon my tardiness, but here is the fastest solution for large arrays, iff single-precision (f4 float32) is all you need. And yes, np.nan works as well.

# Drop-in kernel for the perfplot benchmark above; `val` is the fill value defined there.
def full_single_prec(n):
    return numpy.full(n, val, dtype='f4')
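
One likely reason single precision helps for large arrays is that there is half as much memory to write. The sketch below only checks the sizes and that NaN survives in float32; it is not a benchmark, and the array length is an arbitrary example.

```python
import numpy as np

n = 1_000_000  # example size, chosen only for illustration

single = np.full(n, np.nan, dtype="f4")  # float32
double = np.full(n, np.nan)              # float64 by default

print(single.dtype, single.nbytes)  # float32 4000000
print(double.dtype, double.nbytes)  # float64 8000000
print(np.isnan(single).all())       # True
```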

edited Jan 3, 2023 at 4:29

answered Jan 3, 2023 at 4:23

Larry Panozzo
