Creating a masked array in Python with multiple given values

Question

I am graphing several columns of a large array of data (loaded with numpy.genfromtxt) against an equally sized time column. Missing data is often marked as nan, -999, -9999, etc. However, I can't figure out how to remove multiple values from the array. This is what I currently have:

for cur_col in range(start_col, total_col):
    # Generate what is to be graphed by removing nan values
    data_mask = (file_data[:, cur_col] != nan_values)
    y_data = file_data[:, cur_col][data_mask]
    x_data = file_data[:, time_col][data_mask]

After that, I use matplotlib to create the appropriate figures for each column. This works fine if nan_values is a single integer, but I would like to use a list.

EDIT: Here is a working example.

import numpy as np

file_data = np.arange(12.0).reshape((4,3))
file_data[1,1] = np.nan
file_data[2,2] = -999
nan_values = -999

for cur_col in range(1,3):
    # Generate what is to be graphed by removing nan values
    data_mask = (file_data[:, cur_col] != nan_values)
    y_data = file_data[:, cur_col][data_mask]
    x_data = file_data[:, 0][data_mask]
    print('y: ' + str(y_data))
    print('x: ' + str(x_data))
print(file_data)

>>> y: [  1.  nan   7.  10.]
    x: [ 0.  3.  6.  9.]
    y: [  2.   5.  11.]
    x: [ 0.  3.  9.]
    [[   0.    1.    2.]
    [   3.   nan    5.]
    [   6.    7. -999.]
    [   9.   10.   11.]]

This will not work if nan_values = ['nan', -999], which is what I am looking to accomplish.

@AshwiniChaudhary I've edited the question to include a working example.
– Josiah, Commented Jun 21, 2012 at 21:47

user545424 · Accepted Answer · 2012-06-21 21:13:30Z

7

I would suggest using masked arrays like so:

>>> a = np.arange(12.0).reshape((4,3))
>>> a[1,1] = np.nan
>>> a[2,2] = -999
>>> a
array([[   0.,    1.,    2.],
       [   3.,   nan,    5.],
       [   6.,    7., -999.],
       [   9.,   10.,   11.]])
>>> m = np.ma.array(a,mask=(~np.isfinite(a) | (a == -999)))
>>> m
masked_array(data =
 [[0.0 1.0 2.0]
 [3.0 -- 5.0]
 [6.0 7.0 --]
 [9.0 10.0 11.0]],
             mask =
 [[False False False]
 [False  True False]
 [False False  True]
 [False False False]],
       fill_value = 1e+20)
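To connect this back to the loop in the question, here is a minimal sketch of how the mask could be used to pick matching x and y values per column. The helper name valid_xy and the choice of column 0 as the time column are assumptions for illustration, not part of the answer above.

```python
import numpy as np

def valid_xy(data, mask, time_col, cur_col):
    """Return (x, y) for one column, dropping rows where that column is masked."""
    keep = ~mask[:, cur_col]
    return data[keep, time_col], data[keep, cur_col]

file_data = np.arange(12.0).reshape((4, 3))
file_data[1, 1] = np.nan
file_data[2, 2] = -999

m = np.ma.array(file_data, mask=(~np.isfinite(file_data) | (file_data == -999)))
# getmaskarray always returns a full boolean array, even if nothing is masked
mask = np.ma.getmaskarray(m)

for cur_col in range(1, 3):
    x_data, y_data = valid_xy(file_data, mask, 0, cur_col)
    print('x:', x_data, 'y:', y_data)
```

Only the rows that are bad in the current column are dropped, so each column keeps as many points as it can.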

answered Jun 21, 2012 at 21:13


4 Comments

Josiah Over a year ago

While the results are what I need, it doesn't use a list which would greatly streamline what I am doing. Is there a way to replace the or statements with a list for the mask= in the ma.array?

user545424 Over a year ago

mask=np.logical_or.reduce([a == value for value in [-99,-999,-9999]]). Be aware though that np.nan != np.nan, so you'll have to add that to the mask explicitly.
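A possible list-based variant of the same idea, sketched with np.isin (available in NumPy 1.13 and later, which is an assumption about the installed version). NaN is handled separately with np.isnan, since it never compares equal to anything. The function name missing_mask is made up for this example.

```python
import numpy as np

def missing_mask(a, sentinels):
    """True where a is NaN or equals any value in sentinels."""
    return np.isnan(a) | np.isin(a, sentinels)

a = np.arange(12.0).reshape((4, 3))
a[1, 1] = np.nan
a[2, 2] = -999

m = np.ma.array(a, mask=missing_mask(a, [-99, -999, -9999]))
print(m)
```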

user1269942 Over a year ago

@user545424 Be very careful with that snippet. If a is big and the list of values you're checking for is reasonably long, your memory needs will explode. [Edit: I'm talking about the snippet: mask=np.logical_or.reduce([a == value for value in [-99,-999,-9999]]) ]
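One way to avoid building a whole list of boolean arrays is to accumulate the mask in place, one value at a time. This is only a sketch of that idea; the function name missing_mask_inplace is hypothetical, and whether the saving matters depends on the size of a and the number of values.

```python
import numpy as np

def missing_mask_inplace(a, sentinels):
    """Accumulate the mask one sentinel at a time, reusing a scratch buffer."""
    mask = np.isnan(a)                        # start with the NaN positions
    scratch = np.empty(a.shape, dtype=bool)   # reused for every comparison
    for value in sentinels:
        np.equal(a, value, out=scratch)       # writes (a == value) into scratch
        np.logical_or(mask, scratch, out=mask)
    return mask

a = np.arange(12.0).reshape((4, 3))
a[1, 1] = np.nan
a[2, 2] = -999

m = np.ma.array(a, mask=missing_mask_inplace(a, [-99, -999, -9999]))
print(m)
```

At any moment only two boolean arrays of the shape of a exist, regardless of how many values are in the list.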

alphabetasoup Over a year ago

Is this robust with floats? The scipy docs say that using masked_values is recommended when masking on the basis of floating point values.
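To illustrate the floating-point concern, here is a small sketch that compares exact equality with a tolerance-based check built on np.isclose. np.ma.masked_values also compares approximately for floating-point input; the helper below, missing_mask_close, is a hypothetical name and its default tolerances are simply the np.isclose defaults, which may or may not suit a given data set.

```python
import numpy as np

def missing_mask_close(a, sentinels, rtol=1e-5, atol=1e-8):
    """True where a is NaN or approximately equal to any sentinel."""
    mask = np.isnan(a)
    for value in sentinels:
        mask |= np.isclose(a, value, rtol=rtol, atol=atol)
    return mask

a = np.arange(12.0).reshape((4, 3))
a[1, 1] = np.nan
a[2, 2] = -999.0 + 1e-9   # a sentinel that picked up a tiny rounding error

exact = np.isnan(a) | (a == -999)
close = missing_mask_close(a, [-999, -9999])
print(exact[2, 2], close[2, 2])   # exact comparison misses it, isclose catches it
```

With the default rtol, anything within roughly 0.01 of -999 is treated as a match, so the tolerances should be tightened if real data can fall that close to a sentinel.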

GL770 · Accepted Answer · 2012-06-21 20:44:59Z

2

I would try something like (pseudo-code):

nan_values = [...]

for cur_col in range(start_col, total_col):
    # Generate what is to be graphed by removing nan values
    y_data = [file_data[i, cur_col] for i in range(len(file_data)) if file_data[i, cur_col] not in nan_values]
    x_data = [file_data[i, time_col] for i in range(len(file_data)) if file_data[i, cur_col] not in nan_values]

edited Jun 21, 2012 at 20:44

answered Jun 21, 2012 at 20:37


1 Comment

Josiah Over a year ago

I am not able to implement this example into the working one I recently added. I receive 'argument of type 'int' is not iterable'
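That error is what Python raises when the right-hand side of `in` is a plain int, as it is with nan_values = -999 in the working example. A minimal sketch of the list-comprehension approach that accepts either a single value or a list follows. The function name filter_column is an assumption for illustration, and NaN is checked separately because `in` cannot be relied on to match it.

```python
import numpy as np

def filter_column(file_data, time_col, cur_col, nan_values):
    """Return (x, y) lists for one column, skipping NaN and listed values."""
    # Wrap a scalar such as -999 in a list so that `in` has something to iterate
    nan_values = np.atleast_1d(nan_values).tolist()
    keep = [i for i in range(len(file_data))
            if not np.isnan(file_data[i, cur_col])
            and file_data[i, cur_col] not in nan_values]
    y_data = [file_data[i, cur_col] for i in keep]
    x_data = [file_data[i, time_col] for i in keep]
    return x_data, y_data

file_data = np.arange(12.0).reshape((4, 3))
file_data[1, 1] = np.nan
file_data[2, 2] = -999

for cur_col in range(1, 3):
    x_data, y_data = filter_column(file_data, 0, cur_col, [-999, -9999])
    print('x:', x_data, 'y:', y_data)
```

This loops in pure Python, so it will be much slower than a boolean mask on a large array.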
