5

I am graphing several columns of a large array of data (loaded with numpy.genfromtxt) against an equally sized time column. Missing data is variously encoded as nan, -999, -9999, etc. However, I can't figure out how to remove multiple such values from the array. This is what I currently have:

for cur_col in range(start_col, total_col):
    # Generate what is to be graphed by removing nan values
    data_mask = (file_data[:, cur_col] != nan_values)
    y_data = file_data[:, cur_col][data_mask]
    x_data = file_data[:, time_col][data_mask]

After this, I use matplotlib to create the appropriate figure for each column. This works fine if nan_values is a single integer, but I want to use a list.

EDIT: Here is a working example.

import numpy as np

file_data = np.arange(12.0).reshape((4,3))
file_data[1,1] = np.nan
file_data[2,2] = -999
nan_values = -999

for cur_col in range(1,3):
    # Generate what is to be graphed by removing nan values
    data_mask = (file_data[:, cur_col] != nan_values)
    y_data = file_data[:, cur_col][data_mask]
    x_data = file_data[:, 0][data_mask]
    print('y: ' + str(y_data))
    print('x: ' + str(x_data))
print(file_data)

>>> y: [  1.  nan   7.  10.]
    x: [ 0.  3.  6.  9.]
    y: [  2.   5.  11.]
    x: [ 0.  3.  9.]
    [[   0.    1.    2.]
    [   3.   nan    5.]
    [   6.    7. -999.]
    [   9.   10.   11.]]

This does not work if nan_values = ['nan', -999], which is what I am trying to accomplish.

2
  • please post the sample array(list). Commented Jun 21, 2012 at 20:39
  • @AshwiniChaudhary I've edited the question to include a working example. Commented Jun 21, 2012 at 21:47

2 Answers

7

I would suggest using masked arrays like so:

>>> a = np.arange(12.0).reshape((4,3))
>>> a[1,1] = np.nan
>>> a[2,2] = -999
>>> a
array([[   0.,    1.,    2.],
       [   3.,   nan,    5.],
       [   6.,    7., -999.],
       [   9.,   10.,   11.]])
>>> m = np.ma.array(a,mask=(~np.isfinite(a) | (a == -999)))
>>> m
masked_array(data =
 [[0.0 1.0 2.0]
 [3.0 -- 5.0]
 [6.0 7.0 --]
 [9.0 10.0 11.0]],
             mask =
 [[False False False]
 [False  True False]
 [False False  True]
 [False False False]],
       fill_value = 1e+20)
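To connect the masked array back to the plotting loop in the question, one possible approach is to pull the valid values out per column. This is only a minimal sketch: it assumes column 0 is the time column (as in the question's example), and `time_col` and `columns` are illustrative names, not part of the answer above.

```python
import numpy as np

# Same sample data as above; column 0 stands in for the time column.
a = np.arange(12.0).reshape((4, 3))
a[1, 1] = np.nan
a[2, 2] = -999

m = np.ma.array(a, mask=(~np.isfinite(a) | (a == -999)))

time_col = 0   # assumed index of the time column
columns = {}   # illustrative container: column index -> (x_data, y_data)
for cur_col in range(1, a.shape[1]):
    # True where the value in this column is valid (not masked)
    keep = ~np.ma.getmaskarray(m[:, cur_col])
    x_data = a[keep, time_col]
    # compressed() returns the unmasked values as a plain ndarray
    y_data = m[:, cur_col].compressed()
    columns[cur_col] = (x_data, y_data)
    print(cur_col, x_data, y_data)
```

The x and y arrays for each column have the same length, so they can be passed straight to a plotting call.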

4 Comments

While the results are what I need, it doesn't use a list which would greatly streamline what I am doing. Is there a way to replace the or statements with a list for the mask= in the ma.array?
mask=np.logical_or.reduce([a == value for value in [-99,-999,-9999]]). Be aware though that np.nan != np.nan, so you'll have to add that to the mask explicitly.
@user545424 Be very careful with that snippet. If a is big and the list of values you're checking for is reasonably long, your memory needs will explode. [Edit: I'm talking about the snippet mask=np.logical_or.reduce([a == value for value in [-99,-999,-9999]]).]
Is this robust with floats? The scipy docs say that using masked_values is recommended when masking on the basis of floating point values.
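The concerns raised in these comments (using a list, NaN handling, memory use, and float comparison) could be addressed together along the following lines. This is a sketch, not a definitive implementation: `missing_mask` and its `atol` parameter are hypothetical names introduced here. It accumulates the mask in place instead of holding one boolean array per sentinel, tests NaN separately, and uses `np.isclose` so that a tolerance can be supplied for float sentinels. With the default `atol=0.0` and `rtol=0.0`, the comparison reduces to exact equality.

```python
import numpy as np

def missing_mask(arr, sentinels, atol=0.0):
    """Hypothetical helper: True where arr is NaN or matches a sentinel value."""
    # NaN never compares equal to itself, so it needs its own test.
    mask = np.isnan(arr)
    for value in sentinels:
        # OR each comparison into the running mask in place, so the
        # per-sentinel results are not all kept alive at the same time.
        mask |= np.isclose(arr, value, rtol=0.0, atol=atol)
    return mask

a = np.arange(12.0).reshape((4, 3))
a[1, 1] = np.nan
a[2, 2] = -999

nan_values = [-99, -999, -9999]
mask = missing_mask(a, nan_values)
m = np.ma.array(a, mask=mask)
print(m.mask)
```

For a single float sentinel, `np.ma.masked_values(a, -999)` is the library routine the last comment refers to; it compares with a tolerance but handles only one value per call.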
2

I would try something like (pseudo-code):

nan_values = [...]

for cur_col in range(start_col, total_col):
    # Generate what is to be graphed by removing nan values
    # nan_values must be a list (or tuple); a NaN entry will not match via `in` because nan != nan
    y_data = [file_data[i, cur_col] for i in range(len(file_data)) if file_data[i, cur_col] not in nan_values]
    x_data = [file_data[i, time_col] for i in range(len(file_data)) if file_data[i, cur_col] not in nan_values]

1 Comment

I could not adapt this example to the working one I recently added. I get: argument of type 'int' is not iterable
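That error is what Python raises when the right-hand side of `in` is a bare integer, which suggests `nan_values` was still `-999` rather than a list. A minimal sketch of the list-comprehension approach applied to the question's working example follows. It assumes column 0 is the time column, and `is_missing` and `results` are hypothetical names added for illustration. NaN is checked explicitly because membership tests cannot find it reliably.

```python
import math
import numpy as np

file_data = np.arange(12.0).reshape((4, 3))
file_data[1, 1] = np.nan
file_data[2, 2] = -999

nan_values = [-999]   # must be a list (or tuple/set), even for a single sentinel
time_col = 0          # assumed index of the time column

def is_missing(value):
    # nan != nan, so `value in nan_values` cannot detect NaN; test it directly.
    return math.isnan(value) or value in nan_values

results = {}  # illustrative container: column index -> (x_data, y_data)
for cur_col in range(1, 3):
    # Row indices whose value in this column is usable
    rows = [i for i in range(len(file_data)) if not is_missing(file_data[i, cur_col])]
    x_data = [file_data[i, time_col] for i in rows]
    y_data = [file_data[i, cur_col] for i in rows]
    results[cur_col] = (x_data, y_data)
    print('y:', y_data)
    print('x:', x_data)
```

This loops in pure Python, so on a large array it will likely be much slower than a boolean mask.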
