Numpy reading data from '.npy' file directly into arrays

Question

This might be a silly question, but I can't seem to find an answer for it. I have a large array that I've previously saved using np.save, and now I'd like to load the data into a new file, creating a separate list from each column. The only issue is that some of the rows in my large array only have a single nan value, so the array looks something like this (as an extremely simplified example):

np.array([[5,12,3], 
          [nan], 
          [10,13,9],
          [nan],
          [nan]])

I can use a for loop to achieve what I want, but I was wondering if there was a better way than this:

import numpy as np

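# Ragged rows are stored as a pickled object array; NumPy >= 1.16.3 needs allow_pickle=True to load it.
results = np.load('data.npy', allow_pickle=True)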
depth, upper, lower = [], [], []

for item in results:
    if len(item) > 1:
        depth.append(item[0])
        upper.append(item[1])
        lower.append(item[2])
    else:
        depth.append(np.nan)
        upper.append(np.nan)
        lower.append(np.nan)

My desired output would look like:

depth = [5,nan,10,nan,nan]
upper = [12,nan,13,nan,nan]
lower = [3,nan,9,nan,nan]

Thanks for your help! I realize I should have previously altered the code that creates the "data.npy" file, so that it has the same number of columns for each row, but that code already takes hours to run and I'd rather avoid that!

hpaulj · Accepted Answer · 2016-07-19 22:12:09Z

With subarrays of varying length, this is a dtype=object array. For most purposes it behaves like a list of those subarrays, so most operations will require iteration.

A variant on your loop would be a list comprehension (here d is the loaded array and nan is np.nan):

In [61]: dd=[[nan,nan,nan] if len(i)==1 else i for i in d]

In [62]: dd
Out[62]: [[5, 12, 3], [nan, nan, nan], [10, 13, 9], [nan, nan, nan], [nan, nan, nan]]

Your three target arrays are then columns of:

In [63]: np.array(dd)
Out[63]: 
array([[  5.,  12.,   3.],
       [ nan,  nan,  nan],
       [ 10.,  13.,   9.],
       [ nan,  nan,  nan],
       [ nan,  nan,  nan]])
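As a self-contained sketch of the steps above, the following builds a small stand-in for the loaded array, pads the short rows, and unpacks the columns by transposing. The variable names (rows, arr) and the hard-coded width of 3 are assumptions for illustration; with the real file, d would come from np.load instead.

```python
import numpy as np

# Hypothetical stand-in for the array loaded from data.npy:
# a 1-D object array whose elements are lists of differing length.
rows = [[5, 12, 3], [np.nan], [10, 13, 9], [np.nan], [np.nan]]
d = np.array(rows, dtype=object)

# Replace every single-element row with a full row of nan.
dd = [[np.nan] * 3 if len(row) == 1 else list(row) for row in d]

# All rows now have the same length, so this is a regular 2-D float array.
arr = np.array(dd, dtype=float)

# Transposing turns columns into rows, which unpack into three 1-D arrays.
depth, upper, lower = arr.T

print(depth.tolist())
print(upper.tolist())
print(lower.tolist())
```

If plain Python lists are needed rather than arrays, calling .tolist() on each column gives them.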

Another approach is to make an array of the target shape and dtype filled with nan, and then copy over the non-nan values. But that too requires iteration to find the lengths of the subarrays:

In [65]: [len(i)>1 for i in d]
Out[65]: [True, False, True, False, False]

np.nan is a float, so a 2d array with nan will be dtype float.
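The fill-and-copy approach mentioned above could look roughly like this. It is only a sketch: the sample data, the names mask and out, and the fixed width of 3 are assumptions, and the length check still iterates over the rows.

```python
import numpy as np

# Hypothetical stand-in for the loaded object array of ragged rows.
rows = [[5, 12, 3], [np.nan], [10, 13, 9], [np.nan], [np.nan]]
d = np.array(rows, dtype=object)

# Boolean mask marking the rows that hold real data (iteration is unavoidable here).
mask = np.array([len(row) > 1 for row in d])

# Start from a float array filled with nan, one row per entry in d.
out = np.full((len(d), 3), np.nan)

# Stack the full-length rows into a 2-D float array and copy them into place.
out[mask] = np.array(list(d[mask]), dtype=float)

depth, upper, lower = out.T
print(out)
```

Whether this is clearer than the list comprehension is a matter of taste; it mainly helps when the mask is useful for something else as well.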

Thanks! I used your first suggestion and then transposed dd so that depth = dd[0] and so on. Much cleaner-looking than all that appending I was doing.

nico · Accepted Answer · 2016-07-20 10:29:43Z

A shorter way using pandas:

import numpy as np
import pandas as pd

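# dtype=object is required for ragged rows; recent NumPy versions raise ValueError without it.
data = np.array([[5,12,3], [np.nan], [10,13,9], [np.nan], [np.nan]], dtype=object)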
df = pd.DataFrame.from_records(data.tolist())
df.columns = ['depth','upper','lower']

Output:

>>> df
   depth  upper  lower
0    5.0   12.0    3.0
1    NaN    NaN    NaN
2   10.0   13.0    9.0
3    NaN    NaN    NaN
4    NaN    NaN    NaN

You can now access each column to get your desired output:

>>> df.depth
0     5.0
1     NaN
2    10.0
3     NaN
4     NaN

If you need lists:

>>> df.depth.tolist()
[5.0, nan, 10.0, nan, nan]
