
This might be a silly question, but I can't seem to find an answer for it. I have a large array that I've previously saved using np.save, and now I'd like to load the data into a new file, creating a separate list from each column. The only issue is that some of the rows in my large array only have a single nan value, so the array looks something like this (as an extremely simplified example):

np.array([[5, 12, 3],
          [np.nan],
          [10, 13, 9],
          [np.nan],
          [np.nan]], dtype=object)

I can use a for loop to achieve what I want, but I was wondering if there was a better way than this:

import numpy as np

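# Object arrays are pickled, so recent NumPy versions need allow_pickle=True here.
results = np.load('data.npy', allow_pickle=True)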
depth, upper, lower = [], [], []

for item in results:
    if len(item) > 1:
        depth.append(item[0])
        upper.append(item[1])
        lower.append(item[2])
    else:
        depth.append(np.nan)
        upper.append(np.nan)
        lower.append(np.nan)

My desired output would look like:

depth = [5,nan,10,nan,nan]
upper = [12,nan,13,nan,nan]
lower = [3,nan,9,nan,nan]

Thanks for your help! I realize I should have previously altered the code that creates the "data.npy" file, so that it has the same number of columns for each row, but that code already takes hours to run and I'd rather avoid that!

2 Answers


With sub-arrays of varying length, this is a dtype=object array. For most purposes it behaves like a list of those sub-arrays, so most operations will require iteration.

A variant of your loop would be a list comprehension:
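To make the "object array" point concrete, here is a small sketch. It rebuilds a shortened version of the question's data by hand (the name `d` and the explicit `dtype=object` are my assumptions, since the real data comes from `data.npy`):

```python
import numpy as np

# Ragged rows: NumPy cannot make a 2-D numeric array out of these,
# so with dtype=object it stores each row as a separate Python list.
d = np.array([[5, 12, 3], [np.nan], [10, 13, 9]], dtype=object)

print(d.dtype)      # object
print(d.shape)      # (3,)  -> one dimension, three "rows"
print(type(d[0]))   # <class 'list'>
```

Because each element is just a Python list, column indexing such as `d[:, 0]` is not available until the rows are padded to equal length.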

In [61]: dd=[[nan,nan,nan] if len(i)==1 else i for i in d]

In [62]: dd
Out[62]: [[5, 12, 3], [nan, nan, nan], [10, 13, 9], [nan, nan, nan], [nan, nan, nan]]

Your three target arrays are then columns of:

In [63]: np.array(dd)
Out[63]: 
array([[  5.,  12.,   3.],
       [ nan,  nan,  nan],
       [ 10.,  13.,   9.],
       [ nan,  nan,  nan],
       [ nan,  nan,  nan]])
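As a self-contained sketch of that step, assuming the data is rebuilt by hand as `d` rather than loaded from `data.npy`, the columns can be unpacked from the transposed array:

```python
import numpy as np

# Stand-in for the loaded data: a 1-D object array of ragged rows.
rows = [[5, 12, 3], [np.nan], [10, 13, 9], [np.nan], [np.nan]]
d = np.array(rows, dtype=object)

# Pad every single-nan row out to three nans, keep full rows unchanged.
dd = [[np.nan] * 3 if len(row) == 1 else row for row in d]

# Now the rows are rectangular, so a regular 2-D float array works.
arr = np.array(dd, dtype=float)

# Transposing turns columns into rows, which can be unpacked directly.
depth, upper, lower = arr.T
```

Each of `depth`, `upper` and `lower` is a 1-D float array here; call `.tolist()` on them if plain lists are needed.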

Another approach is to make a 2d float array filled with nan, and then copy over the rows that hold real values. But that too requires iteration to find the lengths of the sub-arrays.

In [65]: [len(i)>1 for i in d]
Out[65]: [True, False, True, False, False]

np.nan is a float, so a 2d array with nan will be dtype float.
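The fill-then-copy idea could look roughly like this. It is only a sketch: the names `full` and `out` are mine, and the width of 3 is assumed from the example data.

```python
import numpy as np

# Stand-in for the loaded data: a 1-D object array of ragged rows.
rows = [[5, 12, 3], [np.nan], [10, 13, 9], [np.nan], [np.nan]]
d = np.array(rows, dtype=object)

# Boolean mask of the rows that carry real data (still a Python-level loop).
full = np.array([len(row) > 1 for row in d])

# Start with an all-nan float array of the final shape.
out = np.full((len(d), 3), np.nan)

# Copy the complete rows into place; the other rows stay nan.
out[full] = np.array(list(d[full]), dtype=float)
```

The columns are then `out[:, 0]`, `out[:, 1]` and `out[:, 2]`. This does not avoid the loop over rows; it only moves it into building the mask.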


1 Comment

Thanks! I used your first suggestion and then transposed dd so that depth = dd[0] and so on. Much cleaner-looking than all that appending I was doing.

A shorter way using pandas:

import numpy as np
import pandas as pd

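# dtype=object is needed on recent NumPy versions, which refuse to build ragged arrays implicitly.
data = np.array([[5,12,3], [np.nan], [10,13,9], [np.nan], [np.nan]], dtype=object)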
df = pd.DataFrame.from_records(data.tolist())
df.columns = ['depth','upper','lower']

Output:

>>> df
   depth  upper  lower
0    5.0   12.0    3.0
1    NaN    NaN    NaN
2   10.0   13.0    9.0
3    NaN    NaN    NaN
4    NaN    NaN    NaN

You can now access each column to get your desired output:

>>> df.depth
0     5.0
1     NaN
2    10.0
3     NaN
4     NaN

If you need lists:

>>> df.depth.tolist()
[5.0, nan, 10.0, nan, nan]
