5

I've been given a multidimensional numpy array, x, that looks like this:

array([ array([  398.24475098,  -196.1497345 ,  -110.79341125, ..., -1937.22399902,
       -6158.89355469,  1742.84399414], dtype=float32),
       array([   32.27750397,  -171.73371887,  -342.6328125 , ..., -4727.4296875 ,
       -4727.4296875 , -2545.10375977], dtype=float32),
       array([  785.83660889,  -234.88890076,   140.49914551, ..., -7982.19482422,
       -2127.640625  , -1434.77160645], dtype=float32),
       ...,
       array([   181.93313599,   -146.41413879,   -416.02978516, ...,
        -4517.796875  ,  10491.84570312,  -6604.39550781], dtype=float32),
       array([ -1.37602341e+02,   1.71733719e+02,   7.13068867e+00, ...,
         8.60104688e+03,   1.39115127e+04,   3.31622314e+03], dtype=float32),
       array([   453.17272949,    152.49285889,    260.41452026, ...,
        19061.60742188,  11232.8046875 ,   7312.13964844], dtype=float32)], dtype=object)

I'm trying to access each column (specifically, I want to take the standard deviation of each column). I found this answer and tried:

>>> x[:,0]

But this returned an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: too many indices for array

Is it possible to convert this array of arrays into a simple 2D numpy array to access the columns? Or is there a good way to access these columns directly?

Thanks!

Edit

Some more information on this array:

>>> x.shape
(8685,)
>>> x[0].shape  # Same for x[1], x[2], ...
(3524,)

If it's any help, I used the tree2array function in the root_numpy package to produce this array.

8
  • What is your array's shape? Have you tried np.ascontiguousarray(x)? Commented Jul 28, 2017 at 9:22
  • Do all your subarrays have the same length? If not, this might be a problem. Commented Jul 28, 2017 at 9:24
  • Do you interpret the inner arrays as the rows or columns of a matrix? Commented Jul 28, 2017 at 9:27
  • What happens if you change the dtype of the outer array by arr.astype(np.float32)? Commented Jul 28, 2017 at 10:08
  • @obachtos Each inner array is a row in the matrix. Doing x.astype(np.float32) gives ValueError: setting an array element with a sequence. Commented Jul 28, 2017 at 10:14

4 Answers

1

I was able to get things to work with the help of this answer:

How do I convert an array of arrays into a multi-dimensional array in Python?.

>>> y = np.stack(x)
>>> y
array([[  3.98244751e+02,  -1.96149734e+02,  -1.10793411e+02, ...,
         -1.93722400e+03,  -6.15889355e+03,   1.74284399e+03],
       [  3.22775040e+01,  -1.71733719e+02,  -3.42632812e+02, ...,
         -4.72742969e+03,  -4.72742969e+03,  -2.54510376e+03],
       [  7.85836609e+02,  -2.34888901e+02,   1.40499146e+02, ...,
         -7.98219482e+03,  -2.12764062e+03,  -1.43477161e+03],
       ...,
       [  1.81933136e+02,  -1.46414139e+02,  -4.16029785e+02, ...,
         -4.51779688e+03,   1.04918457e+04,  -6.60439551e+03],
       [ -1.37602341e+02,   1.71733719e+02,   7.13068867e+00, ...,
          8.60104688e+03,   1.39115127e+04,   3.31622314e+03],
       [  4.53172729e+02,   1.52492859e+02,   2.60414520e+02, ...,
          1.90616074e+04,   1.12328047e+04,   7.31213965e+03]], dtype=float32)
>>> y[:,0]
array([ 398.24475098,   32.27750397,  785.83660889, ...,  181.93313599,
       -137.6023407 ,  453.17272949], dtype=float32)
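Since the original goal was the standard deviation of each column, here is a minimal sketch of how the stacked array could be used for that. The two short rows are made-up stand-ins for the real 8685 x 3524 data, and the sketch assumes every inner array has the same length:

```python
import numpy as np

# Hypothetical stand-in for x: a 1-D object array holding equal-length float32 rows.
rows = [np.array([1.0, 2.0, 3.0], dtype=np.float32),
        np.array([3.0, 2.0, 7.0], dtype=np.float32)]
x = np.empty(len(rows), dtype=object)
for k, row in enumerate(rows):
    x[k] = row

# Stack the inner arrays into a regular 2D array: one inner array per row.
y = np.stack(list(x))

# axis=0 reduces over the rows, leaving one standard deviation per column.
col_std = y.std(axis=0)
```

With the real data, `col_std` would have one entry per column (3524 of them), and a single column is still available as `y[:, i]`.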

0

Not sure how you are doing it, but it seems to work for me straight from the prompt:

>>> import numpy as np
>>> a = np.zeros((2, 6), dtype=np.float32)
>>> a
array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32)
>>> a[:,2]=1
>>> a
array([[ 0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.]], dtype=float32)

4 Comments

But he doesn't have a 2D array, he has nested arrays; that's why it doesn't work.
It should work with nested arrays as well, as long as all the subarrays have the same length.
Apparently they aren't. Otherwise numpy would automatically interpret them as a 2D array.
See my answer in stackoverflow.com/q/45361548 for a way of constructing a 1d array of arrays.
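To illustrate the difference discussed in these comments, here is a small sketch. The data is made up, and it assumes the array was built as an explicit 1-D object array, which is one way to end up with the shapes reported in the question. Even though all the inner arrays have the same length, the outer array stays 1-D, so two indices fail:

```python
import numpy as np

# Hypothetical data: three equal-length rows stored in a 1-D object array.
x = np.empty(3, dtype=object)
for k in range(3):
    x[k] = np.arange(4, dtype=np.float32) + k

outer_shape = x.shape      # the outer array only holds 3 objects
inner_shape = x[0].shape   # each object is itself an array of 4 floats

try:
    x[:, 0]                # two indices on a 1-D array
    failed = False
except IndexError:
    failed = True

# Indexing each inner array separately still works.
first_column = np.array([row[0] for row in x])
```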
0

"Or is there a good way to access these columns directly?" - Yes, there is!

Suppose you want to get the ith column of this array without converting it to a 2D array first (which, however, is the cleaner approach).

>>> col = np.array([row[i] for row in x])

However, I would suggest first converting it to a 2D array like so:

columns = x[0].shape[0] # Note: Number of elements in each array should be the same
rows = len(x)
x_flat = x.flatten()
x_2d = x_flat.reshape((rows, columns))
col = x_2d[:, i]

3 Comments

Thanks! Your first method works, although it's a bit slow on my machine (about 20 s to process the full 8685 x 3524 array). I tried reshaping the array but I get this error: ValueError: cannot reshape array of size 8685 into shape (8685,3524)
Could you please check if it is working now after adding x_flat?
No, x.flatten() does not flatten the array (i.e. it returns the original array of arrays)
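As the comments above point out, flatten() on an object array only flattens the outer level. A possible variant of the reshape idea, sketched here on made-up data and assuming all inner arrays have the same length, is to join the inner arrays with np.concatenate before reshaping:

```python
import numpy as np

# Hypothetical stand-in for x: a 1-D object array of equal-length rows.
x = np.empty(2, dtype=object)
x[0] = np.array([1, 2, 3], dtype=np.float32)
x[1] = np.array([4, 5, 6], dtype=np.float32)

rows = len(x)
columns = x[0].shape[0]

# flatten() leaves the inner arrays untouched: still 2 objects.
still_nested = x.flatten()

# concatenate joins the inner arrays end to end into one flat numeric array.
x_flat = np.concatenate(list(x))
x_2d = x_flat.reshape((rows, columns))

i = 1                      # example column index
col = x_2d[:, i]
```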
0

A solution might be to transform the array of arrays to a list and then create a standard array from that:

np.array(x.tolist())

See this question for reference.
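A minimal sketch of this approach on made-up data, assuming every inner array has the same length (otherwise the result would not be a regular 2D array):

```python
import numpy as np

# Hypothetical stand-in for x: a 1-D object array of equal-length rows.
x = np.empty(2, dtype=object)
x[0] = np.array([1.5, 2.5], dtype=np.float32)
x[1] = np.array([3.5, 4.5], dtype=np.float32)

# tolist() gives a plain Python list whose items are still the inner arrays.
as_list = x.tolist()

# Building a new array from that list produces a regular 2D array.
y = np.array(as_list)
first_column = y[:, 0]
```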
