Numpy Indexing Behavior

Question

I am having a lot of trouble understanding numpy indexing for multidimensional arrays. In the example I am working with, I have a 2D array, A, which is 100x10, and a 1D array, B, of 100 values between 0 and 9 (column indices into A). In MATLAB, I would use A(sub2ind(size(A), (1:size(A,1))', B)) to return, for each row of A, the value at the index stored in the corresponding row of B.

So, as a test case, let's say I have this:

A = np.random.rand(100,10)
B = np.int32(np.floor(np.random.rand(100)*10))

If I print their shapes, I get:

print A.shape returns (100L, 10L)
print B.shape returns (100L,)

When I try to index into A using B naively (incorrectly), I get a 100x100 array:

Test1 = A[:,B]
print Test1.shape returns (100L, 100L)

If I do this instead:

Test2 = A[range(A.shape[0]),B]
print Test2.shape returns (100L,)

This is what I want, but I'm having trouble understanding the distinction being made here. In my mind, A[:,5] and A[range(A.shape[0]),5] should return the same thing, yet the two forms behave differently once B takes the place of 5. How is : different from using range(sizeArray), which just produces the indices 0 through sizeArray-1, as an index?

Possible duplicate of Changing numpy array with array of indices
– shx2, Commented Jan 20, 2016 at 6:56
Try the same but with np.arange(A.shape[0])[:,None] as the 1st index.
– hpaulj, Commented Jan 20, 2016 at 11:12

hpaulj · Accepted Answer · 2016-01-20 20:41:10Z

Let's look at a simple array:

In [654]: X=np.arange(12).reshape(3,4)
In [655]: X
Out[655]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

With a slice for the rows and a list for the columns, we can pick 3 columns of X in any order (even with repeats). In other words, take all the rows, but only the selected columns.

In [656]: X[:,[3,2,1]]
Out[656]: 
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])
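Applied to the arrays in the question, this is why A[:,B] comes out 100x100. The following is only a sketch: the seeded generator and the names sliced and wanted are my own choices for illustration, not part of the original post.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen only to make the sketch repeatable
A = rng.random((100, 10))
B = rng.integers(0, 10, size=100)       # one column index per row of A

sliced = A[:, B]                        # every row, at each of the 100 columns listed in B
print(sliced.shape)                     # (100, 100)

# Entry [i, j] is A[i, B[j]]: the row slice and the column list are independent.
i, j = 7, 42
assert sliced[i, j] == A[i, B[j]]

# The values the question asks for sit on the diagonal of that result.
wanted = np.array([A[k, B[k]] for k in range(A.shape[0])])
assert np.array_equal(np.diagonal(sliced), wanted)
```

So the slice does contain the desired values, but it also computes 9900 others that are not needed.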

If instead I index the rows with a list (or array) of 3 values, numpy pairs them up with the column values, effectively picking 3 individual elements, X[0,3], X[1,2], X[2,1]:

In [657]: X[[0,1,2],[3,2,1]]
Out[657]: array([3, 6, 9])
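To make the pairing explicit, here is a small sketch that compares the indexed result with a plain loop over (row, column) pairs, then repeats the pattern on a 5x4 stand-in for the question's A and B. The small arrays and the variable names are assumptions for illustration only.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
rows = np.array([0, 1, 2])
cols = np.array([3, 2, 1])

paired = X[rows, cols]                  # X[0, 3], X[1, 2], X[2, 1]
explicit = np.array([X[r, c] for r, c in zip(rows, cols)])
assert np.array_equal(paired, explicit)
print(paired)                           # [3 6 9]

# Same pattern as A[range(A.shape[0]), B] in the question: one value per row.
A = np.arange(20).reshape(5, 4)         # small stand-in for the 100x10 array
B = np.array([0, 3, 1, 1, 2])           # one column index per row
per_row = A[np.arange(A.shape[0]), B]
print(per_row)                          # [ 0  7  9 13 18]
```

The row index array and B have the same length, so they are matched element by element and the result is 1-D.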

If I give it a column vector to index the rows, I get the same thing as with the slice:

In [659]: X[[[0],[1],[2]],[3,2,1]]
Out[659]: 
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])

This amounts to picking 9 individual values, whose row and column indices are generated by broadcasting the two index arrays against each other:

In [663]: np.broadcast_arrays(np.arange(3)[:,None],np.array([3,2,1]))
Out[663]: 
[array([[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]]), 
 array([[3, 2, 1],
        [3, 2, 1],
        [3, 2, 1]])]
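A short check of that equivalence, as a sketch: a (3,1) row index broadcast against a (3,) column index reproduces the slice result, and np.ix_ builds the same kind of open mesh. The variable names here are mine.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
cols = np.array([3, 2, 1])

row_col = np.arange(3)[:, None]         # shape (3, 1)
via_broadcast = X[row_col, cols]        # (3, 1) against (3,) broadcasts to (3, 3)
assert np.array_equal(via_broadcast, X[:, cols])

# np.ix_ returns index arrays of shape (3, 1) and (1, 3) that broadcast the same way.
mesh_rows, mesh_cols = np.ix_(np.arange(3), cols)
assert mesh_rows.shape == (3, 1) and mesh_cols.shape == (1, 3)
assert np.array_equal(X[mesh_rows, mesh_cols], via_broadcast)

# Two 1-D index arrays of equal length need no expansion, so the result stays 1-D.
assert X[np.arange(3), cols].shape == (3,)
```

The shape of the result is the broadcast shape of the index arrays, which is why a 1-D row index gives 3 values and a column-vector row index gives 9.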

numpy indexing can be confusing, but a good starting point is this page: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
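As an aside that is not part of the original answer: newer numpy releases (1.15 and later, as far as I know) also provide np.take_along_axis, which expresses "one column index per row" directly. A minimal sketch, using a small stand-in for A and B:

```python
import numpy as np

A = np.arange(20).reshape(5, 4)         # small stand-in for the 100x10 array
B = np.array([0, 3, 1, 1, 2])           # one column index per row

# The index array must have the same number of dimensions as A, hence B[:, None].
picked = np.take_along_axis(A, B[:, None], axis=1)   # shape (5, 1)
flat = picked[:, 0]                                  # shape (5,)

assert np.array_equal(flat, A[np.arange(A.shape[0]), B])
```

Both spellings select the same elements; which one reads better is a matter of taste.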
