I am having a lot of trouble understanding numpy indexing for multidimensional arrays. In the example I am working with, I have a 2D array, A, which is 100x10. I also have another array, B, which is a 1D array of 100 integer values between 0 and 9 (column indices into A). In MATLAB, I would use A(sub2ind(size(A), (1:size(A,1))', B)) to return, for each row of A, the value at the column index stored in the corresponding row of B.

So, as a test case, let's say I have this:

A = np.random.rand(100,10)
B = np.int32(np.floor(np.random.rand(100)*10))

If I print their shapes, I get:

print A.shape returns (100L, 10L)
print B.shape returns (100L,)

When I naively (and incorrectly) index into A using B, I get a 2D result:

Test1 = A[:,B]
print Test1.shape returns (100L, 100L)

but if I do

Test2 = A[range(A.shape[0]),B]
print Test2.shape returns (100L,)

which is what I want. I'm having trouble understanding the distinction being made here. In my mind, A[:,B] and A[range(A.shape[0]),B] should return the same thing, but they don't. How is : different from range(sizeArray), which just creates the sequence 0, 1, ..., sizeArray-1 to use as indices?

  • Possible duplicate of Changing numpy array with array of indices Commented Jan 20, 2016 at 6:56
  • Try the same but with np.arange(A.shape[0])[:,None] as the 1st index. Commented Jan 20, 2016 at 11:12

1 Answer

Let's look at a simple array:

In [654]: X=np.arange(12).reshape(3,4)
In [655]: X
Out[655]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

With a slice as the row index we can pick 3 columns of X, in any order (and even repeated). In other words, take all the rows, but only the selected columns.

In [656]: X[:,[3,2,1]]
Out[656]: 
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])

If instead I use a list (or array) of 3 values as the row index, numpy pairs them up element by element with the column values, effectively picking 3 individual values, X[0,3], X[1,2], X[2,1]:

In [657]: X[[0,1,2],[3,2,1]]
Out[657]: array([3, 6, 9])
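Applied to the arrays in the question, the same pairing can be sketched as follows. This is a minimal illustration, not the only way to do it: the names rows, picked and looped are made up for the example, and the seeded generator from np.random.default_rng (available in reasonably recent numpy versions) is just an assumed stand-in for the question's random data.

```python
import numpy as np

# Assumed stand-in data with the same shapes as in the question.
rng = np.random.default_rng(0)
A = rng.random((100, 10))           # shape (100, 10)
B = rng.integers(0, 10, size=100)   # shape (100,), column indices 0..9

# One row index per entry of B; the two 1D index arrays are paired
# element by element, giving A[0, B[0]], A[1, B[1]], ...
rows = np.arange(A.shape[0])
picked = A[rows, B]                 # shape (100,)

# Explicit loop doing the same thing, for comparison only.
looped = np.array([A[i, B[i]] for i in range(A.shape[0])])
```

Here np.arange(A.shape[0]) plays the role of range(A.shape[0]) in the question's Test2; either one works as the row index.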

If instead I give it a column vector (shape (3,1)) to index the rows, I get the same thing as with the slice:

In [659]: X[[[0],[1],[2]],[3,2,1]]
Out[659]: 
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])

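This amounts to picking 9 individual values: the (3,1) row index and the (3,) column index are broadcast against each other to a common (3,3) shape, and each resulting pair selects one element: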

In [663]: np.broadcast_arrays(np.arange(3)[:,None],np.array([3,2,1]))
Out[663]: 
[array([[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]]), 
 array([[3, 2, 1],
        [3, 2, 1],
        [3, 2, 1]])]
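The relationship between the three forms can be checked directly. The sketch below reuses the X from above; the variable names (cols, via_slice, via_column, paired) are only illustrative. It suggests that the slice and the column-vector index give the same (3,3) block, while the paired 1D indices pick out just the diagonal of that block, which is the distinction the question ran into.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
cols = np.array([3, 2, 1])

# Slice for rows: every row combined with every selected column.
via_slice = X[:, cols]                        # shape (3, 3)

# Column vector for rows: broadcasts with cols to the same (3, 3) grid.
via_column = X[np.arange(3)[:, None], cols]   # shape (3, 3)

# 1D row index: paired element by element with cols.
paired = X[np.arange(3), cols]                # shape (3,)
```

With the question's arrays, the same reasoning explains why A[:,B] has shape (100, 100) while A[range(A.shape[0]),B] has shape (100,).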

numpy indexing can be confusing. But a good starting point is this page: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
