Python - numpy : 'dimension dependent indexing'

Question

I'm looking for an elegant (and fast) solution to the following problem, which is a simplification of a heavier real-world situation. The answer might lie outside numpy; I have searched at length without success.

So, hypothetically, I have:

import numpy as np

a = np.array([[2,7],
              [3,6],
              [2,8]])

And let's take a fake data set:

b = np.random.random((3,10))

Each row of a holds the lower and upper indices of the subset of interest in the corresponding row of b: from the first row of b I'm interested in indices 2 to 7, from the second row in indices 3 to 6, and from the third and last row in indices 2 to 8. Both ends are included, as the mask further down shows.

My idea for now is to create a kind of mask array c:

c = np.array([[0,0,1,1,1,1,1,1,0,0],
              [0,0,0,1,1,1,1,0,0,0],
              [0,0,1,1,1,1,1,1,1,0]])

And then I just work on

d = b*c

and the elements I'm not interested in are now 0.

How would you produce c from the indices in a?
Or would you have a better idea?

Proper masked arrays, np.ix_, twisted np.einsum: I couldn't find anything for this purpose. Of course the whole point is to avoid looping, at least in the visible part of my script. But is it even avoidable in the end?

Thanks a lot!

Divakar · Accepted Answer · 2015-11-19 13:05:20Z


You can create the mask with broadcasting -

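n = b.shape[1]
cols = np.arange(n)  # column indices, shape (n,)
# a[:,None,0] and a[:,None,1] have shape (rows, 1), so each comparison broadcasts to (rows, n)
mask = (cols >= a[:,None,0]) & (cols <= a[:,None,1])
d = mask*b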
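To tie this back to the question's own data, here is a minimal self-contained sketch that should reproduce the mask c from the question. The names cols, low, high and rng are introduced here for illustration only, and the seed is arbitrary.

```python
import numpy as np

# Index pairs from the question: each row is (low, high), both ends inclusive.
a = np.array([[2, 7],
              [3, 6],
              [2, 8]])

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
b = rng.random((3, 10))

cols = np.arange(b.shape[1])     # shape (10,)
low = a[:, 0][:, None]           # shape (3, 1)
high = a[:, 1][:, None]          # shape (3, 1)

# Comparing shape (10,) against shape (3, 1) broadcasts to shape (3, 10).
mask = (cols >= low) & (cols <= high)

c = mask.astype(int)             # the 0/1 array c from the question
d = b * mask                     # entries outside each row's range become 0
```

The upper bound uses <= because the mask in the question includes the upper index. For half-open ranges like Python slices, < would be the natural choice.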

Sample run -

In [252]: a
Out[252]: 
array([[2, 4],
       [3, 6],
       [2, 3]])

In [253]: b
Out[253]: 
array([[908, 867, 917, 649, 758, 950, 692],
       [715, 745, 797, 595, 377, 421, 712],
       [213, 143, 169, 825, 858, 780, 176]])

In [254]: n = b.shape[1]
     ...: mask = (np.arange(n) >= a[:,None,0]) & (np.arange(n) <= a[:,None,1])
     ...: 

In [255]: mask
Out[255]: 
array([[False, False,  True,  True,  True, False, False],
       [False, False, False,  True,  True,  True,  True],
       [False, False,  True,  True, False, False, False]], dtype=bool)

In [256]: mask*b
Out[256]: 
array([[  0,   0, 917, 649, 758,   0,   0],
       [  0,   0,   0, 595, 377, 421, 712],
       [  0,   0, 169, 825,   0,   0,   0]])
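
Since the question mentions masked arrays: if the zeros would distort later statistics, the same boolean mask can feed np.where or numpy.ma instead. This is only a sketch on the sample data above; masked_b and row_means are names made up for illustration, and whether it suits the real use case is an assumption.

```python
import numpy as np

a = np.array([[2, 4],
              [3, 6],
              [2, 3]])
b = np.array([[908, 867, 917, 649, 758, 950, 692],
              [715, 745, 797, 595, 377, 421, 712],
              [213, 143, 169, 825, 858, 780, 176]])

cols = np.arange(b.shape[1])
mask = (cols >= a[:, None, 0]) & (cols <= a[:, None, 1])

# np.where keeps the selected values and fills the rest with 0 (same result as mask * b).
d = np.where(mask, b, 0)

# numpy.ma uses True to mark entries to IGNORE, hence the inversion with ~.
masked_b = np.ma.array(b, mask=~mask)
row_means = masked_b.mean(axis=1)   # per-row mean over the selected entries only
```

With mask*b, a plain mean over each row would count the zeros as data; the masked array leaves them out.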

edited Nov 19, 2015 at 13:05

answered Nov 19, 2015 at 12:55

Divakar


2 Comments

Etienne Over a year ago

"..so that looping occurs in C instead of Python." Broadcasting! I had always heard about it and always thought "I kind of use it, right?". That's the magic I was looking for, thanks Divakar. Most of the time you need your own example solved to really understand something new.

Divakar Over a year ago

@Etienne Yeah, most of the time askers post simple data, and I change it at my end to test all possible scenarios :) Yes, NumPy gets its speed by doing everything in one go rather than looping in Python; under the hood these vectorized operations are done in C.
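
For comparison, a small sketch of the explicit Python loop that the broadcast expression replaces, checked against it on the question's indices. The names loop_mask, bcast_mask and same are illustrative, and n = 10 is taken from the question's b.

```python
import numpy as np

a = np.array([[2, 7],
              [3, 6],
              [2, 8]])
n = 10  # number of columns in the question's b

# Explicit Python loop: one slice assignment per row.
loop_mask = np.zeros((a.shape[0], n), dtype=bool)
for i, (lo, hi) in enumerate(a):
    loop_mask[i, lo:hi + 1] = True   # +1 because the upper index is inclusive

# Broadcast version: one comparison over the whole (3, n) grid.
cols = np.arange(n)
bcast_mask = (cols >= a[:, None, 0]) & (cols <= a[:, None, 1])

same = np.array_equal(loop_mask, bcast_mask)
```

Both build the same mask; the broadcast form simply moves the per-row iteration out of Python.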
