
I'm looking for an elegant (and fast) solution to the following problem, which is a simplification of a heavier real-world situation. The answer may lie outside NumPy; I have searched extensively without success.

So, hypothetically, I have:

a = np.array([[2,7],
              [3,6],
              [2,8]])

And let's take a fake data set:

b = np.random.random((3,10))

Each row of a holds the lower and upper indices of the subset of interest in the corresponding row of b: from the first row of b I'm interested in the subset [2:7], from the second row the subset [3:6], and from the third and last row the subset [2:8]. Both bounds are meant to be inclusive, as the mask below shows.

My idea for now is to create a kind of mask array c:

c = np.array([[0,0,1,1,1,1,1,1,0,0],
              [0,0,0,1,1,1,1,0,0,0],
              [0,0,1,1,1,1,1,1,1,0]])

And then I just work on

d = b*c

so that the elements I'm not interested in are now 0.

  • How would you produce c from the indices in a?
  • Do you have a better idea?

Proper masked arrays, np.ix_, twisted np.einsum: I couldn't find anything suited to this purpose. Of course, the whole point is to avoid looping, at least in the visible part of my script. But is it even avoidable in the end?

Thanks a lot!

1 Answer

You can create the mask with broadcasting -

n = b.shape[1]
mask = (np.arange(n) >= a[:,None,0]) & (np.arange(n) <= a[:,None,1])
d = mask*b
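As a self-contained sketch of the same broadcasting idea, the snippet below rebuilds the question's mask c from a. The names cols, lo, hi and rng are illustrative and not from the original post, and the seeded random generator is only an assumption made for reproducibility.

```python
import numpy as np

# Index bounds from the question (both bounds inclusive).
a = np.array([[2, 7],
              [3, 6],
              [2, 8]])

# Stand-in data; the seed is an arbitrary choice for reproducibility.
rng = np.random.default_rng(0)
b = rng.random((3, 10))

cols = np.arange(b.shape[1])        # shape (10,): column indices 0..9
lo = a[:, 0, None]                  # shape (3, 1): lower bound per row
hi = a[:, 1, None]                  # shape (3, 1): upper bound per row

# (10,) compared against (3, 1) broadcasts to a (3, 10) boolean mask.
mask = (cols >= lo) & (cols <= hi)

c = mask.astype(int)                # the 0/1 array c from the question
d = np.where(mask, b, 0.0)          # same result as mask * b for finite b
```

np.where is used here instead of multiplication only as a design choice: if b contained NaN or inf, mask*b would leave NaN in the excluded positions, whereas np.where writes a clean 0 there.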

Sample run -

In [252]: a
Out[252]: 
array([[2, 4],
       [3, 6],
       [2, 3]])

In [253]: b
Out[253]: 
array([[908, 867, 917, 649, 758, 950, 692],
       [715, 745, 797, 595, 377, 421, 712],
       [213, 143, 169, 825, 858, 780, 176]])

In [254]: n = b.shape[1]
     ...: mask = (np.arange(n) >= a[:,None,0]) & (np.arange(n) <= a[:,None,1])
     ...: 

In [255]: mask
Out[255]: 
array([[False, False,  True,  True,  True, False, False],
       [False, False, False,  True,  True,  True,  True],
       [False, False,  True,  True, False, False, False]], dtype=bool)

In [256]: mask*b
Out[256]: 
array([[  0,   0, 917, 649, 758,   0,   0],
       [  0,   0,   0, 595, 377, 421, 712],
       [  0,   0, 169, 825,   0,   0,   0]])

2 Comments

"..so that looping occurs in C instead of Python." . Broadcasting! Always heard about it, always thought "I kind of use it, right?". That's the magic I was looking for, thanks Divakar. Most of the time you need your own example solved to really understand something new.
@Etienne Yeah, most of the time askers post simple data and I change it at my end to test all possible scenarios :) And yes, NumPy gets its speed by performing the whole operation in one go rather than looping in Python; under the hood, these vectorized operations run in C.
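To make the point about loops concrete, the following sketch builds the mask twice for the question's a: once with an explicit Python loop over rows and once with broadcasting. The names mask_loop and mask_bcast are illustrative, and n = 10 is assumed from the shape of b in the question.

```python
import numpy as np

a = np.array([[2, 7],
              [3, 6],
              [2, 8]])
n = 10  # number of columns in b, as in the question

# Explicit Python loop: one slice assignment per row.
mask_loop = np.zeros((a.shape[0], n), dtype=bool)
for i, (lo, hi) in enumerate(a):
    mask_loop[i, lo:hi + 1] = True   # +1 because the upper bound is inclusive

# Broadcasting: the per-row comparison runs inside NumPy's compiled code.
cols = np.arange(n)
mask_bcast = (cols >= a[:, None, 0]) & (cols <= a[:, None, 1])

same = np.array_equal(mask_loop, mask_bcast)  # both approaches give the same mask
```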
