Explicit Python loops over large arrays are slow, since every iteration runs through the interpreter. This is why your original list comprehension is not terribly fast.

Your numpy version is loop free, but `np.repeat` actually makes copies of your data, which is again really inefficient. `np.tile` would not help either, as it copies the data as well. Fortunately, we don't need either of them: numpy has a great feature called [*broadcasting*][1], which often makes `np.repeat`/`np.tile` completely unnecessary. Broadcasting basically does the `np.repeat`/`np.tile` step automatically, without materializing the copies.
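
As a quick illustration of how broadcasting works (the arrays here are made up, not your data):

    import numpy as np

    col = np.arange(3).reshape(3, 1)   # shape (3, 1)
    row = np.arange(4).reshape(1, 4)   # shape (1, 4)
    # numpy virtually stretches both operands to (3, 4);
    # no repeated copies of col or row are ever materialized
    (col + row).shape                  # -> (3, 4)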

To compare the performance, I created a more abstract version of your list comprehension:

    def get_valid_op(arr, lowers, uppers):
        # one Python-level iteration per element of arr
        return np.asarray([any((val >= lowers) & (val < uppers)) for val in arr])

and also a broadcasting version

    def get_valid_arr(arr, lowers, uppers):
        # broadcast (1, n_values) against (n_regions, 1) to get an
        # (n_regions, n_values) mask, then reduce over the regions
        valid = np.logical_and(arr.reshape(1, -1) >= lowers.reshape(-1, 1),
                               arr.reshape(1, -1) < uppers.reshape(-1, 1))
        return valid.any(axis=0)

The second one is essentially the same algorithm as your repeat/reshape code, just without the explicit copies.
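
A tiny worked example (made-up numbers) shows what this returns:

    vals = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
    los = np.array([1.0, 4.0])   # two regions: [1, 2) and [4, 5)
    ups = np.array([2.0, 5.0])
    # the intermediate mask has shape (2, 5); .any(axis=0) collapses it per value
    get_valid_arr(vals, los, ups)   # -> array([False,  True, False, False,  True])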

With some test data modeled after your description above

    arr = np.linspace(0, 1000, 70000)
    starts = np.linspace(0, 150, 151) * 400
    ends = starts + np.random.randint(0, 200, starts.shape)  # I assumed non-overlapping regions here

we can first `assert all(get_valid_op(arr, starts, ends) == get_valid_arr(arr, starts, ends))` to check that both versions agree, and then time them:

```none
%timeit -n 10 get_valid_op(arr, starts, ends)
511 ms ± 5.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit -n 10 get_valid_arr(arr, starts, ends)
37.8 ms ± 3.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

An order of magnitude faster. Not bad to begin with ;-)

Since working with large intermediate arrays also has a cost (`valid` has a shape of `(151, 70000)` before the reduction, about 10 MB of booleans), I then took a step back and reintroduced a loop, just a small one.

    def get_valid_loop(arr, lowers, uppers):
        valid = np.zeros(arr.shape, dtype=bool)
        # one vectorized pass over the whole array per region
        for start, end in zip(lowers, uppers):
            valid = np.logical_or(valid, np.logical_and(start <= arr, arr < end))
        return valid

In contrast to your list comprehension, this version now only iterates over the short region limit vectors (151 iterations instead of 70000), while each iteration does its work on the full array in vectorized numpy code.
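
If the repeated allocation of the combined result bothers you, a minimal in-place variant (my sketch, untimed) would be:

    def get_valid_loop_inplace(arr, lowers, uppers):
        valid = np.zeros(arr.shape, dtype=bool)
        for start, end in zip(lowers, uppers):
            # |= updates the mask in place instead of allocating
            # a new array for the combined result on every iteration
            valid |= (start <= arr) & (arr < end)
        return valid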

We can then again `assert all(get_valid_op(arr, starts, ends) == get_valid_loop(arr, starts, ends))` and time it:

```none
%timeit -n 10 get_valid_loop(arr, starts, ends)
18.1 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

As the results show, this version is even faster on my "synthetic" benchmark inputs.

In the end you will have to benchmark the versions on your real data and see which one performs best; the winner depends on the relative sizes of your value array and your region vectors.

  [1]: https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html