Revisions to Optimize Performance of Region Checking in List Comprehension

edited body

edited Nov 14, 2019 at 9:39

added 9 characters in body

edited Nov 13, 2019 at 23:59

AlexV

answered Nov 13, 2019 at 23:42

AlexV

Python-level loops over large arrays are not really a good idea, because every single element has to pass through the interpreter. This is why your original list comprehension is not terribly fast.

Your numpy version is loop free, but np.repeat actually makes copies of your data, which is inefficient for arrays of this size. Switching to np.tile would not help, since it builds a new, fully expanded array as well. But we don't really need to bother, since numpy has a great feature called broadcasting, which often makes np.repeat/np.tile completely unnecessary. Broadcasting basically behaves as if np.repeat/np.tile had been applied, without materializing the repeated data.
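To make the broadcasting point concrete, here is a minimal sketch with made-up toy values (the names values and lowers are only for illustration and do not come from your code). It compares a (1, 3) array against a (2, 1) array, once with explicitly repeated copies and once by letting numpy broadcast:

```python
import numpy as np

values = np.array([1.0, 5.0, 9.0])   # toy data, shape (3,)
lowers = np.array([0.0, 4.0])        # toy lower bounds, shape (2,)

# Explicit version: materialize both operands at the full (2, 3) shape.
values_rep = np.repeat(values.reshape(1, -1), lowers.size, axis=0)
lowers_rep = np.repeat(lowers.reshape(-1, 1), values.size, axis=1)
explicit = values_rep >= lowers_rep

# Broadcasting version: (1, 3) against (2, 1) is stretched to (2, 3)
# on the fly, without creating the repeated copies first.
broadcast = values.reshape(1, -1) >= lowers.reshape(-1, 1)

print(broadcast.shape)                      # (2, 3)
print(np.array_equal(explicit, broadcast))  # True
```

Both variants produce the same boolean matrix. The broadcasting version just skips the two intermediate copies.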

To evaluate the performance, I created a more abstract version of your list comprehension:

import numpy as np
def get_valid_op(arr, lowers, uppers):
    # one Python-level iteration per value: is it inside any [lower, upper) region?
    return np.asarray([any((val >= lowers) & (val < uppers)) for val in arr])

and also a broadcasting version

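def get_valid_arr(arr, lowers, uppers):
    # (1, n_values) compared against (n_regions, 1) broadcasts to (n_regions, n_values)
    values, lo, hi = arr.reshape(1, -1), lowers.reshape(-1, 1), uppers.reshape(-1, 1)
    valid = np.logical_and(values >= lo, values < hi)
    return valid.any(axis=0)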

The second one is virtually the exact same algorithm as your repeat/reshape code.
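As a small sanity check that can be verified by hand, the following sketch runs both approaches on four made-up values and two made-up regions, [0, 5) and [10, 12). The function names with the _demo suffix and the toy numbers are my own illustration, not part of your code:

```python
import numpy as np

def get_valid_op_demo(arr, lowers, uppers):
    # One Python-level iteration per value.
    return np.asarray([any((val >= lowers) & (val < uppers)) for val in arr])

def get_valid_arr_demo(arr, lowers, uppers):
    # One broadcasted comparison, then a reduction over the region axis.
    values = arr.reshape(1, -1)
    valid = (values >= lowers.reshape(-1, 1)) & (values < uppers.reshape(-1, 1))
    return valid.any(axis=0)

toy_arr = np.array([0.0, 5.0, 10.0, 15.0])
toy_lowers = np.array([0.0, 10.0])
toy_uppers = np.array([5.0, 12.0])

print(get_valid_op_demo(toy_arr, toy_lowers, toy_uppers))   # [ True False  True False]
print(get_valid_arr_demo(toy_arr, toy_lowers, toy_uppers))  # [ True False  True False]
```

The value 5.0 is rejected because the upper limits are exclusive, and 15.0 lies outside both regions.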

With some test data modeled after your description above

arr = np.linspace(0, 1000, 70000)
starts = np.linspace(0, 150, 151) * 400
ends = starts + np.random.randint(0, 200, starts.shape)  # I assumed non-overlapping regions here

we can first assert all(get_valid_op(arr, starts, ends) == get_valid_arr(arr, starts, ends)) and then time:

%timeit -n 10 get_valid_op(arr, starts, ends)
511 ms ± 5.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit -n 10 get_valid_arr(arr, starts, ends)
37.8 ms ± 3.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

An order of magnitude faster. Not bad to begin with ;-)

Since working with large arrays (valid has a shape of (150, 70000) before reduction) also has a cost, I then took a step back and returned to loopy-land (just a little bit).

def get_valid_loop(arr, lowers, uppers):
    valid = np.zeros(arr.shape, dtype=bool)
    for start, end in zip(lowers, uppers):
        valid = np.logical_or(valid, np.logical_and(start <= arr, arr < end))
    return valid

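In contrast to your list comprehension, this version only iterates over the much shorter region limit vectors: roughly 150 iterations instead of 70,000 with the test data above, which is more than two orders of magnitude fewer. Each iteration is a vectorized comparison over the whole of arr.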

We can then again assert all(get_valid_op(arr, starts, ends) == get_valid_loop(arr, starts, ends)) and time it:

%timeit -n 10 get_valid_loop(arr, starts, ends)
18.1 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

As the results show, this version is even faster on my "synthetic" benchmark inputs.
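One possible variation of the loop version, which I have not benchmarked, is to update valid in place instead of creating a new result array in every iteration. The function name get_valid_loop_inplace is made up for this sketch, and whether it is actually faster on your data is an assumption you would have to test:

```python
import numpy as np

def get_valid_loop_inplace(arr, lowers, uppers):
    # Same region loop as before, but the accumulator is updated in place.
    valid = np.zeros(arr.shape, dtype=bool)
    for start, end in zip(lowers, uppers):
        # The comparison still creates temporaries; only the OR is in place.
        valid |= (start <= arr) & (arr < end)
    return valid

demo_arr = np.array([0.0, 5.0, 10.0, 15.0])
demo_result = get_valid_loop_inplace(
    demo_arr, np.array([0.0, 10.0]), np.array([5.0, 12.0])
)
print(demo_result)  # [ True False  True False]
```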

In the end you will have to check the versions in your application and see which one performs best.
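If you want to repeat the comparison outside of IPython, where %timeit is not available, a rough sketch using the standard library's timeit module could look like this. The helper best_time_ms and the fixed random seed are my own additions for illustration, and the numbers will differ on your machine and data:

```python
import timeit
import numpy as np

def get_valid_arr(arr, lowers, uppers):
    values = arr.reshape(1, -1)
    valid = (values >= lowers.reshape(-1, 1)) & (values < uppers.reshape(-1, 1))
    return valid.any(axis=0)

def get_valid_loop(arr, lowers, uppers):
    valid = np.zeros(arr.shape, dtype=bool)
    for start, end in zip(lowers, uppers):
        valid = np.logical_or(valid, np.logical_and(start <= arr, arr < end))
    return valid

def best_time_ms(func, *args, number=5, repeat=3):
    """Hypothetical helper: best average time per call in milliseconds."""
    timings = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
    return min(timings) / number * 1000

# Same synthetic data as in the answer, with a seeded generator for repeatability.
rng = np.random.default_rng(0)
arr = np.linspace(0, 1000, 70000)
starts = np.linspace(0, 150, 151) * 400
ends = starts + rng.integers(0, 200, starts.shape)

for func in (get_valid_arr, get_valid_loop):
    print(f"{func.__name__}: {best_time_ms(func, arr, starts, ends):.1f} ms per call")
```

Swapping in your real arr, starts and ends should tell you which version to prefer in your application.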
