
This question is quite close to my heart, as I have been doing something like this for almost 2 years and have always wondered whether there is a vectorized way of modifying a large array/dataframe when row i depends on row i-1, i.e., when recursion seems mandatory. I am very keen to hear about either clever algorithms or clever tools (Cython, Numba, removing redundant operations, etc.) to reduce the runtime.

Problem:
I have 3 big numpy arrays: x, y and z, of shapes (1 million by 500), (1 million by 500) and (1 million by 1). I need to clip/winsorize each element of row i of x based on whether |(x[i] - x[i-1]) * z[i] / y[i]| > thresh. I am doing this in the following way, which takes an extremely long time for my simulations to run (especially when this step is repeated thousands of times to tune the hyperparameters):

t = x.copy()
# Clip the first row directly, then clip each subsequent row's move relative
# to the (already clipped) previous row, working in place on t.
t[0] = np.clip(t[0] * z[0] / y[0], -1 * thresh, thresh) * y[0] / z[0]
for i in range(1, t.shape[0]):
    t[i] = np.clip((t[i] - t[i - 1]) * z[i] / y[i], -1 * thresh, thresh) * y[i] / z[i] + t[i - 1]

Sample input:

import numpy as np
x = np.random.rand(1000000, 500)
y = np.random.rand(1000000, 500)
z = np.random.rand(1000000, 1)
thresh = 0.7
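
As a quick sanity check (not part of the original question), one can run the same loop on a much smaller sample and confirm that, after clipping, every adjusted row-to-row move |(t[i] - t[i-1]) * z[i] / y[i]| stays within thresh. The array sizes below are reduced only so the check runs quickly:

import numpy as np

# Smaller arrays; the loop itself is unchanged from above.
x = np.random.rand(1000, 50)
y = np.random.rand(1000, 50)
z = np.random.rand(1000, 1)
thresh = 0.7

t = x.copy()
t[0] = np.clip(t[0] * z[0] / y[0], -thresh, thresh) * y[0] / z[0]
for i in range(1, t.shape[0]):
    t[i] = np.clip((t[i] - t[i - 1]) * z[i] / y[i], -thresh, thresh) * y[i] / z[i] + t[i - 1]

# Every adjusted move should now be within the threshold, up to floating-point noise.
moves = np.abs(np.diff(t, axis=0) * z[1:] / y[1:])
assert moves.max() <= thresh + 1e-6
print(moves.max())  # <= 0.7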

Edit: Modified to remove append, as suggested by @Mad Physicist, and the redundant if-else, as suggested by @Pedro Maia.

  • Why do you have an if statement that is executed 1 million times? If you want to do it only once, move it outside the loop. Commented Oct 31, 2021 at 15:02
  • @PedroMaia Fair point. I have modified it now. Commented Oct 31, 2021 at 15:04
  • Don't use append, for one thing. It reallocates the entire array every time. Commented Oct 31, 2021 at 15:26
  • Consider using Cython or Numba for something like this. Commented Oct 31, 2021 at 15:26
  • @MadPhysicist Thanks - I got rid of append. Could you show me how to use numba and cython in this case? (My simulation is running on Spark, so I'm assuming that numba/cython wouldn't cause issues there.) Thanks a lot. Commented Oct 31, 2021 at 15:56

1 Answer


I got a 3-4x improvement by using numba, and another 2x (6-8x in total) by caching the compiled function.

I had to decrease the size of x, y, and z due to limited RAM on my PC.

import numba as nb
import numpy as np

# nopython=True compiles the whole loop to machine code; cache=True stores the
# compiled result on disk so later runs skip compilation.
@nb.jit(nopython=True, cache=True)
def bottleneck_func3(t, thresh, y, z):
    # Same row-by-row recursion as the original code; t is modified in place.
    t[0] = np.clip(t[0] * z[0] / y[0], -1 * thresh, thresh) * y[0] / z[0]
    for i in range(1, t.shape[0]):
        t[i] = np.clip((t[i] - t[i - 1]) * z[i] / y[i], -1 * thresh, thresh) * y[i] / z[i] + t[i - 1]

x = np.random.rand(300000, 500)
y = np.random.rand(300000, 500)
z = np.random.rand(300000, 1)
thresh = 0.7

t = x.copy()
bottleneck_func3(t, thresh, y, z)
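
As a rough sketch of how one might measure the speedup (the baseline function bottleneck_py below is just for illustration and wraps the original loop unchanged):

import time

def bottleneck_py(t, thresh, y, z):
    # Original pure NumPy loop, kept for comparison.
    t[0] = np.clip(t[0] * z[0] / y[0], -1 * thresh, thresh) * y[0] / z[0]
    for i in range(1, t.shape[0]):
        t[i] = np.clip((t[i] - t[i - 1]) * z[i] / y[i], -1 * thresh, thresh) * y[i] / z[i] + t[i - 1]

# Warm-up call so numba's compile time (or cache load) is not counted in the timing.
bottleneck_func3(x[:10].copy(), thresh, y[:10], z[:10])

for name, func in [("python loop", bottleneck_py), ("numba", bottleneck_func3)]:
    t = x.copy()
    start = time.perf_counter()
    func(t, thresh, y, z)
    print(name, time.perf_counter() - start)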

Note that the cached function is recompiled each time you modify the file, even if you change parts that have nothing to do with the cached function.
