This question is close to my heart: I have been doing something like this for almost two years and have always wondered whether there is a vectorized way to modify a large array or DataFrame when row i depends on row i-1, i.e., when recursion seems unavoidable. I am keen to hear about clever algorithms or tools (Cython, Numba, removing redundant operations, etc.) that could reduce the runtime.
Problem:
I have three large NumPy arrays, x, y and z, of shape (1 million by 500), (1 million by 500) and (1 million by 1). I want to clip/winsorize each element in row i of x whenever |(x[i] - x[i-1]) * z[i] / y[i]| > thresh, where x[i-1] is the already-clipped previous row (which is what makes this recursive). I currently do it as follows, which takes an extremely long time in my simulations (especially since this step is repeated thousands of times to tune hyperparameters):
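t = x.copy()
# First row has no predecessor, so it is clipped against zero.
t[0] = np.clip(t[0] * z[0] / y[0], -thresh, thresh) * y[0] / z[0]
for i in range(1, t.shape[0]):
    # Clip the scaled step from the already-clipped previous row, then undo the scaling.
    t[i] = np.clip((t[i] - t[i-1]) * z[i] / y[i], -thresh, thresh) * y[i] / z[i] + t[i-1]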
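One possible way to drop redundant operations from the loop, offered as a sketch rather than a definitive answer: because the clip interval is symmetric, clipping (t[i] - t[i-1]) * z[i] / y[i] to ±thresh and scaling back should be the same as clipping t[i] to within thresh * |y[i] / z[i]| of t[i-1]. That bound can be computed once, fully vectorized, leaving only a subtraction, an addition and a clip per row. The function names and the `bound` array are hypothetical, and the sketch assumes y and z contain no zeros.

```python
import numpy as np

def clip_rows_reference(x, y, z, thresh):
    """Row-by-row loop from the question, wrapped in a function."""
    t = x.copy()
    t[0] = np.clip(t[0] * z[0] / y[0], -thresh, thresh) * y[0] / z[0]
    for i in range(1, t.shape[0]):
        t[i] = np.clip((t[i] - t[i - 1]) * z[i] / y[i], -thresh, thresh) * y[i] / z[i] + t[i - 1]
    return t

def clip_rows_precomputed(x, y, z, thresh):
    """Same recurrence, with the per-element step bound computed once up front."""
    # Assumption: y and z have no zeros, so the bound is finite.
    bound = thresh * np.abs(y / z)  # z of shape (n, 1) broadcasts against y
    t = x.copy()
    # First row has no predecessor, so it is clipped around zero.
    np.clip(t[0], -bound[0], bound[0], out=t[0])
    for i in range(1, t.shape[0]):
        prev = t[i - 1]
        # Row i must stay within bound[i] of the already-clipped previous row.
        np.clip(t[i], prev - bound[i], prev + bound[i], out=t[i])
    return t

# Small check on toy data (offset by 0.1 to keep y and z away from zero).
rng = np.random.default_rng(0)
x_small = rng.random((1000, 5))
y_small = rng.random((1000, 5)) + 0.1
z_small = rng.random((1000, 1)) + 0.1
same = np.allclose(
    clip_rows_reference(x_small, y_small, z_small, 0.7),
    clip_rows_precomputed(x_small, y_small, z_small, 0.7),
)
print(same)
```

The loop over rows is still there, so this only trims per-iteration work; it does not remove the sequential dependency. Note that `bound` is a full-size temporary, which at the array sizes in the question means several more gigabytes of memory, so computing it in row chunks may be preferable.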
Sample input:
import numpy as np
x = np.random.rand(1000000, 500)
y = np.random.rand(1000000, 500)
z = np.random.rand(1000000, 1)
thresh = 0.7
Edit: Modified to remove the append, as suggested by @Mad Physicist, and the redundant if-else, as suggested by @Pedro Maia.