
I'm trying to run this kind of loop (it's simplified in this example) that generates and adds up random consumptions for 1000 clients, which takes approximately 1h30.

import numpy as np

rand_array = np.random.rand(35000)
total_consumption = np.zeros(35000)

for t in range(0,1000):
   consumption = np.zeros(35000)
   consumption[0] = 0.5
   rand_array = np.random.rand(35000)

   for i in range(1,35000):
      consumption[i] = rand_array[i] * consumption[i-1]

   total_consumption = total_consumption + consumption

Is there a way I can make this faster and more efficient? I tried using a list comprehension, to no avail.

  • Have you tried sum()? Be careful with numpy.sum(), as it does not always raise overflow errors if your type is too small. Commented Sep 7, 2021 at 19:25
  • Creating brand new arrays every pass through can be prohibitively time consuming. Why do you need a rand array anyway? Can't you just generate a random number for each multiplication? For that matter, why is there a consumption array when you only need the previous consumption value? Commented Sep 7, 2021 at 19:26
  • I edited the code so that it runs without an overflow or syntax error. Please check that the modifications are correct. The result is an array of mostly zeros because it quickly converges to 0 due to the repeated product of values between 0 and 1... Commented Sep 7, 2021 at 19:41
  • @RufusVs This is a very simplified example of my code; the original uses a random distribution and a complex algorithm built in Excel that I'm now trying to port to Python. It's simplified so it's easier to understand. Commented Sep 7, 2021 at 19:44
  • If your algorithm requires applying 35,000 values to 1000 customers, I can't see a shortcut. Any savings would be from finding a better algorithm, or implementing the bottleneck in a faster language that you can call from Python. Commented Sep 7, 2021 at 20:04

2 Answers

You can use np.cumprod to vectorize the computation and make it much faster. Here is the resulting code:

total_consumption = np.zeros(35000)

for t in range(0,1000):
    rand_array = np.random.rand(35000)
    rand_array[0] = 0.5 # Needed for the cumprod
    consumption = np.cumprod(rand_array)
    total_consumption += consumption

This code takes 267 milliseconds on my machine, while the original takes 11.8 seconds. Thus, it is about 44 times faster.
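If memory allows, the remaining loop over clients can also be pushed into NumPy by drawing a 2-D block of random factors and taking the cumulative product along each row. The following is only a sketch under my own assumptions: the function name total_consumption_batched and the batch, start and seed parameters are hypothetical, and I have not timed it against the loop above. Batching keeps the temporary array small (1000 x 35000 doubles would be about 280 MB in one go).

```python
import numpy as np

def total_consumption_batched(n_clients=1000, n_steps=35000, start=0.5,
                              batch=100, seed=None):
    """Sum the per-client cumulative products, processing `batch` clients at a time."""
    rng = np.random.default_rng(seed)
    total = np.zeros(n_steps)
    for first in range(0, n_clients, batch):
        rows = min(batch, n_clients - first)   # last batch may be smaller
        factors = rng.random((rows, n_steps))  # one row of random factors per client
        factors[:, 0] = start                  # first column holds the initial value
        # cumprod along axis=1 gives every client's consumption curve;
        # summing over axis=0 adds this batch of clients into the running total.
        total += np.cumprod(factors, axis=1).sum(axis=0)
    return total

if __name__ == "__main__":
    print(total_consumption_batched(seed=0)[:5])
```

Whether this beats the one-client-at-a-time version depends on cache and memory behaviour, so it is worth measuring with a few different batch sizes.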


I had a try at doing the middle part with numba:

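import numpy as np
from numba import jit

@jit(nopython=True)
def speedy(consumption, rand_array):
    # Start at 1 so that consumption[0] (the initial value) is not
    # overwritten by rand_array[0] * consumption[-1], which is 0.
    for i in range(1, consumption.shape[0]):
        consumption[i] = rand_array[i] * consumption[i-1]
    return consumption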

rand_array = np.random.rand(35000)
total_consumption = np.zeros(35000)

for t in range(0,1000):
    consumption = np.zeros(35000)
    consumption[0] = 0.5
    rand_array = np.random.rand(35000)

    consumption = speedy(consumption, rand_array)
    total_consumption = total_consumption + consumption

The time was 259 ms versus 9.6 seconds for your code. I guess you could do more in numba too if you wanted to try.
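As a rough sketch of what "doing more in numba" could look like, the outer loop over clients and the random draws can also be moved inside the compiled function, so no intermediate arrays are allocated at all. The function name total_consumption_jit and its parameters are my own, hypothetical choices, and I have not benchmarked this. The try/except fallback is only there so the snippet still runs (slowly) when numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback (assumption: plain Python is acceptable for testing the logic).
    def njit(func):
        return func

@njit
def total_consumption_jit(n_clients, n_steps, start):
    total = np.zeros(n_steps)
    for _ in range(n_clients):
        prev = start                # consumption[0] for this client
        total[0] += prev
        for i in range(1, n_steps):
            # Only the previous value is needed, so no per-client array is kept.
            prev = np.random.random() * prev
            total[i] += prev
    return total

if __name__ == "__main__":
    print(total_consumption_jit(1000, 35000, 0.5)[:5])
```

The first call includes the compilation time, so time a second call if you want a fair comparison.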

2 Comments

This solution seems interesting, but in my real algorithm I'm using consumption[i] = scipy.stats.beta.ppf(rand_array[r], 5 * d[r-1], 5 * (1 - d[r-1])). I'm new to Python and numba; is there a way to integrate the scipy.stats.beta.ppf function in numba?
I'm unfamiliar with that function. If you can find the source, you could try putting it into my numba function... maybe. Not sure.
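One possible alternative that avoids numba: the recurrence is sequential in time but independent across clients, so you could loop over the time steps and call scipy.stats.beta.ppf once per step on an array holding all clients. This is a sketch only, and it guesses at the real algorithm: I am assuming d is the previous consumption value, that it lies strictly between 0 and 1, and that clipping it with a small eps is acceptable to keep the beta parameters positive. The names simulate_beta_total, start, seed and eps are hypothetical.

```python
import numpy as np
from scipy.stats import beta

def simulate_beta_total(n_clients=1000, n_steps=35000, start=0.5,
                        seed=None, eps=1e-6):
    rng = np.random.default_rng(seed)
    d = np.full(n_clients, start)   # current value for every client
    total = np.zeros(n_steps)
    total[0] = d.sum()
    for r in range(1, n_steps):
        u = rng.random(n_clients)   # one uniform draw per client
        # Assumption: keep d inside (0, 1) so both shape parameters stay positive.
        d = np.clip(d, eps, 1 - eps)
        # beta.ppf broadcasts over arrays, so all clients advance in one call.
        d = beta.ppf(u, 5 * d, 5 * (1 - d))
        total[r] = d.sum()
    return total

if __name__ == "__main__":
    print(simulate_beta_total(n_clients=10, n_steps=5, seed=0))
```

This replaces 1000 x 35000 scalar calls with 35000 array calls, which usually cuts the Python overhead a lot, but I have not measured it on your data.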
