
I have a for loop doing some operation on the elements of an array. There are 1e5 elements in the array:

import numpy as np
A=np.array([1,2,3,4..........100000)]
for i in range(0,len(A)):
    A[i]=(A[i]*2+A[i]*4)**(1/3)

I want to obtain parallelisation in the above code, so that each iteration of the for loop runs on a different core and the code executes faster. I have a workstation with 48 cores. How can I achieve this parallel processing in Python?

  • @AlexanderL.Hayes: I assume they were hand waving away the precise values, and typoed the order of the close paren/bracket. You could achieve the same result with np.arange(1, 100000+1) (much more quickly) if you like. Commented Jan 13, 2021 at 19:40

2 Answers


Don't bother parallelizing just yet. Right now, you're taking no advantage of numpy vectorization; you may as well be using a plain Python list (or maybe array.array) for all the benefit numpy is giving you.

Actually use numpy's vectorized operations, and the runtime should drop by roughly two orders of magnitude:

import numpy as np
A = np.array([1,2,3,4..........100000])  # If these are actually the values you want, use np.arange(1, 100000+1) to build them much faster
A = (A * 6) ** (1 / 3)

# If the result should truncate back to int64, not convert to doubles, cast back at the end
A = A.astype(np.int64)

(A * 6) ** (1 / 3) does the same work as the for loop did, but much faster (you could match the original code more closely with A = (A * 2 + A * 4) ** (1/3), but multiplying by 2 and 4 separately and adding the results is pointless when you could just multiply by 6 directly). The final line is optional, depending on intent: it reproduces the original loop's behavior exactly by truncating back to the original integer dtype.
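If you want to convince yourself that the vectorized form plus the truncating cast really matches the loop's output, a quick sanity check along these lines should print True (np.arange stands in here for the elided array literal):

import numpy as np

A = np.arange(1, 100000 + 1)

# Naive loop: each assignment truncates the float result back into the int64 array
loop_result = A.copy()
for i in range(len(loop_result)):
    loop_result[i] = (loop_result[i] * 2 + loop_result[i] * 4) ** (1 / 3)

# Vectorized equivalent, with the truncating cast at the end
vec_result = ((A * 6) ** (1 / 3)).astype(np.int64)

print(np.array_equal(loop_result, vec_result))  # expected: True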

Comparing performance with IPython's %%timeit magic for a microbenchmark:

In [2]: %%timeit
   ...: A = np.arange(1, 100000+1)
   ...: for i in range(len(A)):
   ...:     A[i] = (A[i]*2 + A[i]*4) ** (1/3)
   ...:
427 ms ± 6.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %%timeit
   ...: A = np.arange(1, 100000+1)
   ...: A = (A * 6) ** (1/3)
   ...:
2.72 ms ± 51 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The vectorized code takes about 0.6% of the time taken by the naive loop; merely parallelizing the naive loop would never come close to achieving that sort of speedup. Adding the .astype(np.int64) cast only increases runtime by about 6%, still a trivial fraction of what the original for loop required.
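If you're not working in IPython, a rough equivalent of this comparison can be put together with the standard-library timeit module; the exact figures will of course differ from machine to machine:

import timeit

setup = "import numpy as np; A = np.arange(1, 100000 + 1)"

loop_stmt = """
B = A.copy()
for i in range(len(B)):
    B[i] = (B[i]*2 + B[i]*4) ** (1/3)
"""

vec_stmt = "B = (A * 6) ** (1/3)"

# Best of 3 runs for each variant, reported as seconds per loop
print("loop:      ", min(timeit.repeat(loop_stmt, setup, number=1, repeat=3)))
print("vectorized:", min(timeit.repeat(vec_stmt, setup, number=100, repeat=3)) / 100)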


2 Comments

Will using A = np.power(A*6, 1/3) give any improvement?
@anurag: You're welcome to try, but it shouldn't make a difference. The numpy array type almost certainly overloads __pow__ (the special method invoked by **) to do the same thing, so unless you need to tweak behaviors with the optional arguments to numpy.power, there's no real benefit. If anything it would be marginally slower, since it goes through generic function-call dispatch rather than dedicated syntax-based dispatch, but the difference is meaningless for moderately sized arrays; local tests showed them roughly equivalent.
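For anyone curious, a quick check like the following should print True, confirming the two spellings compute the same values (again using np.arange in place of the elided literal):

import numpy as np

A = np.arange(1, 100000 + 1)

via_operator = (A * 6) ** (1 / 3)      # syntax-based dispatch through __pow__
via_function = np.power(A * 6, 1 / 3)  # explicit ufunc call

print(np.array_equal(via_operator, via_function))  # expected: True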

Let numpy do the hard work.

A = (A*2+A*4)**(1/3)

