
I tried to model the voxels of a 3D cylinder with the following code:

import math
import numpy as np

R0 = 500
hz = 1

x = np.arange(-1000, 1000, 1)
y = np.arange(-1000, 1000, 1)
z = np.arange(-10, 10, 1)

xx, yy, zz = np.meshgrid(x, y, z)


def density_f(x, y, z):
    r_xy = math.sqrt(x ** 2 + y ** 2)
    if r_xy <= R0 and -hz <= z <= hz:
        return 1
    else:
        return 0


density = np.vectorize(density_f)(xx, yy, zz)

and it took many minutes to compute.

Equivalent suboptimal Java code runs in 10-15 seconds.

How can I make Python compute the voxels at a comparable speed? Where should I optimize?

  • Your bio lists MATLAB. Originally that was a wrapper for Fortran matrix code; to get good speed you had to think in terms of whole arrays. New MATLAB has a JIT compiler that lets you 'cheat' and write iterative code. numpy is more like the older MATLAB. To get around that you have to use added packages like cython and numba that create custom compiled code. I assume you know enough computer science to know the difference between C, Java, and Python. Commented Oct 27, 2018 at 0:30
  • The question of how do a fast vectorization comes up often. Here's a simpler recent example: stackoverflow.com/questions/53016316/… Commented Oct 27, 2018 at 4:23
  • Slightly unrelated to your question, but compare the square of the xy distance with a precomputed square of the radius - you can save a sqrt in each iteration, which should make a big difference. Commented Oct 29, 2018 at 12:00
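The suggestion in the last comment, applied to the question's `density_f` (a sketch; `R0_sq` is the squared radius, precomputed once outside the function):

```python
R0 = 500
hz = 1
R0_sq = R0 * R0  # precompute the squared radius once

def density_f(x, y, z):
    # compare the squared radial distance against the squared radius:
    # this avoids one math.sqrt call per voxel
    r_xy_sq = x * x + y * y
    return 1 if r_xy_sq <= R0_sq and -hz <= z <= hz else 0
```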

3 Answers


Please do not use .vectorize(..): it is not efficient, since it still does the processing at the Python level. .vectorize() should only be used as a last resort, for example if the function cannot be computed in "bulk" because its "structure" is too complex.

But you do not need .vectorize here; you can implement your function to work over whole arrays with:

r_xy = np.sqrt(xx ** 2 + yy ** 2)
density = (r_xy <= R0) & (-hz <= zz) & (zz <= hz)

or even a bit faster:

r_xy = xx * xx + yy * yy
density = (r_xy <= R0 * R0) & (-hz <= zz) & (zz <= hz)

This will construct a 2000×2000×20 array of booleans. We can use:

intdens = density.astype(int)

to construct an array of ints.

Printing the array here is quite cumbersome, but it contains a total of 2,356,047 ones:

>>> density.astype(int).sum()
2356047

Benchmarks: If I run this locally 10 times, I get:

>>> timeit(f, number=10)
18.040479518999973
>>> timeit(f2, number=10)  # f2 is the optimized variant
13.287886952000008

So on average, we calculate this matrix (including casting it to ints) in 1.3-1.8 seconds per run.
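The answer does not show how `f` and `f2` were defined for the benchmark; a plausible reconstruction looks like this (scaled down by a factor of 10 in x and y so the sketch runs quickly):

```python
import numpy as np
from timeit import timeit

# scaled-down grid; f and f2 are my reconstruction of the timed functions
R0, hz = 50, 1
x = np.arange(-100, 100, 1)
y = np.arange(-100, 100, 1)
z = np.arange(-10, 10, 1)
xx, yy, zz = np.meshgrid(x, y, z)

def f():
    # plain variant: takes a square root per voxel
    r_xy = np.sqrt(xx ** 2 + yy ** 2)
    return ((r_xy <= R0) & (-hz <= zz) & (zz <= hz)).astype(int)

def f2():
    # optimized variant: compares squared distances, no sqrt
    r_xy = xx * xx + yy * yy
    return ((r_xy <= R0 * R0) & (-hz <= zz) & (zz <= hz)).astype(int)

assert (f() == f2()).all()  # both variants agree voxel for voxel
print(timeit(f, number=10), timeit(f2, number=10))
```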


8 Comments

What about more complex cases with ifs? Can I apply functions elementwise efficiently?
P.S. Why are you using single ampersands?
@Dims: because and in Python evaluates truthiness, and an array has no truthiness. & is the "elementwise" and of two matrices. It can require some "creativity" to rewrite a function in this form. This is something one "masters" over time with experience.
Thanks, but please explain: what should I do with more complex functions, with ifs, exceptions etc., which I can't write as a single vectorized expression?
@Dims: there are no ifs, etc.; you need to encode this in a logical way. The same goes for exceptions. It thus requires some "mathematical engineering".
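To make "encoding the if in a logical way" concrete (my own example, not from the thread): each branch condition becomes a boolean mask and each branch body becomes an array, for instance via np.select:

```python
import numpy as np

def clamp_sq_scalar(v):
    # scalar version, written with ifs
    if v < 0:
        return 0.0
    elif v < 1:
        return v * v
    else:
        return 1.0

def clamp_sq_array(v):
    # vectorized version: conditions -> boolean masks, branches -> arrays
    return np.select([v < 0, v < 1], [np.zeros_like(v), v * v], default=1.0)

v = np.array([-1.0, 0.5, 2.0])
print(clamp_sq_array(v))  # matches the scalar version elementwise: 0.0, 0.25, 1.0
```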

You can also use a compiled version of the function to calculate the density, via cython or numba. I use numba to jit-compile the density-calculation function in this answer, as it is as easy as adding a decorator.

Pros :

  • You can write if conditions, as you mention in your comments
  • Slightly faster than the numpy version in the answer by @Willem Van Onsem, since there we additionally have to iterate through the boolean array to calculate the sum in density.astype(int).sum().

Cons:

  • You have to write an ugly three-level loop, losing the beauty of the one-liner numpy solution.

Code:

import numba as nb

@nb.jit(nopython=True, cache=True)
def calc_density(xx, yy, zz, R0, hz):
    threshold = R0 * R0
    dimensions = xx.shape

    density = 0
    for i in range(dimensions[0]):
        for j in range(dimensions[1]):
            for k in range(dimensions[2]):
                r_xy = xx[i, j, k] ** 2 + yy[i, j, k] ** 2
                if r_xy <= threshold and -hz <= zz[i, j, k] <= hz:
                    density += 1
    return density

Running times:

Willem Van Onsem's solution, f2 variant: 1.28s without the sum, 2.01s with the sum.

Numba solution (calc_density, on the second run, to discount the compile time): 0.48s.

As suggested in the comments, we also do not need to calculate the meshgrid; we can pass x, y, z directly to the function. Thus:

@nb.jit(nopython=True, cache=True)
def calc_density2(x, y, z, R0, hz):
    threshold = R0 * R0
    dimensions = len(x), len(y), len(z)

    density = 0
    for i in range(dimensions[0]):
        for j in range(dimensions[1]):
            for k in range(dimensions[2]):
                r_xy = x[i] ** 2 + y[j] ** 2
                if r_xy <= threshold and -hz <= z[k] <= hz:
                    density += 1
    return density

Now, for a fair comparison, we also include the time of np.meshgrid in @Willem Van Onsem's answer. Running times:

Willem Van Onsem's solution, f2 variant (np.meshgrid time included): 2.24s

Numba solution (calc_density2, on the second run, to discount the compile time): 0.079s.
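Stripped of the decorator, the same loop body runs as plain (slow) Python, which is handy for sanity-checking the count against the numpy one-liner on a small grid even when numba is not installed (a sketch; numba compiles exactly this body):

```python
import numpy as np

def calc_density2_py(x, y, z, R0, hz):
    # same body as calc_density2, just without @nb.jit
    threshold = R0 * R0
    density = 0
    for i in range(len(x)):
        for j in range(len(y)):
            r_xy = x[i] ** 2 + y[j] ** 2
            for k in range(len(z)):
                if r_xy <= threshold and -hz <= z[k] <= hz:
                    density += 1
    return density

# cross-check on a small grid against the vectorized count
x = y = np.arange(-20, 20, 1)
z = np.arange(-5, 5, 1)
xx, yy, zz = np.meshgrid(x, y, z)
ref = int(((xx * xx + yy * yy <= 10 * 10) & (-1 <= zz) & (zz <= 1)).sum())
assert calc_density2_py(x, y, z, 10, 1) == ref
```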

5 Comments

1) Don't use global variables! Within Numba every "global" variable is a compile-time constant. If you change the global variable and run your function again, Numba won't recognize these changes, which leads to hard-to-debug errors. 2) A small change (passing x, y, z directly) leads to a factor-10 better performance. 3) Parallelization would give another factor of the number of cores.
@max9111 can you please explain or give an example for (2) and (3)?
@max9111, thanks. 1) Yes, indeed; slipped my mind. Edited to incorporate 2). Can't seem to get 3) to help. I guess the bookkeeping required for parallelization dwarfs the gains for this specific size.
@Deepak Saini I have posted quite a duplicate answer, apart from the parallelization. I think it would be better to add the parallelized approach to your answer, and I will delete mine afterwards... BTW: cache=True does not work if parallel is set to True.
@max9111, sure. Will do.

This is meant as a lengthy comment on the answer of Deepak Saini.

The main change is to avoid the coordinate arrays generated by np.meshgrid, which contain unnecessary repetitions; using them isn't recommendable if you can avoid it (both in terms of memory usage and performance).

Code

import numba as nb
import numpy as np

@nb.jit(nopython=True, parallel=True)
def calc_density_2(x, y, z, R0, hz):
    threshold = R0 * R0

    density = 0
    for i in nb.prange(y.shape[0]):
        for j in range(x.shape[0]):
            # r_xy depends only on i and j, so hoist it out of the z loop
            r_xy = x[j] ** 2 + y[i] ** 2
            for k in range(z.shape[0]):
                if r_xy <= threshold and -hz <= z[k] <= hz:
                    density += 1

    return density

Timings

R0 = 500
hz = 1

x = np.arange(-1000, 1000, 1)
y = np.arange(-1000, 1000, 1)
z = np.arange(-10, 10, 1)

xx, yy, zz = np.meshgrid(x, y, z)

#after the first call (compilation overhead)
#calc_density_2          9.7 ms
#calc_density_2 parallel 3.9 ms
#@Deepak Saini           115 ms
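The memory argument can be made concrete with a small illustration of my own (scaled down by a factor of 10 in x and y so it stays light; on the full grid from the question the three meshgrid arrays occupy about 1.9 GB with 64-bit integers, versus roughly 32 KB for the 1-D coordinates):

```python
import numpy as np

x = np.arange(-100, 100, 1)
y = np.arange(-100, 100, 1)
z = np.arange(-10, 10, 1)

one_d = x.nbytes + y.nbytes + z.nbytes   # three small 1-D arrays

xx, yy, zz = np.meshgrid(x, y, z)        # three dense 200 x 200 x 20 arrays
grids = xx.nbytes + yy.nbytes + zz.nbytes

# the dense grids are thousands of times larger than the 1-D coordinates
print(grids // one_d)  # prints 5714 (the ratio is independent of the int size)
```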

