
I tried to model the voxels of a 3D cylinder with the following code:

import math
import numpy as np

R0 = 500
hz = 1

x = np.arange(-1000, 1000, 1)
y = np.arange(-1000, 1000, 1)
z = np.arange(-10, 10, 1)

xx, yy, zz = np.meshgrid(x, y, z)


def density_f(x, y, z):
    r_xy = math.sqrt(x ** 2 + y ** 2)
    if r_xy <= R0 and -hz <= z <= hz:
        return 1
    else:
        return 0


density = np.vectorize(density_f)(xx, yy, zz)

and it took many minutes to compute.

Equivalent suboptimal Java code runs in 10-15 seconds.

How can I make Python compute the voxels at a comparable speed? Where should I optimize?

  • Your bio lists MATLAB. Originally that was a wrapper for Fortran matrix code; to get good speed you had to think in terms of whole arrays. New MATLAB has a JIT compiler that lets you 'cheat' and write iterative code. numpy is more like the older MATLAB. To get around that you have to use added packages like cython and numba that create custom compiled code. I assume you know enough computer science to know the difference between C, Java, and Python. Commented Oct 27, 2018 at 0:30
  • The question of how do a fast vectorization comes up often. Here's a simpler recent example: stackoverflow.com/questions/53016316/… Commented Oct 27, 2018 at 4:23
  • Slightly unrelated to your question, but compare the square of the xy distance with a precomputed square of the radius - you can save a sqrt in each iteration, which should make a big difference. Commented Oct 29, 2018 at 12:00
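The suggestion in the last comment, applied to the question's `density_f` (a sketch; `R0_sq` is the squared radius, precomputed once outside the function):

```python
R0 = 500
hz = 1
R0_sq = R0 * R0  # precompute the squared radius once

def density_f(x, y, z):
    # compare the squared radial distance against the squared radius:
    # this avoids one math.sqrt call per voxel
    r_xy_sq = x * x + y * y
    return 1 if r_xy_sq <= R0_sq and -hz <= z <= hz else 0
```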

3 Answers


Please do not use .vectorize(..): it is not efficient, since it still does the processing at the Python level. .vectorize() should only be used as a last resort, for example if the function cannot be computed in "bulk" because its "structure" is too complex.

But you do not need .vectorize here; you can implement your function to work over whole arrays with:

r_xy = np.sqrt(xx ** 2 + yy ** 2)
density = (r_xy <= R0) & (-hz <= zz) & (zz <= hz)

or even a bit faster:

r_xy = xx * xx + yy * yy
density = (r_xy <= R0 * R0) & (-hz <= zz) & (zz <= hz)

This will construct a 2000×2000×20 array of booleans. We can use:

intdens = density.astype(int)

to construct an array of ints.

Printing the array here is quite cumbersome, but it contains a total of 2,356,047 ones:

>>> density.astype(int).sum()
2356047

Benchmarks: If I run this locally 10 times, I get:

>>> timeit(f, number=10)
18.040479518999973
>>> timeit(f2, number=10)  # f2 is the optimized variant
13.287886952000008

So on average, we calculate this matrix (including casting it to ints) in 1.3-1.8 seconds per run.
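The answer does not show how `f` and `f2` were defined for the benchmark; a plausible reconstruction looks like this (scaled down by a factor of 10 in x and y so the sketch runs quickly):

```python
import numpy as np
from timeit import timeit

# scaled-down grid; f and f2 are my reconstruction of the timed functions
R0, hz = 50, 1
x = np.arange(-100, 100, 1)
y = np.arange(-100, 100, 1)
z = np.arange(-10, 10, 1)
xx, yy, zz = np.meshgrid(x, y, z)

def f():
    # plain variant: takes a square root per voxel
    r_xy = np.sqrt(xx ** 2 + yy ** 2)
    return ((r_xy <= R0) & (-hz <= zz) & (zz <= hz)).astype(int)

def f2():
    # optimized variant: compares squared distances, no sqrt
    r_xy = xx * xx + yy * yy
    return ((r_xy <= R0 * R0) & (-hz <= zz) & (zz <= hz)).astype(int)

assert (f() == f2()).all()  # both variants agree voxel for voxel
print(timeit(f, number=10), timeit(f2, number=10))
```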


8 Comments

What about more complex cases with ifs? Can I apply functions elementwise efficiently?
P.S. Why are you using single ampersands?
@Dims: because and in Python evaluates truthiness, and an array has no truthiness. & is the "elementwise" and of two matrices. It can require some "creativity" to rewrite a function in this form. This is something one "masters" over time with experience.
Thanks, but please explain: what should I do with more complex functions, with ifs, exceptions etc., which I can't write as a single vectorized expression?
@Dims: there are no ifs, etc.; you need to encode this in a logical way. The same goes for exceptions. It thus requires some "mathematical engineering".
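To make "encoding the if in a logical way" concrete (my own example, not from the thread): each branch condition becomes a boolean mask and each branch body becomes an array, for instance via np.select:

```python
import numpy as np

def clamp_sq_scalar(v):
    # scalar version, written with ifs
    if v < 0:
        return 0.0
    elif v < 1:
        return v * v
    else:
        return 1.0

def clamp_sq_array(v):
    # vectorized version: conditions -> boolean masks, branches -> arrays
    return np.select([v < 0, v < 1], [np.zeros_like(v), v * v], default=1.0)

v = np.array([-1.0, 0.5, 2.0])
print(clamp_sq_array(v))  # matches the scalar version elementwise: 0.0, 0.25, 1.0
```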

You can also use a compiled version of the function to calculate the density, via cython or numba. I use numba to jit-compile the density-calculation function in this answer, as it is as easy as adding a decorator.

Pros :

  • You can write if conditions, as you mention in your comments
  • Slightly faster than the numpy version in the answer by @Willem Van Onsem, since there we additionally have to iterate through the boolean array to calculate the sum in density.astype(int).sum().

Cons:

  • You have to write an ugly three-level loop, losing the beauty of the one-liner numpy solution.

Code:

import numba as nb

@nb.jit(nopython=True, cache=True)
def calc_density(xx, yy, zz, R0, hz):
    threshold = R0 * R0
    dimensions = xx.shape

    density = 0
    for i in range(dimensions[0]):
        for j in range(dimensions[1]):
            for k in range(dimensions[2]):
                r_xy = xx[i, j, k] ** 2 + yy[i, j, k] ** 2
                if r_xy <= threshold and -hz <= zz[i, j, k] <= hz:
                    density += 1
    return density

Running times:

Willem Van Onsem's solution, f2 variant: 1.28s without the sum, 2.01s with the sum.

Numba solution (calc_density, on the second run, to discount the compile time): 0.48s.

As suggested in the comments, we also do not need to calculate the meshgrid; we can pass x, y, z directly to the function. Thus:

@nb.jit(nopython=True, cache=True)
def calc_density2(x, y, z, R0, hz):
    threshold = R0 * R0
    dimensions = len(x), len(y), len(z)

    density = 0
    for i in range(dimensions[0]):
        for j in range(dimensions[1]):
            for k in range(dimensions[2]):
                r_xy = x[i] ** 2 + y[j] ** 2
                if r_xy <= threshold and -hz <= z[k] <= hz:
                    density += 1
    return density

Now, for a fair comparison, we also include the time of np.meshgrid in @Willem Van Onsem's answer. Running times:

Willem Van Onsem's solution, f2 variant (np.meshgrid time included): 2.24s

Numba solution (calc_density2, on the second run, to discount the compile time): 0.079s.
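Stripped of the decorator, the same loop body runs as plain (slow) Python, which is handy for sanity-checking the count against the numpy one-liner on a small grid even when numba is not installed (a sketch; numba compiles exactly this body):

```python
import numpy as np

def calc_density2_py(x, y, z, R0, hz):
    # same body as calc_density2, just without @nb.jit
    threshold = R0 * R0
    density = 0
    for i in range(len(x)):
        for j in range(len(y)):
            r_xy = x[i] ** 2 + y[j] ** 2
            for k in range(len(z)):
                if r_xy <= threshold and -hz <= z[k] <= hz:
                    density += 1
    return density

# cross-check on a small grid against the vectorized count
x = y = np.arange(-20, 20, 1)
z = np.arange(-5, 5, 1)
xx, yy, zz = np.meshgrid(x, y, z)
ref = int(((xx * xx + yy * yy <= 10 * 10) & (-1 <= zz) & (zz <= 1)).sum())
assert calc_density2_py(x, y, z, 10, 1) == ref
```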

5 Comments

1) Don't use global variables! Within Numba every "global" variable is a compile-time constant. If you change the global variable and run your function again, Numba won't recognize these changes, which leads to hard-to-debug errors. 2) A small change (passing x, y, z directly) leads to a factor-10 better performance. 3) Parallelization would give another factor of the number of cores.
@max9111 can you please explain or give an example for (2) and (3)?
@max9111, thanks. 1) Yes, indeed; slipped my mind. Edited to incorporate 2). Can't seem to get 3) to help. I guess the bookkeeping required for parallelization dwarfs the gains for this specific size.
@Deepak Saini I have posted quite a duplicate answer, apart from the parallelization. I think it would be better to add the parallelized approach to your answer, and I will delete mine afterwards... BTW: cache=True does not work if parallel is set to True.
@max9111, sure. Will do.

This is meant as a lengthy comment on the answer of Deepak Saini.

The main change is to avoid the coordinate arrays generated by np.meshgrid, which contain unnecessary repetitions; using them isn't recommendable if you can avoid it (both in terms of memory usage and performance).

Code

import numba as nb
import numpy as np

@nb.jit(nopython=True, parallel=True)
def calc_density_2(x, y, z, R0, hz):
    threshold = R0 * R0

    density = 0
    for i in nb.prange(y.shape[0]):
        for j in range(x.shape[0]):
            # r_xy depends only on i and j, so hoist it out of the z loop
            r_xy = x[j] ** 2 + y[i] ** 2
            for k in range(z.shape[0]):
                if r_xy <= threshold and -hz <= z[k] <= hz:
                    density += 1

    return density

Timings

R0 = 500
hz = 1

x = np.arange(-1000, 1000, 1)
y = np.arange(-1000, 1000, 1)
z = np.arange(-10, 10, 1)

xx, yy, zz = np.meshgrid(x, y, z)

#after the first call (compilation overhead)
#calc_density_2          9.7 ms
#calc_density_2 parallel 3.9 ms
#@Deepak Saini           115 ms
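The memory argument can be made concrete with a small illustration of my own (scaled down by a factor of 10 in x and y so it stays light; on the full grid from the question the three meshgrid arrays occupy about 1.9 GB with 64-bit integers, versus roughly 32 KB for the 1-D coordinates):

```python
import numpy as np

x = np.arange(-100, 100, 1)
y = np.arange(-100, 100, 1)
z = np.arange(-10, 10, 1)

one_d = x.nbytes + y.nbytes + z.nbytes   # three small 1-D arrays

xx, yy, zz = np.meshgrid(x, y, z)        # three dense 200 x 200 x 20 arrays
grids = xx.nbytes + yy.nbytes + zz.nbytes

# the dense grids are thousands of times larger than the 1-D coordinates
print(grids // one_d)  # prints 5714 (the ratio is independent of the int size)
```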

