6

Consider the following code using numpy arrays which is very slow :

# Intersection of an octree and a trajectory
def intersection(octree, trajectory):
    # Initialize numpy arrays
    ox = octree.get("x")
    oy = octree.get("y")
    oz = octree.get("z")
    oe = octree.get("extent")/2
    tx = trajectory.get("x")
    ty = trajectory.get("y")
    tz = trajectory.get("z")
    result = np.zeros(np.size(ox))
    # Loop over elements
    for i in range(0, np.size(tx)):
        for j in range(0, np.size(ox)):
            if (tx[i] > ox[j]-oe[j] and 
                tx[i] < ox[j]+oe[j] and 
                ty[i] > oy[j]-oe[j] and 
                ty[i] < oy[j]+oe[j] and 
                tz[i] > oz[j]-oe[j] and 
                tz[i] < oz[j]+oe[j]):
                result[j] += 1
    # Finalize
    return result

How to rewrite the function to speed up the calculation ? (np.size(tx) == 10000 and np.size(ox) == 100000)

4
  • Do you also consider using OpenCL? Commented Jun 5, 2014 at 23:00
  • I do not need full performance, I just want a raw speed up. Commented Jun 5, 2014 at 23:03
  • 1
    Build a scipy.spatial.KDTree from the points tx, ty, tz and then use nearest-neighbour look-up in the infinity norm for each point in ox, oy, oz to see whether there is any point close enough. Commented Jun 5, 2014 at 23:07
  • Have you considered using Cython? I have read that it gives large speedups without much pain. stackoverflow.com/questions/7799977/numpy-vs-cython-speed Commented Jun 5, 2014 at 23:14

3 Answers 3

6

You are allocating 10000 lists of size 100000. The first thing to do would be to stop using range for the nested j loop and use the generator version xrange instead. This will save you time and space allocating all those lists.

The next one would be to use vectorized operations:

for i in xrange(0, np.size(tx)):
    index = (ox-oe < tx[i]) & (ox+oe > tx[i]) & (oy-oe < ty[i]) & (oy+oe > ty[i]) & (oz-oe < tz[i]) & (oz+oe > tz[i])
    result[index] += 1  
Sign up to request clarification or add additional context in comments.

2 Comments

Give me a sec, there is the next one coming :)
Ok, the vectorized form is in (may need checking of conditions), but as you modified your question in response to my original answer that part of the answer became irrelevant even though it would still be a good initial step to do in case of non-bumpy data :)
0

You're likely to get good results by running this code under PyPy: http://pypy.org/ (instructions for our NumPy integration at https://bitbucket.org/pypy/numpy)

Comments

0

I think this will give the same result for the double loop and be faster:

for j in xrange(np.size(ox)):
    result[j] += sum( abs(tx-ox[j])<oe[j] & abs(ty-oy[j])<oe[j] & abs(tz-oz[j])<oe[j] )

To get this: 1) reorder the loops (ie, swap them) which is valid since nothing changes within the loops; 2) pull result[j] outside the i loop; 3) convert all the t>ox-oe and t<ox+oe to abs(t-ox)<oe (though this may not be a huge speedup, it's easier to read).

Since you don't have runnable code, and I didn't want to build a test for this, I'm not 100% sure this is correct.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.