Faster mathematical calculations using numpy and scipy

Question

I am calculating distances between multiple points. The array gals_pos is very large (almost 100,000 points) and sph_pos has 20 points.

The issue is that it is a slow code. I want to make it fast since I will apply it to more than a billion points (array gals_pos).

I call the following part of the code to give me distances. First I call function named distance_calc and get the distance on x axis, then on y axis and on z axis. Then I use the dx, dy and dz to calculate the magnitude of the distance. Please suggest ways in which I can make it faster.

import numpy as np
import time

gals_pos = np.random.uniform(low = 0.0, high = 1000.0, size = (10000,3))
sph_pos = np.random.uniform(low = 0.0, high = 1000.0, size = (100,3))

max_axis_lim = 1000.0
min_axis_lim = 0.0
shift_position_constant = max_axis_lim/2

time_init = time.clock()

def distance_calc(gals_pos,sph_pos, axis):
    dxyzd = gals_pos[None, :, axis] - sph_pos[:, None, axis]
    #dxyzd_cdist = spatial.cdist(sph_pos, gals_pos, 'euclidean') #unusable here since we want to do axis subtraction for dx, dy and dz
    dxyzd[dxyzd>max_axis_lim] -= shift_position_constant
    dxyzd[dxyzd<min_axis_lim] += shift_position_constant
    return dxyzd

def dist_mag(dx,dy,dz):
    dist_m = np.sqrt(dx**2+dy**2+dz**2)
    return dist_m

dxx = distance_calc(gals_pos,sph_pos,0)
dyy = distance_calc(gals_pos,sph_pos,1)
dzz = distance_calc(gals_pos,sph_pos,2)

dist_d = dist_mag(dxx,dyy,dzz)

time_final = time.clock()
time = time_final-time_init
print "time taken = ", time

time taken = 0.11

In my opinion this kind of question is not well suited for SO. If you ask such a question, please add at least a code example by creating random data and show us the timings. — cel
– cel, Commented Dec 26, 2014 at 13:07
Since you are just using straightforward numpy vector math, without iterations and such, there isn't an obvious way to speed this up. You might be able to combine the 3 axis calcs into one, but I don't expect much of speed improvement. Have you tried profiling? — hpaulj
– hpaulj, Commented Dec 26, 2014 at 17:41
Acting on all axis at once, gals_pos[None, :, :] - sph_pos[:, None, :] does not help. — hpaulj
– hpaulj, Commented Dec 26, 2014 at 17:58
@hpaulj Yes, I have done profiling. The longest time taken in these steps is at the axis distance calculation part. This takes 0.005 seconds. If I can speed this up, it will be a great deal of help. — Abhinav Kumar
– Abhinav Kumar, Commented Dec 27, 2014 at 12:55

Eelco Hoogendoorn · Accepted Answer · 2014-12-27 10:46:03Z

1

As noted in the comments, there isn't much you can do to speed this up; although you may be able to gain up to a factor ten using numexpr using a few threads.

However, the bigger question is: do you really need all the pairwise distances? Unless you are writing your own gravity simulator, and are happy to go about it the brute-force manner, the answer is probably no. For calculating short-range interactions, such as for collision detection for instance, look at the functionality in scipy.spatial. It will be many orders of magnitude faster for typical problems.

answered Dec 27, 2014 at 10:46

Eelco Hoogendoorn

10.8k1 gold badge46 silver badges43 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Abhinav Kumar Over a year ago

Thanks for your reply. I definitely need each pairwise distance.

Eelco Hoogendoorn Over a year ago

Unless you have invented a new class of problems, you definitely don't.

FuzzyDuck · Accepted Answer · 2014-12-26 14:52:23Z

0

Look at scikit-learn. For example,

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html

answered Dec 26, 2014 at 14:52

FuzzyDuck

1,53112 silver badges14 bronze badges

Collectives™ on Stack Overflow

Faster mathematical calculations using numpy and scipy

2 Answers 2

2 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Related