1

I am calculating distances between multiple points. The array gals_pos is very large (almost 100,000 points) and sph_pos has 20 points.

The issue is that it is a slow code. I want to make it fast since I will apply it to more than a billion points (array gals_pos).

I call the following part of the code to give me distances. First I call function named distance_calc and get the distance on x axis, then on y axis and on z axis. Then I use the dx, dy and dz to calculate the magnitude of the distance. Please suggest ways in which I can make it faster.

import numpy as np
import time

gals_pos = np.random.uniform(low = 0.0, high = 1000.0, size = (10000,3))
sph_pos = np.random.uniform(low = 0.0, high = 1000.0, size = (100,3))

max_axis_lim = 1000.0
min_axis_lim = 0.0
shift_position_constant = max_axis_lim/2

time_init = time.clock()

def distance_calc(gals_pos,sph_pos, axis):
    dxyzd = gals_pos[None, :, axis] - sph_pos[:, None, axis]
    #dxyzd_cdist = spatial.cdist(sph_pos, gals_pos, 'euclidean') #unusable here since we want to do axis subtraction for dx, dy and dz
    dxyzd[dxyzd>max_axis_lim] -= shift_position_constant
    dxyzd[dxyzd<min_axis_lim] += shift_position_constant
    return dxyzd

def dist_mag(dx,dy,dz):
    dist_m = np.sqrt(dx**2+dy**2+dz**2)
    return dist_m

dxx = distance_calc(gals_pos,sph_pos,0)
dyy = distance_calc(gals_pos,sph_pos,1)
dzz = distance_calc(gals_pos,sph_pos,2)

dist_d = dist_mag(dxx,dyy,dzz)

time_final = time.clock()
time = time_final-time_init
print "time taken = ", time

time taken = 0.11
5
  • 1
    In my opinion this kind of question is not well suited for SO. If you ask such a question, please add at least a code example by creating random data and show us the timings. Commented Dec 26, 2014 at 13:07
  • 1
    I have edited as you suggested. Thanks Commented Dec 26, 2014 at 13:16
  • Since you are just using straightforward numpy vector math, without iterations and such, there isn't an obvious way to speed this up. You might be able to combine the 3 axis calcs into one, but I don't expect much of speed improvement. Have you tried profiling? Commented Dec 26, 2014 at 17:41
  • Acting on all axis at once, gals_pos[None, :, :] - sph_pos[:, None, :] does not help. Commented Dec 26, 2014 at 17:58
  • @hpaulj Yes, I have done profiling. The longest time taken in these steps is at the axis distance calculation part. This takes 0.005 seconds. If I can speed this up, it will be a great deal of help. Commented Dec 27, 2014 at 12:55

2 Answers 2

1

As noted in the comments, there isn't much you can do to speed this up; although you may be able to gain up to a factor ten using numexpr using a few threads.

However, the bigger question is: do you really need all the pairwise distances? Unless you are writing your own gravity simulator, and are happy to go about it the brute-force manner, the answer is probably no. For calculating short-range interactions, such as for collision detection for instance, look at the functionality in scipy.spatial. It will be many orders of magnitude faster for typical problems.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for your reply. I definitely need each pairwise distance.
Unless you have invented a new class of problems, you definitely don't.
0

Look at scikit-learn. For example,

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.