I am calculating distances between multiple points. The array gals_pos is very large (almost 100,000 points) and sph_pos has 20 points.
The problem is that the code is slow. I want to make it faster, since I will apply it to more than a billion points (array gals_pos).
I call the following part of the code to get the distances. First I call a function named distance_calc to get the separation along the x axis, then along the y axis and the z axis. Then I use dx, dy and dz to calculate the magnitude of the distance. Please suggest ways in which I can make it faster.
import numpy as np
import time
gals_pos = np.random.uniform(low = 0.0, high = 1000.0, size = (10000,3))
sph_pos = np.random.uniform(low = 0.0, high = 1000.0, size = (100,3))
max_axis_lim = 1000.0
min_axis_lim = 0.0
shift_position_constant = max_axis_lim/2
time_init = time.perf_counter()  # time.clock() was removed in Python 3.8

def distance_calc(gals_pos, sph_pos, axis):
    # Pairwise differences along one axis, shape (len(sph_pos), len(gals_pos))
    dxyzd = gals_pos[None, :, axis] - sph_pos[:, None, axis]
    # dxyzd_cdist = spatial.cdist(sph_pos, gals_pos, 'euclidean')  # unusable here since we want to do axis subtraction for dx, dy and dz
    dxyzd[dxyzd > max_axis_lim] -= shift_position_constant
    dxyzd[dxyzd < min_axis_lim] += shift_position_constant
    return dxyzd

def dist_mag(dx, dy, dz):
    # Magnitude of the distance from its three components
    dist_m = np.sqrt(dx**2 + dy**2 + dz**2)
    return dist_m

dxx = distance_calc(gals_pos, sph_pos, 0)
dyy = distance_calc(gals_pos, sph_pos, 1)
dzz = distance_calc(gals_pos, sph_pos, 2)
dist_d = dist_mag(dxx, dyy, dzz)
time_final = time.perf_counter()
elapsed = time_final - time_init  # renamed so it does not shadow the time module
print("time taken = ", elapsed)
time taken = 0.11
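One point about the billion-point case: with 20 points in sph_pos and 10^9 points in gals_pos, each full float64 difference array would hold 2 x 10^10 values, roughly 160 GB. The computation may therefore need to be done in slices of gals_pos, regardless of speed. The following is only a sketch of that idea, assuming each chunk can be reduced or written out before the next one is computed. The helper `iter_distance_chunks`, the `chunk_size` value and the "nearest distance" reduction are hypothetical and not part of the original code.

```python
import numpy as np

max_axis_lim = 1000.0
min_axis_lim = 0.0
shift_position_constant = max_axis_lim / 2

def distance_calc(gals_pos, sph_pos, axis):
    # Same per-axis logic as in the question
    dxyzd = gals_pos[None, :, axis] - sph_pos[:, None, axis]
    dxyzd[dxyzd > max_axis_lim] -= shift_position_constant
    dxyzd[dxyzd < min_axis_lim] += shift_position_constant
    return dxyzd

def dist_mag(dx, dy, dz):
    return np.sqrt(dx**2 + dy**2 + dz**2)

def iter_distance_chunks(gals_pos, sph_pos, chunk_size=100_000):
    # Hypothetical helper: yields (start_index, distances) for one slice of
    # gals_pos at a time, so only (len(sph_pos), chunk_size) arrays exist at once.
    for start in range(0, len(gals_pos), chunk_size):
        chunk = gals_pos[start:start + chunk_size]
        dx, dy, dz = (distance_calc(chunk, sph_pos, axis) for axis in range(3))
        yield start, dist_mag(dx, dy, dz)

# Small random data standing in for the real arrays
rng = np.random.default_rng(0)
gals_pos = rng.uniform(0.0, 1000.0, size=(10000, 3))
sph_pos = rng.uniform(0.0, 1000.0, size=(100, 3))

# Illustrative reduction (an assumption): smallest distance seen per sph_pos point
nearest = np.full(len(sph_pos), np.inf)
for start, dist_chunk in iter_distance_chunks(gals_pos, sph_pos, chunk_size=2500):
    nearest = np.minimum(nearest, dist_chunk.min(axis=1))
```

Chunking does not reduce the total arithmetic. It only bounds peak memory, and the best chunk size is something to measure on the target machine.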
Since this is already numpy vector math, without iterations and such, there isn't an obvious way to speed this up. You might be able to combine the 3 axis calcs into one, but I don't expect much of a speed improvement. Have you tried profiling?
Combining the axes as gals_pos[None, :, :] - sph_pos[:, None, :] does not help.
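For anyone who wants to check this themselves, here is a minimal sketch that puts the per-axis version and the combined-axis version side by side, confirms they give the same distances, and times both with `timeit`. The function names `per_axis` and `combined` are made up for this comparison, and the timings depend on the machine and array sizes, so no numbers are quoted.

```python
import timeit
import numpy as np

max_axis_lim = 1000.0
min_axis_lim = 0.0
shift_position_constant = max_axis_lim / 2

def per_axis(gals_pos, sph_pos):
    # One broadcast subtraction per axis, as in the question
    comps = []
    for axis in range(3):
        d = gals_pos[None, :, axis] - sph_pos[:, None, axis]
        d[d > max_axis_lim] -= shift_position_constant
        d[d < min_axis_lim] += shift_position_constant
        comps.append(d)
    dx, dy, dz = comps
    return np.sqrt(dx**2 + dy**2 + dz**2)

def combined(gals_pos, sph_pos):
    # All three axes in a single broadcast, shape (n_sph, n_gals, 3)
    d = gals_pos[None, :, :] - sph_pos[:, None, :]
    d[d > max_axis_lim] -= shift_position_constant
    d[d < min_axis_lim] += shift_position_constant
    return np.sqrt((d**2).sum(axis=-1))

rng = np.random.default_rng(0)
gals_pos = rng.uniform(0.0, 1000.0, size=(10000, 3))
sph_pos = rng.uniform(0.0, 1000.0, size=(100, 3))

same = np.allclose(per_axis(gals_pos, sph_pos), combined(gals_pos, sph_pos))
print("results agree:", same)

for fn in (per_axis, combined):
    # Best of 3 repeats, 5 calls each, reported as seconds per call
    best = min(timeit.repeat(lambda: fn(gals_pos, sph_pos), number=5, repeat=3)) / 5
    print(fn.__name__, "seconds per call:", best)
```

For a breakdown by function rather than a single wall-clock number, the standard library's `cProfile` module can be run over the same calls.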