I'm going to simulate the diffraction pattern of a normally incident Gaussian-profile beam from a 2D array of point scatterers with a distribution of heights.
The scatterer position arrays X, Y and Z each have size N x N, and they are summed over in each call to E_ab(a, b, positions, w_beam, k0). This is done M x M times to build up the diffraction pattern.
If I estimate ten floating-point operations per scatterer site per pixel and a nanosecond per flop (what my laptop does for small NumPy arrays), I'd expect a run time of 10 M^2 N^2 1E-09 seconds. For small N this runs a factor of 50 or 100 slower than that, and for large N (larger than, say, 2000) it slows down even further. I am guessing this has something to do with how the large arrays are paged in memory.
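One structural saving that is independent of any memory effects: in E_ab, the Gaussian envelope np.exp(-Rsq/w_beam**2) / w_beam**2 depends only on w_beam, not on the pixel (a, b), so it can be hoisted out of the M x M pixel loop and computed once per w_beam. A minimal sketch of the idea (the helper names `make_envelope` and `E_ab_hoisted` are mine, not from the script below):

```python
import numpy as np

def make_envelope(X, Y, w_beam):
    # The Gaussian weight and the 1/w_beam**2 normalization depend only
    # on w_beam, so compute them once instead of M*M times.
    return np.exp(-(X**2 + Y**2) / w_beam**2) / w_beam**2

def E_ab_hoisted(a, b, X, Y, Z, envelope, k0):
    # Only the phase factor depends on the pixel (a, b).
    phases = k0 * (a*X + b*Y + (1 + np.sqrt(1 - a**2 - b**2))*Z)
    return (envelope * np.exp(-1j * phases)).sum()
```

This removes one real exponential and the R-squared computation per scatterer from every pixel evaluation, at no change in the result.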
What can I do to increase the speed for large N?
note: Right now the height variation Z is random; in the future I plan to include an additional systematic height-variation term as well, so even though purely Gaussian variation might have an analytical solution, I need to do this numerically.
Since I'm distributing the Z heights randomly here, the plots will look a little different each time. My output (run on a laptop) is as follows, and I cannot even begin to understand why it takes longer (~16 seconds) when w_beam is small than when it is large (~6 seconds). (When I run this on an older Python 2 that came with IDLE, all four are between 10 and 10.5 seconds.)
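One hedged guess at the small-w_beam slowdown (an assumption on my part, not something I've profiled): when w_beam is small, many of the arguments to np.exp(-Rsq/w_beam**2) are hugely negative, so the results underflow to subnormal floats or to zero. Subnormal arithmetic and the libm underflow path are much slower than the normal-range path on many CPUs, and different libm builds (e.g. the Python 2 vs. Python 3 installs) can handle it differently. A quick check that such underflow actually occurs on this grid:

```python
import numpy as np

# With w_beam = 1 and a grid out to |x| = N = 100, Rsq/w_beam**2 reaches
# 2*N**2 = 20000, far past the double-precision exponent range.
x = np.arange(-100, 101)
X, Y = np.meshgrid(x, x)
env = np.exp(-(X**2 + Y**2) / 1.0**2)

tiny = np.finfo(float).tiny           # smallest *normal* double, ~2.2e-308
subnormal = (env > 0) & (env < tiny)  # underflowed but not flushed to zero
print(subnormal.any(), (env == 0).any())  # both occur on this grid
```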
My estimator 10 M^2 N^2 1E-09 suggests 0.25 seconds; these runs are roughly 50 times slower, so there may be room for substantial improvement.
1 16.460583925247192
2 14.861294031143188
4 8.405776023864746
8 6.4988932609558105
Total time: about 46 seconds on 2012 MacBook and recent Anaconda Python 3 installation.
Python script:
import numpy as np
import matplotlib.pyplot as plt
import time

def E_ab(a, b, positions, w_beam, k0):
    X, Y, Z = positions
    Rsq = X**2 + Y**2
    phases = k0 * (a*X + b*Y + (1 + np.sqrt(1 - a**2 - b**2))*Z)
    E = np.exp(-Rsq/w_beam**2) * np.exp(-1j*phases)
    return E.sum() / w_beam**2  # rough normalization

twopi = 2*np.pi
wavelength = 0.08
k0 = twopi/wavelength
z_noise = 0.05 * wavelength

N, M = 100, 50
x = np.arange(-N, N+1)
X, Y = np.meshgrid(x, x)
Z = z_noise * np.random.normal(size=X.shape)  # use random Z noise for now
positions = [X, Y, Z]

A = np.linspace(0, 0.2, M)

answers = []
for w_beam in (1, 2, 4, 8):
    E = []
    tstart = time.time()
    for a in A:
        EE = []
        for b in A:
            e = E_ab(a, b, positions, w_beam, k0)
            EE.append(e)
        E.append(EE)
    print(w_beam, time.time() - tstart)
    answers.append(np.array(E))

if True:
    plt.figure()
    for i, E in enumerate(answers):
        plt.subplot(2, 2, i+1)
        plt.imshow(np.log10(np.abs(E)), vmin=0.0001)
        plt.colorbar()
    plt.show()
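As for the direct speed question, the first thing I'd try is removing most of the per-pixel Python and ufunc-dispatch overhead by broadcasting one of the pixel axes. A sketch of that idea (my rewrite, equivalent to the double loop above up to floating-point roundoff; for very large N you would chunk the b axis so the (M, N, N) temporary fits in RAM):

```python
import numpy as np

def pattern(A, X, Y, Z, w_beam, k0):
    """All M x M pixels with one Python loop: broadcast the b axis."""
    # Envelope is (a, b)-independent: compute it once per w_beam.
    envelope = np.exp(-(X**2 + Y**2) / w_beam**2) / w_beam**2
    b = A[:, None, None]                      # shape (M, 1, 1)
    out = np.empty((len(A), len(A)), dtype=complex)
    for i, a in enumerate(A):
        c = 1 + np.sqrt(1 - a**2 - b**2)      # per-pixel constant, (M, 1, 1)
        phases = k0 * (a*X + b*Y + c*Z)       # (M, N, N) via broadcasting
        out[i] = (envelope * np.exp(-1j * phases)).sum(axis=(1, 2))
    return out
```

Beyond that, single precision (np.complex64) would halve the memory traffic if ~1e-7 relative accuracy is acceptable, and a tool like numexpr or numba could fuse the exp and multiply to avoid the large temporaries entirely.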
