1

I'm trying to write an algorithm for my student work, it is working well. However, it takes a long time to calculate, especially with big arrays. This part of code is slowing down all program.

Shapes: X.shape = mask.shape = logBN.shape = (500,500,1000), 
        F.shape = (20,20), 
        A.shape = (481,481), 
        s2 -- scalar.

How should I change this code to make it faster?

h = F.shape[0]
w = F.shape[1]
q = np.zeros((A.shape[0], A.shape[1], X.shape[2]))
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        mask[:,:,:] = 0
        mask[i:i + h,j:j + w,:] = 1
        q[i,j,:] = ((logBN*(1 - mask)).sum(axis=(0,1)) + 
                    (np.log(norm._pdf((X[i:i + h,j:j + w,:]-F[:,:,np.newaxis])/s2)/s2)).sum(axis=(0,1))
4
  • This isn't complete - put all your variables (F,A,X) so people can work with something. If iterating over arrays your usually better off converting to python lists, since it's very slow - what's fast is using vector operations. Commented Nov 1, 2016 at 17:00
  • @kabanus I can't they are generated during the work of program. Commented Nov 1, 2016 at 17:04
  • I suggest printing them once, and pasting here the result at the beginning. Commented Nov 1, 2016 at 17:05
  • @Divakar I just have added information: logBN -- matrix and s2 -- scalar. Commented Nov 1, 2016 at 17:05

2 Answers 2

1

After heavy juggling through algebraic operations of log, exp, power, it all came to this -

# Params
m,n = F.shape[:2]
k1 = 1.0/(s2*np.sqrt(2*np.pi))
k2 = -0.5/s2**2
k3 = np.log(k1)*m*n

out = np.zeros((A.shape[0], A.shape[1], X.shape[2]))
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        mask[:] = 1
        mask[i:i + h,j:j + w,:] = 0
        XF = (X[i:i + h,j:j + w,:]-F[:,:,np.newaxis])        
        p1 = np.einsum('ijk,ijk->k',logBN,mask)
        p2 = k2*np.einsum('ijk,ijk->k',XF,XF)
        out[i,j,:] = p1 + p2
out += k3

Few things used were -

1] norm._pdf is basically : norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi). So, we could inline the implementation and optimize those at the script level.

2] The division by scalars won't be efficient, so those were replaced by multiplications by their reciprocals. So, as a pre-processing store their reciprocals before going into the loop.

Sign up to request clarification or add additional context in comments.

2 Comments

That's grate! Thank you! I have thoughts to change an algorithm, but your solution with einsum works 8 times faster. Though I can't understand why it works faster ordinary numpy functions.
@Acapello Trust me I had to grill through algebraic functions, hence the comment at the top of the post. Was a grueling session! :) This is faster because we are doing much lesser operations.
0

Just trying to make sense of your inner loop

    mask[:,:,:] = 0
    mask[i:i + h,j:j + w,:] = 1
    q[i,j,:] = ((logBN*(1 - mask)).sum(axis=(0,1)) + 
                (np.log(norm._pdf((X[i:i + h,j:j + w,:]-F[:,:,np.newaxis])/s2)/s2)).sum(axis=(0,1))

looks like

idx = (slice(i,i+h), slice(j,j_w), slice(None))
mask = np.zeros(X.shape)
mask(idx) = 1
mask = 1 - mask 
# alt mask=np.ones(X.shape);mask[idx]=0
term1 = (logBN*mask).sum(axis=(0,1))
term2 = np.log(norm._pdf((X[idx] - F[...,None])/s2)/s2).sum(axis=(0,1))
q[i,j,:] = term1 + term2

So idx and mask define a subarray in A. You are using logBN outside the array; and term inside it. You are summing values on 1st 2 dim, so both term1 and term2 has shape X.shape[2], which you save in q.

That mask/window is 20x20.

As a first cut I'd try to calculate that term2 for all i,j at once. That looks like a typical sliding window problem. I'd also try to express the term1 as a subtraction - the whole logBN minus this window.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.