Speed up matrix calculations(looping over subarrays) [numpy]

Question

I'm trying to write an algorithm for my student work, it is working well. However, it takes a long time to calculate, especially with big arrays. This part of code is slowing down all program.

Shapes: X.shape = mask.shape = logBN.shape = (500,500,1000), 
        F.shape = (20,20), 
        A.shape = (481,481), 
        s2 -- scalar.

How should I change this code to make it faster?

h = F.shape[0]
w = F.shape[1]
q = np.zeros((A.shape[0], A.shape[1], X.shape[2]))
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        mask[:,:,:] = 0
        mask[i:i + h,j:j + w,:] = 1
        q[i,j,:] = ((logBN*(1 - mask)).sum(axis=(0,1)) + 
                    (np.log(norm._pdf((X[i:i + h,j:j + w,:]-F[:,:,np.newaxis])/s2)/s2)).sum(axis=(0,1))

This isn't complete - put all your variables (F,A,X) so people can work with something. If iterating over arrays your usually better off converting to python lists, since it's very slow - what's fast is using vector operations. — kabanus
– kabanus, Commented Nov 1, 2016 at 17:00
@kabanus I can't they are generated during the work of program. — Acapello
– Acapello, Commented Nov 1, 2016 at 17:04
I suggest printing them once, and pasting here the result at the beginning. — kabanus
– kabanus, Commented Nov 1, 2016 at 17:05
@Divakar I just have added information: logBN -- matrix and s2 -- scalar. — Acapello
– Acapello, Commented Nov 1, 2016 at 17:05

Divakar · Accepted Answer · 2016-11-01 18:25:42Z

After heavy juggling through algebraic operations of log, exp, power, it all came to this -

# Params
m,n = F.shape[:2]
k1 = 1.0/(s2*np.sqrt(2*np.pi))
k2 = -0.5/s2**2
k3 = np.log(k1)*m*n

out = np.zeros((A.shape[0], A.shape[1], X.shape[2]))
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        mask[:] = 1
        mask[i:i + h,j:j + w,:] = 0
        XF = (X[i:i + h,j:j + w,:]-F[:,:,np.newaxis])        
        p1 = np.einsum('ijk,ijk->k',logBN,mask)
        p2 = k2*np.einsum('ijk,ijk->k',XF,XF)
        out[i,j,:] = p1 + p2
out += k3

Few things used were -

1] norm._pdf is basically : norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi). So, we could inline the implementation and optimize those at the script level.

2] The division by scalars won't be efficient, so those were replaced by multiplications by their reciprocals. So, as a pre-processing store their reciprocals before going into the loop.

That's grate! Thank you! I have thoughts to change an algorithm, but your solution with einsum works 8 times faster. Though I can't understand why it works faster ordinary numpy functions.
@Acapello Trust me I had to grill through algebraic functions, hence the comment at the top of the post. Was a grueling session! :) This is faster because we are doing much lesser operations.

hpaulj · Accepted Answer · 2016-11-01 17:49:27Z

Just trying to make sense of your inner loop

    mask[:,:,:] = 0
    mask[i:i + h,j:j + w,:] = 1
    q[i,j,:] = ((logBN*(1 - mask)).sum(axis=(0,1)) + 
                (np.log(norm._pdf((X[i:i + h,j:j + w,:]-F[:,:,np.newaxis])/s2)/s2)).sum(axis=(0,1))

looks like

idx = (slice(i,i+h), slice(j,j_w), slice(None))
mask = np.zeros(X.shape)
mask(idx) = 1
mask = 1 - mask 
# alt mask=np.ones(X.shape);mask[idx]=0
term1 = (logBN*mask).sum(axis=(0,1))
term2 = np.log(norm._pdf((X[idx] - F[...,None])/s2)/s2).sum(axis=(0,1))
q[i,j,:] = term1 + term2

So idx and mask define a subarray in A. You are using logBN outside the array; and term inside it. You are summing values on 1st 2 dim, so both term1 and term2 has shape X.shape[2], which you save in q.

That mask/window is 20x20.

As a first cut I'd try to calculate that term2 for all i,j at once. That looks like a typical sliding window problem. I'd also try to express the term1 as a subtraction - the whole logBN minus this window.

Collectives™ on Stack Overflow

Speed up matrix calculations(looping over subarrays) [numpy]

2 Answers 2

2 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Related