For Loop Optimization in Python

Question

I have run the following code for matrices of size 100 million rows and 100 columns.

I am looking to optimize the following code. Any help would be really appreciated.

def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):

    Q = Q.T

    for step in range(steps):
        e = 0
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    temp_dot = numpy.dot(P[i,:],Q[:,j])
                    eij = R[i][j] - temp_dot
                    e = e + eij*eij
                    for k in range(K):
                        P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
                        Q[k][j] = Q[k][j] + alpha * (2 * eij * P[i][k] - beta * Q[k][j])
                        e = e + (beta/2) * ((P[i][k]*P[i][k]) + (Q[k][j]*Q[k][j]))
        if e < 0.001:
            break

    return P, Q.T

Does it have to run the entire thing all at once or can you chunk this or thread it? I am not familiar enough with your operations to know if you can't split this so that parts of it are analyzed in parallel. — DejaVuSansMono
– DejaVuSansMono, Commented Oct 20, 2016 at 16:27
i am using corei5 with 8gb ram...i tested a sample matrix of 5000 rows & 5 columns..it took close to 1 hour with value of K=5 — Dinesh Singh
– Dinesh Singh, Commented Oct 20, 2016 at 16:29
What sort of matrix factorization is this function intended to compute? — user2357112
– user2357112, Commented Oct 20, 2016 at 16:29

Eric · Accepted Answer · 2016-10-20 22:35:08Z

3

Your inner loop:

for k in range(K):
    P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
    Q[k][j] = Q[k][j] + alpha * (2 * eij * P[i][k] - beta * Q[k][j])
    e = e + (beta/2) * ((P[i][k]*P[i][k]) + (Q[k][j]*Q[k][j]))

Can be vectorized as:

P[i,:] += alpha    * np.sum(2 * eij * Q[:,j] - beta * P[i,:], axis=-1)
Q[:,j] += alpha    * np.sum(2 * eij * P[i,:] - beta * Q[:,j], axis=-1)
e      += (beta/2) * np.sum(P[i,:]*P[i,:] + Q[:,j]*Q[:,j])

edited Oct 20, 2016 at 22:35

answered Oct 20, 2016 at 16:49

Eric

98.1k54 gold badges257 silver badges389 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Divakar Over a year ago

Nice one! One more improvement could be at the last step to use (a+b)**2 property there. So, we can use einsum and dot there : a_b = a + b; e = np.einsum('i,i',a_b,a_b) - 2*a.dot(b), where a and b are P[i,:] and Q[:,j]. Love those funcs! :)

Eric Over a year ago

@Divakar: Seems counter-intuitive to me that that would be faster

Collectives™ on Stack Overflow

For Loop Optimization in Python

1 Answer 1

2 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Linked

Related