I'm writing a neural net from scratch and need to implement the following operation: for each row i of a matrix dY, take the outer product of the same row of another matrix S (same shape as dY) with itself, and multiply that row of dY by the resulting matrix outer(S[i,:], S[i,:]). Then multiply dY * S element-wise and subtract the first result from it.
The code below does this, but it's not vectorized. Can you help me speed this up?
out = dY.copy()
for i in range(dY.shape[0]):
    out[i, :] = dY[i, :] * S[i, :] - dY[i, :].dot(np.outer(S[i, :], S[i, :]))
Update: The following takes an (n,m) matrix S and returns a matrix of shape (n,m,m) where for each row, we take the outer product with itself.
np.einsum("ab,ad->abd", S, S)
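As a quick sanity check, here is a minimal sketch that compares the einsum result against one np.outer call per row. The array size and the seed are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, illustration only
S = rng.random((4, 3))          # hypothetical (n, m) = (4, 3) input

# Batched outer product: S_outer[a, b, d] = S[a, b] * S[a, d]
S_outer = np.einsum("ab,ad->abd", S, S)

# Reference: one np.outer per row, stacked along the first axis
S_outer_loop = np.stack([np.outer(row, row) for row in S])

print(S_outer.shape)                       # (4, 3, 3)
print(np.allclose(S_outer, S_outer_loop))  # True
```

The same array can also be built with plain broadcasting, S[:, :, None] * S[:, None, :], if you prefer to avoid einsum.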
Update 2: Finally solved it using two applications of np.einsum.
S_outer = np.einsum("ab,ad->abd", S, S)
return dY * S - np.einsum("ab,abc->ac", dY, S_outer)
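One possible further simplification, offered as a sketch rather than a replacement for the above: since dY[i, :].dot(np.outer(S[i, :], S[i, :])) equals (dY[i, :].dot(S[i, :])) * S[i, :], the (n, m, m) intermediate can be skipped entirely. The function names below are hypothetical, and the inputs are random test data, not anything from the original network.

```python
import numpy as np

def backward_loop(dY, S):
    """Reference: the original row-by-row loop."""
    out = np.empty_like(dY)
    for i in range(dY.shape[0]):
        out[i, :] = dY[i, :] * S[i, :] - dY[i, :].dot(np.outer(S[i, :], S[i, :]))
    return out

def backward_einsum(dY, S):
    """The two-einsum version; builds an (n, m, m) intermediate."""
    S_outer = np.einsum("ab,ad->abd", S, S)
    return dY * S - np.einsum("ab,abc->ac", dY, S_outer)

def backward_rowdot(dY, S):
    """Same result using only (n, m)-sized arrays."""
    # Row-wise dot product dY[i] . S[i], reshaped to (n, 1) for broadcasting
    row_dot = np.einsum("ab,ab->a", dY, S)[:, None]
    # dY * S - row_dot * S, factored
    return S * (dY - row_dot)

rng = np.random.default_rng(0)     # arbitrary seed, illustration only
dY = rng.standard_normal((5, 4))   # made-up upstream gradient
S = rng.random((5, 4))             # made-up matrix of the same shape

expected = backward_loop(dY, S)
print(np.allclose(backward_einsum(dY, S), expected))  # True
print(np.allclose(backward_rowdot(dY, S), expected))  # True
```

The row-dot form needs O(n*m) memory instead of O(n*m*m), which may matter when m is large.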
In the loop version, use out = np.empty_like(dY) instead of dY.copy(). Every row is overwritten, so you don't need to prefill, much less copy the data.