
I have inherited some code and there is one particular operation that takes an inordinate amount of time.

The operation is defined as:

cutoff = 0.2
# X has shape (76187, 247, 20)
X_flat = X.reshape((X.shape[0], X.shape[1] * X.shape[2]))
weightfun = lambda x: 1.0 / np.sum(np.dot(X_flat, x) / np.dot(x, x) > 1 - cutoff)
# This is expensive...
N_list = np.array(list(map(weightfun, X_flat)))

This takes hours to compute on my machine. I am wondering if there is a way to optimize this. The code is computing normalized Hamming distances between vector sequences.

2 Comments
  • The slowness is because it runs the weightfun function 76187 times in a Python-level map loop. Speeding it up requires studying the lambda and changing it to work on multiple rows at once. Give us a small array test case. Commented Nov 25, 2021 at 15:28
  • A test case can easily be generated with np.random.rand (see the sketch below). Commented Nov 25, 2021 at 19:55
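For reference, a reduced test case along the lines of the comments might look like the sketch below. The sizes here are made up, chosen only so the original code finishes quickly; the real X has shape (76187, 247, 20).

import numpy as np

# Arbitrary, much smaller stand-in for the real data (same layout)
X = np.random.rand(1000, 247, 20)

cutoff = 0.2
X_flat = X.reshape((X.shape[0], X.shape[1] * X.shape[2]))
weightfun = lambda x: 1.0 / np.sum(np.dot(X_flat, x) / np.dot(x, x) > 1 - cutoff)
N_list = np.array(list(map(weightfun, X_flat)))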

1 Answer


weightfun performs two dot products for every row of X_flat. The expensive one is np.dot(X_flat, x), where the dot product runs against the whole X_flat matrix. But there's a trick to speed things up: instead of repeating that product for every row, it can be computed once for all rows with:

X_matmut = X_flat @ X_flat.T

Also, I noticed that the second dot product is nothing more than the diagonal of the result of the first one.

The rewritten code looks like this:

cutoff = 0.2
# X has shape (76187, 247, 20)
X_flat = X.reshape((X.shape[0], X.shape[1] * X.shape[2]))

# All pairwise dot products at once: X1[i, j] == np.dot(X_flat[i], X_flat[j])
X1 = X_flat @ X_flat.T
# np.dot(x, x) for every row is just the diagonal of X1
X2 = X1.diagonal()

N_list = 1.0 / (X1/X2 > 1 - cutoff).sum(axis=0)
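A quick sanity check along these lines, run on a small random sample, should reproduce the original map-based result (the sample size is arbitrary, chosen only so the slow version finishes in seconds):

import numpy as np

rng = np.random.default_rng(0)
X_small = rng.random((500, 247 * 20))
cutoff = 0.2

# Original, row-by-row computation from the question
weightfun = lambda x: 1.0 / np.sum(np.dot(X_small, x) / np.dot(x, x) > 1 - cutoff)
expected = np.array(list(map(weightfun, X_small)))

# Vectorized computation
X1 = X_small @ X_small.T
result = 1.0 / (X1 / X1.diagonal() > 1 - cutoff).sum(axis=0)

assert np.allclose(expected, result)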

Edit

For such a large input, memory becomes the new bottleneck when performing the operation above, as the resulting matrix won't fit into RAM. So there's also the option of breaking the computation into chunks, as the code below shows. The code gets a little messy, but at least it didn't try to destroy my PC :-P

import numpy as np
import time

# Sample data
X = np.random.random([76187, 247, 20])

start = time.time()

cutoff = 0.2
X_flat = X.reshape((X.shape[0], X.shape[1] * X.shape[2]))

# Divide data into 20 chunks
X_parts = np.array_split(X_flat, 20)

# Diagonal will be saved incrementally
diagonal = []
for i in range(len(X_parts)):
    part = X_parts[i]
    X_parts[i] = part @ X_flat.T
    # This chunk's diagonal entries sit at column offset len(diagonal)
    rows = np.arange(len(X_parts[i]))
    diagonal.extend(X_parts[i][rows, len(diagonal) + rows])

# Performs the second part of the calculation
diagonal = np.array(diagonal)
X_list = np.zeros(len(diagonal))
for x in X_parts:
    X_list += (x/diagonal > 1 - cutoff).sum(axis=0)
X_list = 1.0 / X_list

print('Time to solve: %.2f secs' % (time.time() - start))

I would love to be able to perform all the computation in a single loop and discard each chunk after use, but the whole matrix has to be traversed once just to retrieve the diagonal. I don't believe it's worth computing everything twice to save memory.

Even on a decent setup (16 GB of RAM, an Intel i7, and an SSD for storage), the whole processing took me around 15 minutes.
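For a rough sense of why memory is the limiting factor here (see also the comments below), the full X_flat @ X_flat.T product alone is far larger than 16 GB; a back-of-the-envelope estimate:

n = 76187
full_matrix_gib = n * n * 8 / 1024**3   # float64 entries, 8 bytes each
print('X_flat @ X_flat.T needs roughly %.0f GiB' % full_matrix_gib)  # ~43 GiB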


4 Comments

  • I just noticed that the solution above would require a huge amount of memory to perform the matrix multiplications with your input size. When I tested with random data, the computer started paging, which really makes the performance go down. The solution would work well for smaller samples.
  • Thanks for the interesting attempt. Yes, it is clear that it is difficult to do much with this without using GPUs or the like.
  • Thank you for accepting my previous answer, but I kind of got into the problem, so I made an edit haha. It's hard to verify whether the code I posted is completely right, but I got matches with yours for small samples, so I suppose it is.
  • Ahhhh, so thanks for the extra effort. I will test this today and let you know!
