Say I have a massive 2D NumPy array (my database) shaped (1,200,000, 6).
I want to find the row index of a single 6-element vector in big_DB. I actually have 64 of these vectors to search for at a time, shaped (64, 6).
Here's my code:
import numpy as np

for data in range(64):  # I have 64 1D arrays to find
    self.idx = np.where((big_DB == arrays[data]).all(axis=1))  # note: overwritten on each pass
This takes 0.043 sec (for all 64 arrays). Is there a faster method to do this? My project will call the search function over 40,000 times.
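For reference, the loop can be collapsed into a single broadcasted comparison; this is just a sketch assuming big_DB is the (1,200,000, 6) array and arrays holds the (64, 6) queries, and it still scans the whole table while allocating a large temporary, so it is not guaranteed to be faster than the loop:

import numpy as np

# Compare all 64 queries against every row of big_DB in one pass.
# The intermediate boolean array is (64, 1_200_000, 6), roughly 460 MB,
# so this trades memory for the removal of the Python-level loop.
matches = (big_DB[None, :, :] == arrays[:, None, :]).all(axis=2)  # shape (64, 1.2M)
rows, cols = np.nonzero(matches)  # cols[k] is the big_DB row index for query rows[k]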
Edit: big_DB is the output of itertools.product; its rows are unique and the values are floats.
Is big_DB sorted? Can it be? Does it get updated frequently? Are you open to using a mapping type instead? Hashing will be much faster than a linear or even a binary search (with 40,000+ calls, that overhead matters).
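If big_DB is static, the mapping idea might look roughly like the sketch below. The names row_index and find_indices are illustrative, and it assumes the query rows contain exactly the same float values as the rows of big_DB (as they would when both come from itertools.product), so tuple hashing and equality match exactly:

import numpy as np

# Build the index once: map each row (as a tuple) to its position in big_DB.
# This costs a few seconds and extra memory up front, but each subsequent
# lookup is an O(1) hash probe instead of a scan over 1.2M rows.
row_index = {tuple(row): i for i, row in enumerate(big_DB)}

def find_indices(arrays):
    # arrays has shape (64, 6); returns the big_DB row index for each query row
    return [row_index[tuple(row)] for row in arrays]

Amortised over 40,000+ calls, the one-time cost of building the dictionary should be negligible compared with repeatedly scanning the full array.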