Searching 1d array in 2d array in python

Question

Say I have a massive 2D array, big_DB, with shape (1.2 million, 6).

I want to find the row index of a 1D array (shape (1, 6)) in big_DB. I actually have 64 of these vectors to search for at a time, with shape (64, 6).

Here's my code:

for data in range(64):  # I have 64 1D arrays to find
    # Rows of big_DB where all 6 columns equal arrays[data]
    self.idx = np.where((big_DB == arrays[data]).all(axis=1))

This takes 0.043 sec (for all 64 arrays). Is there a faster method to do this? My project will call the search function over 40,000 times.

Edit: big_DB is the result of itertools.product, so its rows are unique. The dtype is float.

Here's a related post I made: stackoverflow.com/q/64215263/2988730. Searching is easier than sorting. Will post once you've clarified a few things.
– Mad Physicist, Commented Mar 25, 2021 at 21:24
Is big_DB sorted? Can it be? Does it get updated frequently? Are you open to using a mapping type instead? Hashing will be much faster than linear or even binary search (the 400k iterations matter).
– Mad Physicist, Commented Mar 25, 2021 at 21:26

Mad Physicist · Accepted Answer · 2021-03-25 22:03:38Z


The fastest approach I've found is O(1) lookup with Python's built-in dict type. You need to pre-process your DB, which may take a second or two at most, but lookups go from >100 ms on my machine to <50 µs: an improvement of 2000x or better for all 64 lookups. You may get slightly worse results because I tested with a 100k-element database, and your larger DB may cause more hash collisions.

To build the lookup hash table, I turned each row of big_DB into a bytes object, which serves as the key. The values are the row indices, since that's what you want the lookup to return:

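# Void dtype as wide as one full row, so each row becomes a single opaque item
dt = f'V{big_DB.shape[1] * big_DB.dtype.itemsize}'
# Map the raw bytes of each row to that row's index
dict_db = dict(zip(map(np.void.item, np.squeeze(big_DB.view(dt))), range(len(big_DB))))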

The resulting lookup is as simple as

idx = dict_db[x.view(dt).item()]
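For anyone who wants to try the idea end to end, here is a minimal sketch rather than the exact code above. It assumes x stands for one 6-element query row, and it swaps the void-view trick for ndarray.tobytes() to build the keys, which is a different (and likely slower to build) route to the same kind of bytes key. The small big_DB and the names levels, true_idx, found and baseline are made up for illustration.

```python
import itertools
import numpy as np

# Hypothetical stand-in for big_DB: itertools.product rows, float64, all unique.
levels = np.linspace(0.0, 1.0, 4)
big_DB = np.array(list(itertools.product(levels, repeat=6)), dtype=np.float64)

# One-time preprocessing: raw bytes of each row -> row index.
# (Uses tobytes() instead of the void view; this is an assumption of the sketch.)
dict_db = {row.tobytes(): i for i, row in enumerate(big_DB)}

# 64 query rows, taken from big_DB so that every lookup succeeds.
rng = np.random.default_rng(0)
true_idx = rng.integers(0, len(big_DB), size=64)
arrays = big_DB[true_idx]

# Hash lookups: one dict access per query row.
found = np.array([dict_db[x.tobytes()] for x in arrays])

# Linear-scan baseline from the question, kept only to cross-check the result.
baseline = np.array([np.where((big_DB == x).all(axis=1))[0][0] for x in arrays])
```

Note that the keys compare raw bytes, so the query must have the same dtype as big_DB, and values that are equal under == but differ in their bit pattern (0.0 and -0.0, for example) will not match.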

edited Mar 25, 2021 at 22:03

answered Mar 25, 2021 at 21:51

Mad Physicist


3 Comments

onlyhappiness Over a year ago

what is x in 'idx = dict_db[x.view(dt).item()]' ?

onlyhappiness Over a year ago

This raises "descriptor 'item' for 'numpy.generic' objects doesn't apply to a 'numpy.ndarray' object"

Mad Physicist Over a year ago

In that case, you need to provide a complete MCVE. I made assumptions about how the arrays look, and apparently I was wrong.
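
Since the thread never settled what shape the query arrays have, one possible way to sidestep the question is to normalize the query before hashing it. The sketch below is an assumption-laden illustration, not the answer's code: build_index and lookup_rows are hypothetical helpers, keys are built with tobytes() rather than the void view, and -1 is an arbitrary marker for "row not found".

```python
import numpy as np

def build_index(db):
    """Map the raw bytes of each row to its row index (hypothetical helper)."""
    db = np.ascontiguousarray(db)
    return {row.tobytes(): i for i, row in enumerate(db)}

def lookup_rows(index, queries, dtype, ncols):
    """Look up one query of shape (ncols,) or (1, ncols), or a batch (n, ncols).

    Returns an array of row indices, with -1 for rows that are not in the index.
    """
    # Force the dtype and a 2D (n, ncols) layout so the bytes match the keys.
    q = np.ascontiguousarray(queries, dtype=dtype).reshape(-1, ncols)
    return np.array([index.get(row.tobytes(), -1) for row in q])

# Small made-up database: 10 unique float64 rows of 6 columns.
big_DB = np.arange(60, dtype=np.float64).reshape(10, 6)
index = build_index(big_DB)
ncols = big_DB.shape[1]

one_1d = lookup_rows(index, big_DB[3], big_DB.dtype, ncols)         # query shape (6,)
one_2d = lookup_rows(index, big_DB[3:4], big_DB.dtype, ncols)       # query shape (1, 6)
batch = lookup_rows(index, big_DB[[7, 0, 9]], big_DB.dtype, ncols)  # query shape (3, 6)
missing = lookup_rows(index, np.full(6, -1.0), big_DB.dtype, ncols) # not in big_DB
```

Casting to the database dtype matters here because the same numbers stored as float32 or int64 produce different bytes and would never match a float64 key.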
