1

I'm looking for an efficient way to return indices for a 2d array based on values in a 1d array. I currently have a nested for loop set up that is painfully slow.

Here is some example data and what I want to get:

import numpy as np

data2d = np.array([[1, 2], [1, 3], [3, 4], [1, 2], [7, 9]])

data1d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

For each element of data2d, I would like to return the index of the matching value in data1d. My desired output would be this 2d array:

locs = np.array([[0, 1], [0, 2], [2, 3], [0, 1], [6, 8]])

The only thing I've come up with is the nested for loop:

locs = np.full(data2d.shape, np.nan)

# For each element of data2d, find the index of the matching value in data1d
for i in range(data2d.shape[0]):
    for j in range(data2d.shape[1]):
        loc_val = np.where(data1d == data2d[i, j])[0]
        locs[i, j] = loc_val[0]

This would be fine for a small data set, but I have 87,600 2d grids, each with 428x614 grid points.

3
  • Is data1d sorted? Commented Jan 25, 2019 at 20:16
  • Also, are all points in data2 guaranteed to exist in data1? Commented Jan 25, 2019 at 20:17
  • Yes it is sorted for the data I'm working with. And yes all points are guaranteed to exist. Commented Jan 25, 2019 at 20:19

2 Answers

1

Use np.searchsorted:

np.searchsorted(data1d, data2d.ravel()).reshape(data2d.shape)

array([[0, 1],
       [0, 2],
       [2, 3],
       [0, 1],
       [6, 8]])

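searchsorted performs a binary search in data1d for each element of the ravelled data2d, and the result is then reshaped to data2d's shape. This relies on data1d being sorted and on every value in data2d being present in data1d, both of which hold here.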
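As a minimal sketch of the same idea: np.searchsorted also accepts a multi-dimensional array of values and returns indices of the same shape, so the ravel/reshape step is optional. The membership check at the end is an added safeguard, not part of the answer above, and the names `safe` and `found` are only illustrative.

```python
import numpy as np

data1d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # must be sorted ascending
data2d = np.array([[1, 2], [1, 3], [3, 4], [1, 2], [7, 9]])

# searchsorted returns an index array with the same shape as data2d
locs = np.searchsorted(data1d, data2d)

# Safeguard (assumption: you want to fail loudly on missing values).
# Clip first so the lookup cannot go out of bounds, then compare.
safe = np.clip(locs, 0, len(data1d) - 1)
found = data1d[safe] == data2d
assert found.all(), "some values in data2d are missing from data1d"
```

Without such a check, a value that is absent from data1d silently yields its insertion point rather than an error.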


Another option is to build an index and query it in constant time. You can do this with pandas' Index API.

import pandas as pd

idx = pd.Index([1, 2, 3, 4, 5, 6, 7, 8, 9])
idx
#  Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')
#  (pandas 2.0 and later print this as Index([...], dtype='int64'))

idx.get_indexer(data2d.ravel()).reshape(data2d.shape)

array([[0, 1],
       [0, 2],
       [2, 3],
       [0, 1],
       [6, 8]])
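One difference from searchsorted worth knowing: get_indexer marks values that are not in the index with -1. The sketch below shows this using a deliberately modified copy of the example data (the value 10 is an assumption added for illustration and is not in the original question).

```python
import numpy as np
import pandas as pd

data1d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Illustrative input: 10 does not occur in data1d
data2d = np.array([[1, 2], [1, 3], [3, 4], [1, 2], [7, 10]])

idx = pd.Index(data1d)  # get_indexer requires the index values to be unique
locs = idx.get_indexer(data2d.ravel()).reshape(data2d.shape)

# Positions where no match was found
missing = locs == -1
```

Here `missing` is True only at the position of the 10, which makes absent values easy to detect after the fact.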

1 Comment

Wow, those both seem like great solutions, thanks! Now I just need to see how it performs on the larger data.
0

A dictionary lookup should also be faster than calling np.where for every element:

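import numpy as np
data2d = np.array([[1, 2], [1, 3], [3, 4], [1, 2], [7, 9]])
data1d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Map each value in data1d to its index
idxdict = dict(zip(data1d, range(len(data1d))))
# Write into a new array so that data2d is not overwritten in place
locs = np.empty_like(data2d)
for i in range(data2d.shape[0]):
    for j in range(data2d.shape[1]):
        locs[i, j] = idxdict[data2d[i, j]]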
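The same value-to-index mapping can be held in a NumPy array instead of a dict, which removes the Python-level loops. This is a different technique from the answer above (a lookup table, named `lut` here for illustration), and it is only a sketch that assumes the values are non-negative integers small enough for a table of size data1d.max() + 1, and that no value in data2d exceeds data1d.max().

```python
import numpy as np

data1d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
data2d = np.array([[1, 2], [1, 3], [3, 4], [1, 2], [7, 9]])

# lut[v] holds the index of value v in data1d, or -1 if v is absent
lut = np.full(data1d.max() + 1, -1, dtype=np.intp)
lut[data1d] = np.arange(len(data1d))

# Fancy indexing applies the mapping to every element at once
locs = lut[data2d]
```

Unlike searchsorted, this does not need data1d to be sorted, at the cost of memory proportional to the largest value.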
