
I have a very large array, consisting of integers between 0 and N, where each value occurs at least once.

I'd like to know, for each value k, all the indices in my array where the array's value equals k.

For example:

arr = np.array([0,1,2,3,2,1,0])
desired_output = {
    0: np.array([0,6]),
    1: np.array([1,5]),
    2: np.array([2,4]),
    3: np.array([3]),
    }

Right now I am accomplishing this with a loop over range(N+1), calling np.where N+1 times.

indices = {}
for value in range(max(arr)+1):
    indices[value] = np.where(arr == value)[0]

This loop is by far the slowest part of my code. (Both the arr==value evaluation and the np.where call take up significant chunks of time.) Is there a more efficient way to do this?

I also tried playing around with np.unique(arr, return_index=True) but that only tells me the very first index, rather than all of them.
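A minimal sketch of what I mean, using the example array above:

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 2, 1, 0])

# return_index gives only the FIRST occurrence of each unique value
unq, first_idx = np.unique(arr, return_index=True)
print(unq)        # [0 1 2 3]
print(first_idx)  # [0 1 2 3]
# the later occurrences of 2, 1 and 0 (at indices 4, 5, 6) are lost
```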

  • You can iterate over the unique items in the array instead of looping over range. Create a set from the array and iterate over that; it can reduce the overhead. Commented Aug 18, 2016 at 8:59
  • @nishparadox2 That's not going to do anything, because I already know what the unique values are: 0 through N, which you get more efficiently with a range than with a call to unique(). Commented Aug 18, 2016 at 9:05

4 Answers


Approach #1

Here's a vectorized approach to get those indices as a list of arrays -

sidx = arr.argsort()
unq, cut_idx = np.unique(arr[sidx],return_index=True)
indices = np.split(sidx,cut_idx)[1:]

If you want a final dictionary that maps each unique element to its indices, we can finish with a dict comprehension -

dict_out = {unq[i]:iterID for i,iterID in enumerate(indices)}

Approach #2

If you are just interested in the list of arrays, here's an alternative meant for performance -

sidx = arr.argsort()
indices = np.split(sidx,np.flatnonzero(np.diff(arr[sidx])>0)+1)
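For reference (not part of the original answer), here is how both approaches behave on the question's sample array. Note that argsort is not guaranteed stable by default, so the order of indices within each group may vary:

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 2, 1, 0])

# Approach #1: sort once, then split at the first occurrence of each value
sidx = arr.argsort()
unq, cut_idx = np.unique(arr[sidx], return_index=True)
indices = np.split(sidx, cut_idx)[1:]
dict_out = {unq[i]: iterID for i, iterID in enumerate(indices)}

# Approach #2: split wherever the sorted values step up
indices2 = np.split(sidx, np.flatnonzero(np.diff(arr[sidx]) > 0) + 1)
```

Both produce one group of indices per value 0..3; the groups contain the same indices (e.g. value 0 maps to {0, 6}), possibly in a different order within each group.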

3 Comments

That is quite clever. I'll give this a try.
So this is about 6.5 times faster than the answer I actually accepted. Given that the cleverness does obfuscate what the purpose of the code is, I'll stick to the simpler solution, but it definitely deserves an upvote and some props. :)
@acdr I appreciate any feedback on runtimes, as I worry about that a lot. Thanks for bringing in those upvotes ;)

A pythonic way is using collections.defaultdict():

>>> from collections import defaultdict
>>> 
>>> d = defaultdict(list)
>>> 
>>> for i, j in enumerate(arr):
...     d[j].append(i)
... 
>>> d
defaultdict(<type 'list'>, {0: [0, 6], 1: [1, 5], 2: [2, 4], 3: [3]})

And here is a Numpythonic way using a dictionary comprehension and numpy.where():

>>> {i: np.where(arr == i)[0] for i in np.unique(arr)}
{0: array([0, 6]), 1: array([1, 5]), 2: array([2, 4]), 3: array([3])}

And here is a pure Numpythonic approach if you don't want to involve the dictionary:

>>> uniq = np.unique(arr)
>>> args, indices = np.where((np.tile(arr, len(uniq)).reshape(len(uniq), len(arr)) == np.vstack(uniq)))
>>> np.split(indices, np.where(np.diff(args))[0] + 1)
[array([0, 6]), array([1, 5]), array([2, 4]), array([3])]

4 Comments

My input array is rather large, so I'd expected this solution to be even slower than mine (considering it loops over all elements) but it turns out to be very fast. Considering it's simpler than the other answers, I'll accept this one.
Well this deserves an upvote too given it's simpler as OP stated! :)
@acdr I just gave this answer based on your expected output which is a dictionary, Check out the answer for only categorizing the indices in numpy.
@Divakar Forza KISS principle!

I don't know numpy but you could definitely do this in one iteration, with a defaultdict:

from collections import defaultdict

indices = defaultdict(list)
for i, val in enumerate(arr):
    indices[val].append(i)
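If you need numpy arrays rather than lists (to match the desired output in the question), you can convert the lists in a second pass; a small sketch:

```python
import numpy as np
from collections import defaultdict

arr = np.array([0, 1, 2, 3, 2, 1, 0])

# single pass: collect positions per value
indices = defaultdict(list)
for i, val in enumerate(arr):
    indices[val].append(i)

# convert each list of positions to a numpy array
indices = {k: np.array(v) for k, v in indices.items()}
```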

Comments


Fully vectorized solution using the numpy_indexed package:

import numpy_indexed as npi
k, idx = npi.group_by(arr, np.arange(len(arr)))

On a higher level: why do you need these indices? Subsequent grouped operations can usually be computed much more efficiently using the group_by functionality [e.g. npi.group_by(arr).mean(someotherarray)], without explicitly computing the indices of the keys.

Comments
