Numpy array to dictionary with indices as values

Question

I have a numpy array with integer values, let's call this array x.

I want to create some sort of list where for each value, I have the indices of x that hold this value.

For example, for:

x = [1,2,2,4,7,1,1,7,16]

I want to get:

{1: [0,5,6], 2:[1,2], 4:[3], 7:[4,7], 16:[8]}

The brackets I used are arbitrary; I don't care which data structure I use as long as I can output my result to a file as quickly as possible. In the end I want a .txt file that reads:

0,5,6

1,2

3

4,7

8

What have you tried so far?

– navneethc, Commented Apr 16, 2021 at 14:07

sacuL · Accepted Answer · 2021-04-16 15:47:35Z

Since you mentioned you're not picky about the data structure of your values, to get something like the dictionary you posted in your question, you could do a dictionary comprehension over the unique values in x with np.where for the values:

>>> {i:np.where(x == i)[0] for i in set(x)}

{1: array([0, 5, 6]),
 2: array([1, 2]),
 4: array([3]),
 7: array([4, 7]),
 16: array([8])}

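Compared with a plain Python loop over the array, this is roughly twice as fast in the benchmark below (1,000,000 elements, 49 distinct values). Note that np.where rescans the whole array once per distinct value, so the advantage shrinks as the number of distinct values grows: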

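import numpy as np

def list_method(x):
    # One pass over x, appending each index to the list for its value
    res = {i: [] for i in set(x)}
    for i, value in enumerate(x):
        res[value].append(i)
    return res

def np_method(x):
    # One vectorized scan of x per distinct value
    return {i: np.where(x == i)[0] for i in set(x)}

x = np.random.randint(1, 50, 1000000)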


In [5]: %timeit list_method(x)
259 ms ± 4.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]: %timeit np_method(x)
120 ms ± 4.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
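If the number of distinct values is large, one alternative worth timing is to sort the indices once and split them at the group boundaries, instead of calling np.where per value. This is a different technique from the one in the answer above, not something the answer proposes, and the function name is made up for illustration. A minimal sketch, assuming x is a one-dimensional integer array:

```python
import numpy as np

def group_indices_by_value(x):
    """Map each distinct value in x to an array of the indices holding it."""
    x = np.asarray(x)
    # A stable sort keeps indices in ascending order within each value
    order = np.argsort(x, kind="stable")
    sorted_vals = x[order]
    # starts[j] is the position in sorted_vals where the j-th distinct value begins
    values, starts = np.unique(sorted_vals, return_index=True)
    # Cut the sorted index array at every group boundary except the first
    groups = np.split(order, starts[1:])
    return {int(v): g for v, g in zip(values, groups)}

x = np.array([1, 2, 2, 4, 7, 1, 1, 7, 16])
print(group_indices_by_value(x))
```

Whether this beats np_method depends on the array size and the number of distinct values, so it should be measured with %timeit on your own data rather than assumed.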

Would you say that this is the fastest way of doing so? I'm being graded on how fast my code runs and I'm unsure whether using np.where for every value (I have the values in advance, by the way; they're 0 through k-1 when I have k) is faster than pure Python with one pass over the array. I would love to get your opinion on this, thanks.
@AmitSharon you can check out the timing comparing my method vs a loop through the list in my edit.
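The comment above mentions that the values are known in advance to be 0 through k-1. Under that assumption, a single pass can append into a preallocated list of k lists, with no dictionary lookups at all. The sketch below is only an illustration of that idea; the function name and the k parameter are hypothetical, and it has not been benchmarked against the methods in this thread:

```python
def indices_by_value_known_range(x, k):
    """Return a list where entry v holds the indices i with x[i] == v.

    Assumes every value in x is an integer in range(k).
    """
    buckets = [[] for _ in range(k)]  # one independent list per possible value
    for idx, val in enumerate(x):
        buckets[val].append(idx)
    return buckets

x = [0, 2, 2, 1, 0, 3]
print(indices_by_value_known_range(x, 4))
```

Values that never occur simply end up with an empty list, which may or may not be what you want in the output file.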

Victor Ermakov · Accepted Answer · 2021-04-16 15:39:09Z

A pure Python version looks like this:

result = {}
for idx, val in enumerate(x):
    # setdefault returns the existing list for val, or inserts and returns a new one
    result.setdefault(val, []).append(idx)

edited Apr 16, 2021 at 15:39

answered Apr 16, 2021 at 14:11


Duckduckcode · Accepted Answer · 2023-11-09 22:29:19Z

A fast pure-Python method is to use defaultdict, which makes a single O(n) pass:

from collections import defaultdict

def list_to_unique_dict(data):
    unique_dict = defaultdict(list)
    for index, value in enumerate(data):
        unique_dict[value].append(index)
    return dict(unique_dict)

x = [1,2,2,4,7,1,1,7,16]
list_to_unique_dict(x)

Out[62]: {1: [0, 5, 6], 2: [1, 2], 4: [3], 7: [4, 7], 16: [8]}


import numpy as np
x = np.random.randint(1, 50, 1000000)
%timeit list_to_unique_dict(x)
232 ms ± 13.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Sid · Accepted Answer · 2021-04-16 14:18:44Z

0

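x = [1,2,2,4,7,1,1,7,16]
numlist = []  # unique values, in order of first appearance
numdict = {}  # value -> list of indices
for c, n in enumerate(x):
    if n not in numdict:  # dict membership is O(1); searching numlist would be O(k)
        numlist.append(n)
        numdict[n] = [c]
    else:
        numdict[n].append(c)
print(numlist, numdict)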

Output is:

[1, 2, 4, 7, 16] {1: [0, 5, 6], 2: [1, 2], 4: [3], 7: [4, 7], 16: [8]}

To write to file use:

with open('file.txt', 'w') as f:
    f.write(str(numdict))
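Note that the snippet above writes the dictionary's string representation, while the question asks for one comma-separated line of indices per value. A minimal sketch of that format follows. It assumes the lines should appear in ascending order of value, as in the question's example; the helper name and file name are hypothetical:

```python
import os
import tempfile

def write_index_groups(groups, path):
    """Write one comma-separated line of indices per value, in ascending value order."""
    with open(path, "w") as f:
        for value in sorted(groups):
            f.write(",".join(str(i) for i in groups[value]) + "\n")

numdict = {1: [0, 5, 6], 2: [1, 2], 4: [3], 7: [4, 7], 16: [8]}

# Write to a temporary directory and read the file back to show the result
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "indices.txt")
    write_index_groups(numdict, path)
    with open(path) as f:
        written = f.read()
print(written)
```

Because each index goes through str(), the same helper also works when the groups are NumPy arrays rather than lists.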

answered Apr 16, 2021 at 14:18

