
I have a numpy array with integer values, let's call this array x.

I want to create some sort of list where for each value, I have the indices of x that hold this value.

For example, for:

x = [1,2,2,4,7,1,1,7,16]

I want to get:

{1: [0,5,6], 2:[1,2], 4:[3], 7:[4,7], 16:[8]}

The brackets I used are arbitrary. I don't care which data structure I use, as long as I can write my result to a file as quickly as possible. In the end I want a .txt file that reads:

0,5,6
1,2
3
4,7
8

What have you tried so far? (commented Apr 16, 2021 at 14:07)

4 Answers


Since you mentioned you're not picky about the data structure, you can get something like the dictionary in your question with a dictionary comprehension over the unique values in x, using np.where to find the indices of each value:

>>> {i:np.where(x == i)[0] for i in set(x)}

{1: array([0, 5, 6]),
 2: array([1, 2]),
 4: array([3]),
 7: array([4, 7]),
 16: array([8])}
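The question also asks for a .txt file with one comma-separated line of indices per value. A minimal sketch of that last step, assuming x is a 1-D integer NumPy array: the helper names index_groups and to_lines and the file name "indices.txt" are illustrative, not from the answer. It wraps the keys in sorted(), because iterating over a set gives no guaranteed order, and the lines should come out in ascending value order as in the question.

```python
import numpy as np

def index_groups(x):
    """Map each distinct value of x to the array of indices where it occurs."""
    # sorted() fixes the key order; iterating a set gives no guaranteed order
    return {v: np.where(x == v)[0] for v in sorted(set(x.tolist()))}

def to_lines(groups):
    """One line per value: its indices joined by commas (the layout asked for)."""
    return "".join(",".join(map(str, idx.tolist())) + "\n" for idx in groups.values())

x = np.array([1, 2, 2, 4, 7, 1, 1, 7, 16])
groups = index_groups(x)

# "indices.txt" is an example file name (assumption)
with open("indices.txt", "w") as f:
    f.write(to_lines(groups))
```

Building the whole text first and calling write() once keeps the file I/O to a single call; whether that matters next to the grouping itself is worth measuring on real data.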

Compared with a plain Python loop over the array, this is significantly faster for large arrays, at least when there are only a few distinct values (each np.where call scans the whole array once):

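import numpy as np

def list_method(x):
    # One pass over x, appending each index to the list for its value
    res = {i: [] for i in set(x)}
    for i, value in enumerate(x):
        res[value].append(i)
    return res

def np_method(x):
    # One vectorized comparison (a full scan of x) per distinct value
    return {i: np.where(x == i)[0] for i in set(x)}

x = np.random.randint(1, 50, 1000000)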


In [5]: %timeit list_method(x)
259 ms ± 4.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]: %timeit np_method(x)
120 ms ± 4.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
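Because each np.where call scans the whole array, the comprehension costs one full pass per distinct value. A different technique, not part of this answer and not timed here, sorts once with a stable argsort and then cuts the sorted index array wherever the value changes. A sketch, where group_indices_sorted is a hypothetical name; np.argsort(kind="stable"), np.unique(return_index=True) and np.split are standard NumPy calls:

```python
import numpy as np

def group_indices_sorted(x):
    """Group indices of x by value using one stable sort instead of one scan per value."""
    x = np.asarray(x)
    # A stable sort keeps the indices ascending within each run of equal values
    order = np.argsort(x, kind="stable")
    # Distinct values and the position where each one starts in the sorted array
    values, starts = np.unique(x[order], return_index=True)
    # Split the sorted index array at every boundary between consecutive values
    groups = np.split(order, starts[1:])
    return dict(zip(values.tolist(), groups))

x = np.array([1, 2, 2, 4, 7, 1, 1, 7, 16])
result = group_indices_sorted(x)
print({k: v.tolist() for k, v in result.items()})
```

The sort is O(n log n) regardless of how many distinct values there are, so it may pay off when there are many of them; with only a handful, the per-value np.where scans can still be competitive.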

2 Comments

Would you say that this is the fastest way of doing it? I'm being graded on how fast my code runs, and I'm unsure whether calling np.where for every value (by the way, I know the values in advance: they're 0 through k-1 when I have k) is faster than pure Python with one pass over the array. I would love to get your opinion on this, thanks.
@AmitSharon you can check out the timing comparing my method vs a loop through the list in my edit.
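If the values really are known to be 0 through k-1, np.bincount can count every group at once, and the cumulative sum of those counts gives the split points in a stably sorted index array. This is a sketch of a different technique, not something from the answer above, and it has not been timed; it assumes k is known and every value lies in range(k). The function name group_indices_0_to_k is hypothetical. Unlike the dictionary versions, it returns a list indexed by value, including empty groups for values that never occur.

```python
import numpy as np

def group_indices_0_to_k(x, k):
    """Return a list whose entry v holds the indices of x equal to v (0 <= v < k)."""
    x = np.asarray(x)
    # counts[v] = number of occurrences of v; minlength keeps absent values as 0
    counts = np.bincount(x, minlength=k)
    # Stable sort so indices stay ascending inside each group
    order = np.argsort(x, kind="stable")
    # Group v ends at cumsum(counts)[v]; the final boundary is the end of the array
    return np.split(order, np.cumsum(counts)[:-1])

groups = group_indices_0_to_k([1, 2, 2, 4, 7, 1, 1, 7, 16], k=17)
# Keep only values that actually occur, one comma-joined line each
lines = [",".join(map(str, g.tolist())) for g in groups if g.size]
print("\n".join(lines))
```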

A pure-Python version looks like this:

result = {}
for idx,val in enumerate(x):
    arr = result.get(val,[])
    arr.append(idx)
    result[val] = arr
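The get/append/assign sequence can be collapsed with dict.setdefault, which returns the existing list for a key or inserts and returns a new empty one. A sketch of the same single-pass loop; the function name indices_by_value is only for illustration:

```python
def indices_by_value(x):
    """Pure-Python single pass: map each value to the list of indices where it occurs."""
    result = {}
    for idx, val in enumerate(x):
        # setdefault inserts [] the first time val is seen, then returns that list
        result.setdefault(val, []).append(idx)
    return result

print(indices_by_value([1, 2, 2, 4, 7, 1, 1, 7, 16]))
```

Since dictionaries preserve insertion order, the keys come out in order of first appearance (here 1, 2, 4, 7, 16).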



A fast pure-Python option is defaultdict, which does a single O(n) pass:

from collections import defaultdict

def list_to_unique_dict(data):
    unique_dict = defaultdict(list)
    for index, value in enumerate(data):
        unique_dict[value].append(index)
    return dict(unique_dict)

x = [1,2,2,4,7,1,1,7,16]
list_to_unique_dict(x)

Out[62]: {1: [0, 5, 6], 2: [1, 2], 4: [3], 7: [4, 7], 16: [8]}


import numpy as np
x = np.random.randint(1, 50, 1000000)
%timeit list_to_unique_dict(x)
232 ms ± 13.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
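When x is a NumPy array, a Python-level loop like this receives a NumPy scalar for every element, which is typically slower to handle than a plain Python int, and the dictionary keys end up as NumPy scalars. A common tweak is to convert once with x.tolist() before looping. The sketch below only shows that conversion; it has not been timed here, so any speedup is something to measure.

```python
from collections import defaultdict

import numpy as np

def list_to_unique_dict(data):
    unique_dict = defaultdict(list)
    for index, value in enumerate(data):
        unique_dict[value].append(index)
    return dict(unique_dict)

x = np.random.randint(1, 50, 1000000)

# Convert once to a list of Python ints so the loop doesn't handle NumPy scalars
result = list_to_unique_dict(x.tolist())
```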


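x = [1,2,2,4,7,1,1,7,16]
numlist = []  # distinct values, in order of first appearance
numdict = {}  # value -> list of indices
for c, n in enumerate(x):
    if n not in numdict:  # dict lookup is O(1); "n not in numlist" scans the list
        numlist.append(n)
        numdict[n] = [c]
    else:
        numdict[n].append(c)
print(numlist, numdict)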

The output is:

[1, 2, 4, 7, 16] {1: [0, 5, 6], 2: [1, 2], 4: [3], 7: [4, 7], 16: [8]}

To write to a file, use:

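with open('file.txt', 'w') as f:
    # One line per value, indices separated by commas, as the question asks
    for indices in numdict.values():
        f.write(','.join(map(str, indices)) + '\n')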

