
I am trying to translate every element of a numpy.array according to a given key:

For example:

a = np.array([[1,2,3],
              [3,2,4]])

my_dict = {1:23, 2:34, 3:36, 4:45}

I want to get:

array([[ 23.,  34.,  36.],
       [ 36.,  34.,  45.]])

I can see how to do it with a loop:

def loop_translate(a, my_dict):
    new_a = np.empty(a.shape)
    for i, row in enumerate(a):
        new_a[i, :] = list(map(my_dict.get, row))
    return new_a

Is there a more efficient and/or pure numpy way?

Edit:

I timed it, and the np.vectorize method proposed by DSM is considerably faster for larger arrays:

In [13]: def loop_translate(a, my_dict):
   ....:     new_a = np.empty(a.shape)
   ....:     for i,row in enumerate(a):
   ....:         new_a[i,:] = list(map(my_dict.get, row))
   ....:     return new_a
   ....: 

In [14]: def vec_translate(a, my_dict):    
   ....:     return np.vectorize(my_dict.__getitem__)(a)
   ....: 

In [15]: a = np.random.randint(1,5, (4,5))

In [16]: a
Out[16]: 
array([[2, 4, 3, 1, 1],
       [2, 4, 3, 2, 4],
       [4, 2, 1, 3, 1],
       [2, 4, 3, 4, 1]])

In [17]: %timeit loop_translate(a, my_dict)
10000 loops, best of 3: 77.9 us per loop

In [18]: %timeit vec_translate(a, my_dict)
10000 loops, best of 3: 70.5 us per loop

In [19]: a = np.random.randint(1, 5, (500,500))

In [20]: %timeit loop_translate(a, my_dict)
1 loops, best of 3: 298 ms per loop

In [21]: %timeit vec_translate(a, my_dict)
10 loops, best of 3: 37.6 ms per loop


9 Answers


I don't know about efficient, but you could use np.vectorize on the .get method of dictionaries:

>>> a = np.array([[1,2,3],
...               [3,2,4]])
>>> my_dict = {1:23, 2:34, 3:36, 4:45}
>>> np.vectorize(my_dict.get)(a)
array([[23, 34, 36],
       [36, 34, 45]])

6 Comments

+1 if OP knows every key will be contained in my_dict as in a, then my_dict.__getitem__ would be a better choice
When I am using my_dict.get I am getting a ValueError, but I don't have that problem when I am using my_dict.__getitem__. I am using numpy 1.6.2
@Akavall: that's strange. I don't have 1.6.2 around at the moment to check, though.
@Akavall Using this sample data? If not, do you notice any difference with your input data?
@jamylak, I am using the same data.
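For what it's worth, the practical difference between the two lookups (a minimal sketch of my own, not from the answer above): .get tolerates missing keys, returning None or a supplied default, while __getitem__ raises KeyError:

import numpy as np

my_dict = {1: 23, 2: 34, 3: 36, 4: 45}
a = np.array([[1, 2], [3, 5]])   # 5 is not a key

# .get maps missing keys to a chosen default instead of failing ...
print(np.vectorize(lambda x: my_dict.get(x, -1))(a))   # [[23 34] [36 -1]]

# ... while __getitem__ raises KeyError on the first missing key:
try:
    np.vectorize(my_dict.__getitem__)(a)
except KeyError as e:
    print('KeyError:', e)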

Here's another approach, using numpy.unique:

>>> a = np.array([[1,2,3],[3,2,1]])
>>> a
array([[1, 2, 3],
       [3, 2, 1]])
>>> d = {1 : 11, 2 : 22, 3 : 33}
>>> u,inv = np.unique(a,return_inverse = True)
>>> np.array([d[x] for x in u])[inv].reshape(a.shape)
array([[11, 22, 33],
       [33, 22, 11]])

This approach is much faster than the np.vectorize approach when the number of unique elements in the array is small. Explanation: Python is slow; in this approach the in-Python loop is used only to convert the unique elements, after which we rely on an extremely optimized numpy indexing operation (done in C) to do the mapping. Hence, if the number of unique elements is comparable to the overall size of the array, there will be no speedup. On the other hand, if there are just a few unique elements, you can observe a speedup of up to 100x.
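If you want to check this effect yourself, here is a minimal timing sketch (my own; exact numbers will vary by machine and NumPy version) with only 4 unique values in a 500x500 array:

import numpy as np
import timeit

a = np.random.randint(1, 5, (500, 500))   # only 4 unique values
d = {1: 11, 2: 22, 3: 33, 4: 44}

def unique_translate(a, d):
    u, inv = np.unique(a, return_inverse=True)
    return np.array([d[x] for x in u])[inv].reshape(a.shape)

def vec_translate(a, d):
    return np.vectorize(d.__getitem__)(a)

# The Python-level loop runs only len(u) == 4 times here, so
# unique_translate should beat vec_translate comfortably.
print(timeit.timeit(lambda: unique_translate(a, d), number=10))
print(timeit.timeit(lambda: vec_translate(a, d), number=10))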

5 Comments

How does this compare speed-wise to using vectorize(dict.get)?
addendum - I found this one to be the fastest (compared to vectorizing dictionary.get and iterating through keys)! YMMV...
I would make one minor modification, which is to replace d[x] with d.get(x, default_value), where default_value can be whatever you want. For my use case, I was only replacing some values, and others I wanted to leave alone, so I did d.get(x, x) (sketched after these comments).
This was really a genius solution. I used it to color a grayscale image (here a), mapping the 1d pixel values to RGB colors with a look-up dict (here d). I tried numpy.vectorize and pandas.DataFrame.apply (which was btw faster than vectorize), but this was the fastest. Thanks!
hmm, strange! I tried this with a 12000x12000x1 array and a dictionary with 7000 items. While the vectorize method (with either get or __getitem__, indifferently) ran in a bit under 10s, this other solution took 16s
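To illustrate the d.get(x, default) variant from the comments above (a sketch, assuming unmapped values should pass through unchanged):

import numpy as np

a = np.array([[1, 2, 9], [9, 2, 1]])   # 9 has no mapping
d = {1: 11, 2: 22}

u, inv = np.unique(a, return_inverse=True)
# d.get(x, x) leaves unmapped values as-is instead of raising KeyError
print(np.array([d.get(x, x) for x in u])[inv].reshape(a.shape))
# [[11 22  9]
#  [ 9 22 11]]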

I think it'd be better to iterate over the dictionary, and set values in all the rows and columns "at once":

>>> a = np.array([[1,2,3],[3,2,1]])
>>> a
array([[1, 2, 3],
       [3, 2, 1]])
>>> d = {1 : 11, 2 : 22, 3 : 33}
>>> for k,v in d.items():
...     a[a == k] = v
... 
>>> a
array([[11, 22, 33],
       [33, 22, 11]])

Edit:

While it may not be as sexy as DSM's (really good) answer using numpy.vectorize, my tests of all the proposed methods show that this approach (using @jamylak's suggestion) is actually a bit faster:

import numpy as np
a = np.random.randint(1, 5, (500, 500))
d = {1: 11, 2: 22, 3: 33, 4: 44}

def unique_translate(a, d):
    u, inv = np.unique(a, return_inverse=True)
    return np.array([d[x] for x in u])[inv].reshape(a.shape)

def vec_translate(a, d):
    return np.vectorize(d.__getitem__)(a)

def loop_translate(a, d):
    n = np.empty(a.shape)
    for k in d:
        n[a == k] = d[k]
    return n

def orig_translate(a, d):
    new_a = np.empty(a.shape)
    for i, row in enumerate(a):
        new_a[i, :] = list(map(d.get, row))
    return new_a


if __name__ == '__main__':
    import timeit
    n_exec = 100
    print('orig')
    print(timeit.timeit("orig_translate(a,d)",
                        setup="from __main__ import np,a,d,orig_translate",
                        number=n_exec) / n_exec)
    print('unique')
    print(timeit.timeit("unique_translate(a,d)",
                        setup="from __main__ import np,a,d,unique_translate",
                        number=n_exec) / n_exec)
    print('vec')
    print(timeit.timeit("vec_translate(a,d)",
                        setup="from __main__ import np,a,d,vec_translate",
                        number=n_exec) / n_exec)
    print('loop')
    print(timeit.timeit("loop_translate(a,d)",
                        setup="from __main__ import np,a,d,loop_translate",
                        number=n_exec) / n_exec)

Outputs:

orig
0.222067718506
unique
0.0472617006302
vec
0.0357889199257
loop
0.0285375618935

4 Comments

Considering speed may be an issue, iterating like for k in d would make this as fast as possible.
I find that vectorizing is faster for my situation, where a has shape (50, 50, 50), d has 5000 keys, and data are numpy.uint32. And it's not super close... ~0.1 seconds vs ~1.4 seconds. flattening the array doesn't help. :/
How fast this method is depends on how many unique keys exist in the mapping. In your case the number of keys is much smaller than the dimensions of the 2D array, which is why the performance is close to the vectorized solution. Vectorized becomes much faster if the number of keys becomes comparable to the dimensions of the array.
A big, and sometimes overlooked (just as I did) caveat: if any of your dict values and keys match (e.g. {1:2, 2:3}), elements with the value 1 are replaced with 2, then they become 3 - so 1 and 2 both translate to 3. Carefully reordering the iterator before feeding it to for might help, but to no avail if the dict forms a circular graph.
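A minimal demonstration of that caveat (my own sketch):

import numpy as np

a = np.array([1, 2, 3])
d = {1: 2, 2: 3}          # the value 2 is also a key
for k, v in d.items():
    a[a == k] = v
print(a)   # [3 3 3] -- the 1s became 2s first, then every 2 became 3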

The numpy_indexed package (disclaimer: I am its author) provides an elegant and efficient vectorized solution to this type of problem:

import numpy_indexed as npi
remapped_a = npi.remap(a, list(my_dict.keys()), list(my_dict.values()))

The method implemented is similar to the approach mentioned by John Vinyard, but even more general. For instance, the items of the array do not need to be ints, but can be any type, even nd-subarrays themselves.

If you set the optional 'missing' kwarg to 'raise' (default is 'ignore'), performance will be slightly better, and you will get a KeyError if not all elements of 'a' are present in the keys.
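For example, failing loudly on unmapped elements looks like this (a sketch based on the kwarg described above):

import numpy as np
import numpy_indexed as npi

a = np.array([[1, 2, 3], [3, 2, 4]])
my_dict = {1: 23, 2: 34, 3: 36, 4: 45}

# missing='raise' turns unmapped elements into a KeyError instead of ignoring them
remapped_a = npi.remap(a, list(my_dict.keys()), list(my_dict.values()), missing='raise')
print(remapped_a)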

1 Comment

This gives me TypeError: invalid type promotion. Perhaps one has to first reshape a to be one dimensional?

Assuming your dict keys are positive integers, without huge gaps (similar to a range from 0 to N), you would be better off converting your translation dict to an array such that my_array[i] = my_dict[i], and using numpy indexing to do the translation.

Code using this approach:

def direct_translate(a, d):
    src, values = list(d.keys()), list(d.values())
    d_array = np.arange(a.max() + 1)
    d_array[src] = values
    return d_array[a]

Testing with random arrays:

N = 10000
shape = (5000, 5000)
a = np.random.randint(N, size=shape)
my_dict = dict(zip(np.arange(N), np.random.randint(N, size=N)))

For these sizes I get around 140 ms for this approach. The np.vectorize(my_dict.get) approach takes around 5.8 s and unique_translate around 8 s.

Possible generalizations:

  • If you have negative values to translate, you could shift the values in a and in the keys of the dictionary by a constant to map them back to positive integers:

def direct_translate(a, d):  # handles negative source keys
    min_a = a.min()
    src = np.array(list(d.keys())) - min_a
    values = list(d.values())
    d_array = np.arange(a.max() - min_a + 1)
    d_array[src] = values
    return d_array[a - min_a]
  • If the source keys have huge gaps, the initial array creation would waste memory. I would resort to Cython to speed up that function; a pure-numpy alternative using np.searchsorted is sketched below.
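Here is one such pure-numpy sketch of the huge-gaps case (my own illustration, not from the answer above; it assumes every element of a actually appears among the keys):

import numpy as np

def searchsorted_translate(a, d):
    # Sort the keys once, then binary-search each element's position;
    # memory scales with len(d) rather than with the key range.
    keys = np.array(sorted(d))
    values = np.array([d[k] for k in keys])
    return values[np.searchsorted(keys, a)]

a = np.array([[10, 1000000], [1000000, 10]])
print(searchsorted_translate(a, {10: 1, 1000000: 2}))   # [[1 2] [2 1]]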

Comments


If you don't really have to use a dictionary as the substitution table, a simple solution would be (for your example):

import numpy

a = numpy.array([[1, 2, 3],
                 [3, 2, 4]])                   # your array
my_dict = numpy.array([0, 23, 34, 36, 45])     # your dictionary as an array

def sub(myarr, table):
    return table[myarr]

values = sub(a, my_dict)

Of course, this will work only if the indexes of my_dict cover all possible values of a; in other words, only for a containing unsigned integers.

2 Comments

Of course! Much simpler and easily overlooked clever solution.
Isn't this just: a = np.array(); b = np.array(); c = a[b]. You're assuming that the values of b are the indexes of a, which means you don't need a dictionary at all. A trivial case of this problem.

def dictionarize(np_array, dictionary, el_type='float'):
    # Build the result by summing one boolean "layer" per dictionary key:
    # each layer selects the positions equal to that key, scaled by the
    # mapped value.
    final_array = np.zeros_like(np_array).astype(el_type)
    for x in dictionary:
        x_layer = (np_array == x)
        x_layer = (x_layer * dictionary[x]).astype(el_type)
        final_array += x_layer
    return final_array

Comments


Taking the best of both @DSM's and @John Vinyard's solutions:

  • vectorizing dict.__getitem__ only for unique values.
  • mapping with numpy optimized indexing.

Code:

>>> a = np.array([[1,2,3],[3,2,1]])
>>> a
array([[1, 2, 3],
       [3, 2, 1]])
>>> d = {1 : 11, 2 : 22, 3 : 33}

>>> u, inv = np.unique(a, return_inverse=True)
>>> np.vectorize(d.get)(u)[inv].reshape(a.shape)
array([[11, 22, 33],
       [33, 22, 11]])

This has the same advantages as @DSM's answer while also avoiding the Python loop over the unique elements of the array.

Comments


Pandas has a map function for this purpose, which is faster than all other solutions published so far that generalise this problem, i.e. can map any value to another value as defined in a dictionary.

import numpy as np
import pandas as pd

def map_pandas(x, mapping_dict):
    return pd.Series(x.flatten()).map(mapping_dict).values.reshape(x.shape)

period = 6
x = np.random.randint(0, period, (5000,5000))
mapping = {i: i*11 for i in np.unique(x)}
map_pandas(x, mapping)

Speed comparison

Testing it against all proposed solutions with x.shape = (5000, 5000) and 6 key-value pairs in the mapping.

Function (author)                                   Timing (mean ± std.)*
map_npi (Eelco Hoogendoorn)                         1.04 s ± 10.7 ms
map_np_vectorize (DSM)                              892 ms ± 2.25 ms
map_np_unique (John Vinyard)                        802 ms ± 850 μs
map_np_unique_vectorize (abdelgha4)                 799 ms ± 1.18 ms
map_np_iteritems (John Vinyard, Sergey Mikhaylin)   357 ms ± 1.8 ms
map_pandas (this solution)                          94 ms ± 222 μs
map_np_direct** (Maxim)                             32.5 ms ± 426 μs
map_np_direct2** (Mikhail V)                        28.4 ms ± 528 μs

* per function call, out of 3 runs with 3 loops each
** non-generalising functions, i.e. they require an already indexed array of integers as input and are therefore applicable only to a minority of cases.

Code for speed testing

import numpy as np
import pandas as pd
import numpy_indexed as npi

def map_npi(x, mapping_dict):
    return npi.remap(x.flatten(), list(mapping_dict.keys()), list(mapping_dict.values())).reshape(x.shape)

def map_np_unique(x, mapping_dict, default=np.nan):
    u, inv = np.unique(x, return_inverse=True)
    return np.array([mapping_dict.get(k, default) for k in u])[inv].reshape(x.shape)

def map_np_vectorize(x, mapping_dict):
    return np.vectorize(mapping_dict.get)(x)

def map_np_unique_vectorize(x, mapping_dict):
    u, inv = np.unique(x, return_inverse=True)
    return np.vectorize(mapping_dict.get)(u)[inv].reshape(x.shape)

def map_np_iteritems(x, mapping_dict):
    x_new = np.array(x)
    for k, v in mapping_dict.items():
        x_new[x == k] = v
    return x_new

def map_np_direct(x, mapping_dict):
    d_array = np.arange(x.max() + 1)
    d_array[list(mapping_dict.keys())] = list(mapping_dict.values())
    return d_array[x]

def map_np_direct2(x, mapping_dict):
    return np.array(list(mapping_dict.values()))[x]

period = 6
x = np.random.randint(0, period, (5000,5000))
# x = np.random.random((5000,5000)).round(1) # non generalising functions will fail
mapping = {i: i*11 for i in np.unique(x)}
result_probe = map_np_unique(x, mapping)
    
for f in [map_npi, map_np_vectorize, map_np_unique, map_np_unique_vectorize, 
          map_np_iteritems, map_pandas, map_np_direct, map_np_direct2]:
    print(f.__name__)
    try:
        assert (result_probe == f(x, mapping)).all()
    except AssertionError:
        print('Wrong result')
    except Exception as e:
        print(f'{e.__class__.__name__}: {e}')
    else:
        %timeit -n 3 -r 3 f(x, mapping)
    print()

Comments
