Accessing numpy array elements by key

Question

Is there an easy way to access array elements by (string) key as well as by index? Suppose I have an array like this:

x = array([[0, 4, 9, 1],
           [1, 3, 9, 1],
           [3, 5, 6, 2],
           [6, 2, 7, 5]])

I am looking for a way to specify a set of keys (for example ('A', 'C', 'G', 'T')) that can be used as aliases for indices, so that x['A', 'C'], x[0, 'C'], x['A', 1], and x[0, 1] all return the value 4, x['G', :] is the same as x[2, :], and so on. I know that this can be achieved by subclassing a numpy array and overriding __getitem__ and __setitem__, but subclassing gets complicated very quickly, so I was wondering if there is a simpler or better way to do this.

Is the array always (4x4) with the same 4 keys? What numpy math are you doing? Any fancy indexing, or just this basic element access?
– hpaulj, Commented Jan 10, 2020 at 16:52
Simplest is to define a map/dictionary, dd = {'A':0, 'C':1, ...} and index with x[dd['A'],:]. I'd look at the numpy.lib.index_tricks.py file to see classes that define their own indexing.
– hpaulj, Commented Jan 10, 2020 at 18:29
@hpaulj Different instances of the same array class may have different keys. However, the keys of any instance of the array class won't change during its lifetime. I expect that basic element access is sufficient.
– Michiel, Commented Jan 11, 2020 at 16:15
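
The dictionary approach suggested in the comments above can be wrapped in a small class that holds the array instead of subclassing it. What follows is only a minimal sketch under stated assumptions: the names KeyedArray and _translate are hypothetical (they are not part of numpy), the keys are fixed at construction time, and only basic element access with integers, string keys, and slices is handled.

```python
import numpy as np


class KeyedArray:
    """Hypothetical wrapper: holds a numpy array and maps string keys to indices."""

    def __init__(self, data, keys):
        self.data = np.asarray(data)
        # The key-to-index map is built once and never changes afterwards.
        self._index = {key: i for i, key in enumerate(keys)}

    def _translate(self, item):
        # Replace string keys by their integer positions; leave everything else alone.
        if isinstance(item, tuple):
            return tuple(self._translate(part) for part in item)
        if isinstance(item, str):
            return self._index[item]
        return item

    def __getitem__(self, item):
        return self.data[self._translate(item)]

    def __setitem__(self, item, value):
        self.data[self._translate(item)] = value


x = KeyedArray([[0, 4, 9, 1],
                [1, 3, 9, 1],
                [3, 5, 6, 2],
                [6, 2, 7, 5]], keys="ACGT")

print(x['A', 'C'], x[0, 'C'], x['A', 1], x[0, 1])  # 4 4 4 4
print(x['G', :])                                    # [3 5 6 2]
```

Because this is a wrapper and not an ndarray, numpy functions have to be applied to x.data. That is the trade-off for avoiding the complications of subclassing.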

FBruzzesi · Accepted Answer · 2020-01-10 16:22:15Z

1

You can use a pandas DataFrame:

import numpy as np
import pandas as pd

x = np.array([[0, 4, 9, 1],
              [1, 3, 9, 1],
              [3, 5, 6, 2],
              [6, 2, 7, 5]])
df = pd.DataFrame(x)


df.columns = df.index = ['A', 'C', 'G', 'T']

df

   A  C  G  T
A  0  4  9  1
C  1  3  9  1
G  3  5  6  2
T  6  2  7  5

df.loc['A', 'C'] # loc selects by label
4

or

df.iloc[0, 1] # iloc selects by integer position
4

However, you cannot mix the two styles in one call: df.loc[0, 'C'], df.iloc[0, 'C'], df.loc['A', 1] and df.iloc['A', 1] all raise an error.
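
If mixed access is needed with a DataFrame, one possible workaround is to translate the label to a position first and then index by position. A minimal sketch, assuming the labels are unique so that Index.get_loc returns a single integer; the variable names row and col are only for illustration:

```python
import numpy as np
import pandas as pd

x = np.array([[0, 4, 9, 1],
              [1, 3, 9, 1],
              [3, 5, 6, 2],
              [6, 2, 7, 5]])
labels = ['A', 'C', 'G', 'T']
df = pd.DataFrame(x, index=labels, columns=labels)

# Look up the integer position of a label (unique labels assumed).
row = df.index.get_loc('A')    # position of row 'A'
col = df.columns.get_loc('C')  # position of column 'C'

print(df.iloc[row, 1])  # row given by label, column by position
print(df.iloc[0, col])  # row given by position, column by label
```

Both lookups address the same cell as df.loc['A', 'C'] and df.iloc[0, 1].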

edited Jan 10, 2020 at 16:22

answered Jan 10, 2020 at 15:45

FBruzzesi


4 Comments

Georgina Skibinski Over a year ago

df.iloc[1,2] will do exactly what you've missed;)

FBruzzesi Over a year ago

Yes I forgot to add that, I meant that you cannot use mixed syntax as in the example. I will edit to clarify that.

Michiel Over a year ago

Thank you. The thing is though that this is intended for an open-source software project, and we prefer to avoid adding a new dependency if there is a simple way of achieving the same behavior just with numpy.

FBruzzesi Over a year ago

@Michiel Pandas is a very well documented and maintained library, so I see no drawbacks in using it for any open-source project. I looked for another workaround, but I ended up with no proper solution to this problem.
