Is there an easy way to access array elements by (string) key as well as by index? Suppose I have an array like this:

x = array([[0, 4, 9, 1],
           [1, 3, 9, 1],
           [3, 5, 6, 2],
           [6, 2, 7, 5]])

I am looking for a way to specify a set of keys (for example ('A', 'C', 'G', 'T')) that can be used as aliases for the indices, so that x['A', 'C'], x[0, 'C'], x['A', 1], and x[0, 1] all return the value 4, x['G', :] is the same as x[2, :], and so on. I know that this can be achieved by subclassing a numpy array and overriding __getitem__ and __setitem__, but subclassing gets complicated very quickly, so I was wondering if there is a simpler or better way to do this.

  • Is the array always (4, 4) with the same 4 keys? What numpy math are you doing? Any fancy indexing, or just this basic element access? Commented Jan 10, 2020 at 16:52
  • Simplest is to define a map/dictionary, dd = {'A':0, 'C':1, ...} and index with x[dd['A'],:]. I'd look at the numpy.lib.index_tricks.py file to see classes that define their own indexing. Commented Jan 10, 2020 at 18:29
  • @hpaulj Different instances of the same array class may have different keys. However, the keys of any instance of the array class won't change during its lifetime. I expect that basic element access is sufficient. Commented Jan 11, 2020 at 16:15
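
The dictionary idea from the comment above could be wrapped in a small class so that the translation happens inside the indexing call. This is only a minimal sketch: the class name KeyedArray and its _translate helper are hypothetical (not part of numpy), and it assumes the same keys apply to every axis and that only basic element and slice access is needed.

```python
import numpy as np


class KeyedArray:
    """Hypothetical wrapper (not a numpy API): string keys alias integer indices."""

    def __init__(self, data, keys):
        self.data = np.asarray(data)
        # Map each key to its position, e.g. {'A': 0, 'C': 1, 'G': 2, 'T': 3}.
        self.index = {key: i for i, key in enumerate(keys)}

    def _translate(self, item):
        # Replace string keys with integers; leave ints, slices, etc. untouched.
        if isinstance(item, tuple):
            return tuple(self.index[i] if isinstance(i, str) else i for i in item)
        return self.index[item] if isinstance(item, str) else item

    def __getitem__(self, item):
        return self.data[self._translate(item)]

    def __setitem__(self, item, value):
        self.data[self._translate(item)] = value


x = KeyedArray([[0, 4, 9, 1],
                [1, 3, 9, 1],
                [3, 5, 6, 2],
                [6, 2, 7, 5]], keys="ACGT")

print(x['A', 'C'], x[0, 'C'], x['A', 1], x[0, 1])
print(x['G', :])
```

Because the wrapper holds the array rather than subclassing it, any numpy math would have to go through x.data; that is the trade-off for avoiding ndarray subclassing.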

1 Answer

You can use a pandas DataFrame:

import numpy as np
import pandas as pd

x = np.array([[0, 4, 9, 1],
              [1, 3, 9, 1],
              [3, 5, 6, 2],
              [6, 2, 7, 5]])
df = pd.DataFrame(x)


df.columns = df.index = ['A', 'C', 'G', 'T']

df

   A  C  G  T
A  0  4  9  1
C  1  3  9  1
G  3  5  6  2
T  6  2  7  5

df.loc['A', 'C']  # loc: label-based lookup
4

or

df.iloc[0, 1]  # iloc: integer-position lookup
4

However, you cannot mix the two syntaxes: df.loc[0, 'C'], df.iloc[0, 'C'], df.loc['A', 1] and df.iloc['A', 1] will all raise an error.
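
If mixed access is needed anyway, one possible workaround is to translate labels to positions before calling iloc. The sketch below is an assumption rather than a pandas feature: mixed_get is a hypothetical helper name, and it relies on the labels being unique strings so that Index.get_loc returns a single integer.

```python
import numpy as np
import pandas as pd

x = np.array([[0, 4, 9, 1],
              [1, 3, 9, 1],
              [3, 5, 6, 2],
              [6, 2, 7, 5]])
df = pd.DataFrame(x, index=list("ACGT"), columns=list("ACGT"))


def mixed_get(frame, row, col):
    """Hypothetical helper: accept a label or an integer position on each axis."""
    # Strings are looked up as labels; anything else (int, slice) is passed through.
    r = frame.index.get_loc(row) if isinstance(row, str) else row
    c = frame.columns.get_loc(col) if isinstance(col, str) else col
    return frame.iloc[r, c]


print(mixed_get(df, 0, 'C'))
print(mixed_get(df, 'A', 1))
```

The helper treats every string as a label and everything else as a position, so it would not work for a frame whose labels are themselves integers.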

4 Comments

df.iloc[1,2] will do exactly what you've missed;)
Yes I forgot to add that, I meant that you cannot use mixed syntax as in the example. I will edit to clarify that.
Thank you. The thing is though that this is intended for an open-source software project, and we prefer to avoid adding a new dependency if there is a simple way of achieving the same behavior just with numpy.
@Michiel Pandas is a very well documented and maintained library; I see no drawbacks in using it for any open-source project. I looked for another workaround, but I ended up with no proper solution to this problem.
