How can I efficiently sort an array of objects on two or more attributes in NumPy?

import numpy as np

class Obj():
    def __init__(self,a,b):
        self.a = a
        self.b = b

arr = np.array([],dtype=Obj)        

for i in range(10):
    arr = np.append(arr,Obj(i, 10-i))

arr_sort = np.sort(arr, order=a,b) ???

Thx, Willem-Jan

  • Does numpy support a class for the data type: np.array([],dtype=Obj)? Commented Jan 5, 2017 at 11:41
  • I'd use a list rather than object array. List append is faster. And list sort allows sorting key parameter. Commented Jan 5, 2017 at 13:10
  • Maybe you're looking for structured arrays. They don't work directly with Python classes though. Commented Jan 5, 2017 at 14:35
  • dtype=Obj is treated as dtype=object, the generic object dtype. Elements of such an array can be anything, including None. Commented Jan 5, 2017 at 18:12

1 Answer

The order parameter only applies to structured arrays:

In [383]: arr=np.zeros((10,),dtype='i,i')
In [385]: for i in range(10):
     ...:     arr[i] = (i,10-i)  
In [386]: arr
Out[386]: 
array([(0, 10), (1, 9), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4), (7, 3), (8, 2), (9, 1)], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])
In [387]: np.sort(arr, order=['f0','f1'])
Out[387]: 
array([(0, 10), (1, 9), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4), (7, 3), (8, 2), (9, 1)], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])
In [388]: np.sort(arr, order=['f1','f0'])
Out[388]: 
array([(9, 1), (8, 2), (7, 3), (6, 4), (5, 5), (4, 6), (3, 7), (2, 8),
       (1, 9), (0, 10)], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])

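With a 2d array, np.lexsort provides a similar multi-key sort. It returns the sorting indices rather than the sorted array, and the last key in the sequence is the primary one: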

In [402]: arr=np.column_stack((np.arange(10),10-np.arange(10)))
In [403]: np.lexsort((arr[:,1],arr[:,0]))
Out[403]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [404]: np.lexsort((arr[:,0],arr[:,1]))
Out[404]: array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0], dtype=int32)
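Because lexsort returns indices, the rows still have to be reordered with them. A minimal sketch on a small made-up array (the values and the names data, idx and sorted_rows are illustrative, not from the question), with repeated values in column 0 so the tie-break is visible:

```python
import numpy as np

# Small 2d array with repeated values in column 0
data = np.array([[2, 5],
                 [1, 9],
                 [2, 3],
                 [1, 4]])

# Keys are listed from least to most significant:
# column 0 is the primary key, column 1 breaks ties
idx = np.lexsort((data[:, 1], data[:, 0]))

# Fancy indexing with the index array gives the sorted rows
sorted_rows = data[idx]
print(sorted_rows)
```

The same idx could be used to reorder any other array of the same length, for example an object array that holds the original instances.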

With your object array, I could extract the attributes into either of these structures:

In [407]: np.array([(a.a, a.b) for a in arr])
Out[407]: 
array([[ 0, 10],
       [ 1,  9],
       [ 2,  8],
      ....
       [ 7,  3],
       [ 8,  2],
       [ 9,  1]])
In [408]: np.array([(a.a, a.b) for a in arr],dtype='i,i')
Out[408]: 
array([(0, 10), (1, 9), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4), (7, 3),
       (8, 2), (9, 1)], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])
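Sorting the extracted copy does not reorder the objects themselves. One way to carry the order back, sketched here under my own naming (the field names 'a' and 'b' and the variables keys, idx and arr_sorted are assumptions for illustration), is to use np.argsort with order and index the object array with the result:

```python
import numpy as np

class Obj:
    def __init__(self, a, b):
        self.a = a
        self.b = b

# Object array like the one in the question, filled in one step
objs = [Obj(i, 10 - i) for i in range(10)]
arr = np.empty(len(objs), dtype=object)
arr[:] = objs

# Extract the attributes into a structured array with named fields
keys = np.array([(o.a, o.b) for o in arr],
                dtype=[('a', 'i4'), ('b', 'i4')])

# argsort takes the same order parameter as sort:
# field b is the primary key, field a breaks ties
idx = np.argsort(keys, order=['b', 'a'])

# Reorder the original objects with the index array
arr_sorted = arr[idx]
sorted_pairs = [(o.a, o.b) for o in arr_sorted]
print(sorted_pairs)
```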

The Python sorted function will work on arr (or its list equivalent); note that it returns a list, not an array:

In [421]: arr
Out[421]: 
array([<__main__.Obj object at 0xb0f2d24c>,
       <__main__.Obj object at 0xb0f2dc0c>,
       ....
       <__main__.Obj object at 0xb0f35ecc>], dtype=object)
In [422]: sorted(arr, key=lambda a: (a.b,a.a))
Out[422]: 
[<__main__.Obj at 0xb0f35ecc>,
 <__main__.Obj at 0xb0f3570c>,
 ...
 <__main__.Obj at 0xb0f2dc0c>,
 <__main__.Obj at 0xb0f2d24c>]
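If the result is needed as an object array again, the sorted list can be copied back into one. A sketch of that round trip, using operator.attrgetter as an alternative to the lambda key (the names by_b_then_a and arr_sorted are mine):

```python
from operator import attrgetter

import numpy as np

class Obj:
    def __init__(self, a, b):
        self.a = a
        self.b = b

# Object array like the one in the question
objs = [Obj(i, 10 - i) for i in range(10)]
arr = np.empty(len(objs), dtype=object)
arr[:] = objs

# attrgetter('b', 'a') returns the tuple (obj.b, obj.a),
# the same key as lambda a: (a.b, a.a)
by_b_then_a = sorted(arr, key=attrgetter('b', 'a'))

# sorted returns a list; copy it into a new object array if needed
arr_sorted = np.empty(len(by_b_then_a), dtype=object)
arr_sorted[:] = by_b_then_a
print([(o.a, o.b) for o in arr_sorted])
```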

Your Obj class is missing a nice __str__ method. I have to use something like [(i.a, i.b) for i in arr] to see the values of the arr elements.
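A possible version of the class with a readable display is sketched below. It defines __repr__ rather than __str__, on the assumption that you also want lists of these objects to print nicely: a list shows its elements with repr, and str falls back to __repr__ when __str__ is not defined. The output format is my own choice.

```python
class Obj:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __repr__(self):
        # Used by the interactive prompt and by containers such as lists;
        # str() falls back to it because no __str__ is defined
        return f"Obj(a={self.a}, b={self.b})"

alist = [Obj(i, 10 - i) for i in range(3)]
print(alist)
```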

As I stated in the comment, for this example, a list is much nicer than an object array.

In [423]: alist=[]
In [424]: for i in range(10):
     ...:     alist.append(Obj(i,10-i))

Appending to a list is faster than repeated np.append, which copies the whole array on every call. Object arrays also add little over a list, especially when they are 1d and hold instances of a custom class like this one: you can't do any math on arr and, as you can see, sorting isn't any easier.
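A list can also be sorted in place, and because Python's sort is stable, mixed directions on two attributes can be handled with two passes. A small sketch with made-up values that contain ties (the data and the name pairs are illustrative only):

```python
class Obj:
    def __init__(self, a, b):
        self.a = a
        self.b = b

# Made-up data with repeated a values so the secondary key matters
alist = [Obj(i % 3, i) for i in range(6)]

# Goal: a ascending, and b descending within equal a.
# Sort by the secondary key first, then by the primary key;
# the stable sort keeps the b order inside each group of equal a.
alist.sort(key=lambda o: o.b, reverse=True)
alist.sort(key=lambda o: o.a)

pairs = [(o.a, o.b) for o in alist]
print(pairs)
```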
