Sorting arrays in NumPy by column

Question

How do I sort a NumPy array by its nth column?

For example, given:

a = array([[9, 2, 3],
           [4, 5, 6],
           [7, 0, 5]])

I want to sort the rows of a by the second column to obtain:

array([[7, 0, 5],
       [9, 2, 3],
       [4, 5, 6]])

Mateen Ulhaq · Accepted Answer · 2021-04-02 11:24:12Z

1018

To sort by the second column of a:

a[a[:, 1].argsort()]
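Here is a minimal runnable sketch of the one-liner applied to the array from the question. The intermediate name `order` is only for illustration:

```python
import numpy as np

a = np.array([[9, 2, 3],
              [4, 5, 6],
              [7, 0, 5]])

# argsort returns the row indices that would sort column 1 ([2, 5, 0] -> [2, 0, 1])
order = a[:, 1].argsort()

# indexing with those indices builds a new, reordered array; a itself is unchanged
sorted_a = a[order]
print(sorted_a)
```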

edited Apr 2, 2021 at 11:24

Mateen Ulhaq

answered May 13, 2010 at 15:39

Steve Tjoa

10 Comments

Steven C. Howell Over a year ago

If you want the reverse sort, modify this to be a[a[:,1].argsort()[::-1]]
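The sketch below shows how the two descending approaches handle ties. The example array is made up, and `kind="stable"` is used only so that the tie handling is deterministic. Reversing the index array also reverses the order of tied rows. A stable sort on the negated column keeps tied rows in their original order. The negation trick assumes a signed numeric column.

```python
import numpy as np

a = np.array([[1, 5, 0],
              [2, 5, 1],
              [3, 7, 2]])

# Reversing an ascending argsort gives descending order, but the two rows
# tied at 5 in column 1 also come out in reverse order.
desc_reversed = a[a[:, 1].argsort(kind="stable")[::-1]]

# A stable argsort of the negated column gives descending order while keeping
# tied rows in their original order (signed numeric dtypes only).
desc_stable = a[(-a[:, 1]).argsort(kind="stable")]

print(desc_reversed)
print(desc_stable)
```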

Václav Pavlík Over a year ago

Looks simple and works! Is it faster than np.sort or not?

poppie Over a year ago

I find this easier to read: ind = np.argsort(a[:, 1]); a = a[ind]

bean Over a year ago

a[a[:,k].argsort()] is the same as a[a[:,k].argsort(),:]. This generalizes to the other dimension (sort columns using a row): a[:,a[j,:].argsort()] (I hope I typed that right.)
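Here is a quick check of both claims on the question's array. The variable `j` is illustrative:

```python
import numpy as np

a = np.array([[9, 2, 3],
              [4, 5, 6],
              [7, 0, 5]])

# The trailing ", :" is implicit when indexing rows
rows_a = a[a[:, 1].argsort()]
rows_b = a[a[:, 1].argsort(), :]

j = 0  # the row whose values decide the new column order
# a[j, :].argsort() is a column permutation; apply it along the second axis
cols_sorted = a[:, a[j, :].argsort()]
print(cols_sorted)
```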

pippo1980 Over a year ago

I needed to use b = a[a[:, 1].argsort()]; b is then the sorted array, because the indexing returns a new array and leaves a unchanged.

Trenton McKinney · Answer · 2020-06-16 00:22:50Z

182

@Steve's answer is actually the most elegant way of doing it.

For the "correct" way, see the order keyword argument of numpy.ndarray.sort.

However, you'll need to view your array as an array with fields (a structured array).

The "correct" way is quite ugly if you didn't initially define your array with fields...

As a quick example, to sort it and return a copy:

In [1]: import numpy as np

In [2]: a = np.array([[1,2,3],[4,5,6],[0,0,1]], dtype=np.int64)

In [3]: np.sort(a.view('i8,i8,i8'), order=['f1'], axis=0).view(np.int64)
Out[3]: 
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])

To sort it in-place:

In [6]: a.view('i8,i8,i8').sort(order=['f1'], axis=0) #<-- returns None

In [7]: a
Out[7]: 
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])

@Steve's really is the most elegant way to do it, as far as I know...

The only advantage to this method is that the "order" argument is a list of the fields to sort by. For example, you can sort by the second column, then the third column, then the first column by supplying order=['f1','f2','f0'].
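The string 'i8,i8,i8' describes one field per column, and each field is an 8-byte (64-bit) integer. The view therefore only works when the array really has that dtype and contiguous rows. The sketch below is a hypothetical helper, `sort_rows_by_fields`. It builds the field list from the array's own dtype, so the same code should handle int32, int64 or float64 with any number of columns. Fields that are not listed in `order` are still used, in field order, to break ties.

```python
import numpy as np

def sort_rows_by_fields(a, order):
    """Hypothetical helper: sort the rows of a 2-D array by fields f0, f1, ..."""
    # A structured view needs contiguous rows; this copies only if necessary.
    a = np.ascontiguousarray(a)
    # One field per column, using the array's own dtype instead of a hard-coded 'i8'.
    rec_dtype = np.dtype([(f"f{i}", a.dtype) for i in range(a.shape[1])])
    records = a.view(rec_dtype)  # shape (n_rows, 1): each row becomes one record
    # np.sort returns a sorted copy; viewing it as the plain dtype restores (n_rows, n_cols).
    return np.sort(records, order=order, axis=0).view(a.dtype)

a = np.array([[1, 2, 3], [4, 5, 6], [0, 0, 1]])
print(sort_rows_by_fields(a, ["f1"]))
print(sort_rows_by_fields(a.astype(np.float64), ["f1", "f2", "f0"]))
```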

edited Jun 16, 2020 at 0:22

Trenton McKinney

answered May 13, 2010 at 16:10

Joe Kington

15 Comments

Clippit Over a year ago

In my numpy 1.6.1rc1, it raises ValueError: new type not compatible with array.

endolith Over a year ago

Would it make sense to file a feature request that the "correct" way be made less ugly?

Marco Over a year ago

What if the values in the array are float? Should I change anything?

ali_m Over a year ago

One major advantage of this method over Steve's is that it allows very large arrays to be sorted in place. For a sufficiently large array, the indices returned by np.argsort may themselves take up quite a lot of memory, and on top of that, indexing with an array will also generate a copy of the array that is being sorted.

evn Over a year ago

Can someone explain the 'i8,i8,i8'? This is for each column or each row? What should change if sorting a different dtype? How do I find out how many bits are being used? Thank you

J.J · Answer · 2017-02-25 22:37:00Z

62

You can sort on multiple columns as per Steve Tjoa's method by using a stable sort like mergesort and sorting the indices from the least significant to the most significant columns:

a = a[a[:,2].argsort()] # First sort doesn't need to be stable.
a = a[a[:,1].argsort(kind='mergesort')]
a = a[a[:,0].argsort(kind='mergesort')]

This sorts by column 0, then 1, then 2.
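The sketch below checks that the three stable passes give the same result as a single np.lexsort call. np.lexsort takes the keys least significant first, and the last key is the primary one. The random test data is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=(20, 3))  # small values so that many ties occur

# Stable sorts from the least to the most significant column, as in the answer
b = a[a[:, 2].argsort()]
b = b[b[:, 1].argsort(kind="mergesort")]
b = b[b[:, 0].argsort(kind="mergesort")]

# np.lexsort in one call: the LAST key has the highest priority
c = a[np.lexsort((a[:, 2], a[:, 1], a[:, 0]))]
```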

edited Feb 25, 2017 at 22:37

answered Jul 5, 2016 at 1:42

J.J

3 Comments

Little Bobby Tables Over a year ago

Why does the first sort not need to be stable?

J.J Over a year ago

Good question. Stable means that when there's a tie, you maintain the original order, and the original order of the unsorted array is irrelevant.

Clumsy cat Over a year ago

This seems like a really super important point. Having a list that silently doesn’t sort would be bad.

prl900 · Answer · 2016-02-25 10:37:19Z

In case someone wants to use sorting in a performance-critical part of their program, here's a performance comparison of the different proposals:

import numpy as np
table = np.random.rand(5000, 10)

%timeit table.view('f8,f8,f8,f8,f8,f8,f8,f8,f8,f8').sort(order=['f9'], axis=0)
1000 loops, best of 3: 1.88 ms per loop

%timeit table[table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop

import pandas as pd
df = pd.DataFrame(table)
%timeit df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop

So, it looks like indexing with argsort is the quickest method so far...
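One caveat applies to the first timing. `table.view(...).sort(...)` sorts `table` in place, so every loop after the first starts from already sorted data, which can change that number. The sketch below is a benchmark using the standard `timeit` module. It sorts a fresh copy on each call, so the copy cost is included in the structured-array timing. No numbers are claimed, since they depend on the machine and NumPy version:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
table = rng.random((5000, 10))
fields = ",".join(["f8"] * table.shape[1])  # 'f8,f8,...': one float64 field per column

def argsort_index():
    return table[table[:, 9].argsort()]

def structured_sort():
    # Work on a copy so every repetition starts from the same unsorted data
    t = table.copy()
    t.view(fields).sort(order=["f9"], axis=0)
    return t

for fn in (argsort_index, structured_sort):
    seconds = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {seconds * 1e6:.0f} µs per call")
```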

Peter Mortensen · Answer · 2017-05-26 10:00:55Z

26

From the NumPy mailing list, here's another solution:

>>> a
array([[1, 2],
       [0, 0],
       [1, 0],
       [0, 2],
       [2, 1],
       [1, 0],
       [1, 0],
       [0, 0],
       [1, 0],
       [2, 2]])
>>> a[np.lexsort(np.fliplr(a).T)]
array([[0, 0],
       [0, 0],
       [0, 2],
       [1, 0],
       [1, 0],
       [1, 0],
       [1, 0],
       [1, 2],
       [2, 1],
       [2, 2]])

edited May 26, 2017 at 10:00

Peter Mortensen

answered Jun 3, 2015 at 15:03

fgregg

1 Comment

Radio Controlled Over a year ago

The correct generalization is a[np.lexsort(a.T[cols])], where cols=[1] in the original question.
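The sketch below shows that `np.fliplr(a).T` is just the columns in reverse order. It also shows how the `cols` generalization from the comment behaves with more than one column. `np.lexsort` treats the last key as primary, so a list written most-significant-first has to be reversed. Here `cols` is an illustrative name:

```python
import numpy as np

a = np.array([[1, 2], [0, 0], [1, 0], [0, 2], [2, 1],
              [1, 0], [1, 0], [0, 0], [1, 0], [2, 2]])

# np.fliplr(a).T equals a.T[::-1]: the keys run from the last column to the first,
# so column 0 (passed last) is the primary key
full = a[np.lexsort(np.fliplr(a).T)]

# Hypothetical column list, most significant first: sort by column 1, then column 0
cols = [1, 0]
by_cols = a[np.lexsort(a.T[cols[::-1]])]
```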

Mateen Ulhaq · Answer · 2022-06-20 03:05:14Z

24

As the Python documentation wiki suggests:

a = [[1, 2, 3], [4, 5, 6], [0, 0, 1]]
a = sorted(a, key=lambda a_entry: a_entry[1])
print(a)

Output:

[[0, 0, 1], [1, 2, 3], [4, 5, 6]]
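The same idea works with `operator.itemgetter`. The sketch below also includes the conversion back to an array that the comments point out is needed when you start from, and want to end with, a NumPy array. Converting with `.tolist()` first lets `sorted()` work on plain Python lists:

```python
from operator import itemgetter
import numpy as np

rows = [[1, 2, 3], [4, 5, 6], [0, 0, 1]]

# itemgetter(1) behaves like lambda row: row[1]; sorted() returns a new list
by_second = sorted(rows, key=itemgetter(1))

# Starting from an ndarray: sort the nested list, then rebuild an ndarray
arr = np.array(rows)
arr_sorted = np.array(sorted(arr.tolist(), key=itemgetter(1)))
print(arr_sorted)
```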

edited Jun 20, 2022 at 3:05

Mateen Ulhaq

answered Sep 28, 2011 at 20:05

user541064

4 Comments

Eric O. Lebigot Over a year ago

With this solution, one gets a list instead of a NumPy array, so this might not always be convenient (takes more memory, is probably slower, etc.).

Jivan Over a year ago

this "solution" is slower than the most-upvoted answer by a factor of ... well, close to infinity actually

Antony Hatchkins Over a year ago

@Jivan Actually, this solution is faster than the most-upvoted answer by a factor of 5 imgur.com/a/IbqtPBL

Kelly Bundy Over a year ago

@AntonyHatchkins But this doesn't do the whole job. Produces a list instead of an array. I get similar times as yours (3.02 ms vs 549 μs), but if I finish this by applying np.array to the result, it goes up to 4.3 ms.

Peter Mortensen · Answer · 2017-05-26 10:09:52Z

I had a similar problem.

My Problem:

I want to calculate an SVD and need to sort my eigenvalues in descending order. But I want to keep the mapping between eigenvalues and eigenvectors. My eigenvalues were in the first row and the corresponding eigenvector below it in the same column.

So I want to sort a two-dimensional array column-wise by the first row in descending order.

My Solution

a = a[::, a[0,].argsort()[::-1]]

So how does this work?

a[0,] is just the first row I want to sort by.

Now I use argsort to get the order of indices.

I use [::-1] because I need descending order.

Lastly, I use a[::, ...] to get a copy of the array with the columns in the right order (indexing with an index array returns a copy, not a view).
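The sketch below applies the same trick to the output of `np.linalg.eigh` on a made-up symmetric matrix. `eigh` already returns ascending eigenvalues, while `np.linalg.eig` makes no ordering guarantee. Either way, one index array from `argsort` reorders the eigenvalues and the eigenvector columns together. The stacked layout described in the answer is included for comparison:

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.random((4, 4))
sym = m + m.T  # symmetric, so eigh applies and the eigenvalues are real

w, v = np.linalg.eigh(sym)  # w[i] is the eigenvalue for eigenvector column v[:, i]

# One descending index array reorders both arrays consistently
idx = w.argsort()[::-1]
w_desc, v_desc = w[idx], v[:, idx]

# The answer's layout: eigenvalues in row 0, each eigenvector below its eigenvalue
stacked = np.vstack([w, v])
stacked = stacked[:, stacked[0].argsort()[::-1]]
```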

David Buck · Answer · 2020-06-27 09:14:07Z

4

import numpy as np
a = np.array([[21, 20, 19, 18, 17], [16, 15, 14, 13, 12], [11, 10, 9, 8, 7], [6, 5, 4, 3, 2]])
y = np.argsort(a[:, 2], kind='mergesort')  # a[:, 2] = [19, 14, 9, 4]
a = a[y]
print(a)

Desired output is [[6,5,4,3,2],[11,10,9,8,7],[16,15,14,13,12],[21,20,19,18,17]]

Note that argsort(numArray) returns the indices that would sort numArray.

Example:

x = np.array([8, 1, 5])
z = np.argsort(x)  # [1, 2, 0] are the indices that would sort x
print(x[z])  # integer (fancy) indexing reorders x using the indices saved in z

This prints [1 5 8].

edited Jun 27, 2020 at 9:14

David Buck

answered Jun 27, 2020 at 8:54

ckaus

1 Comment

adir abargil Over a year ago

You sure it's not [1,2,0]?

hpaulj · Answer · 2016-08-07 16:33:59Z

A slightly more complicated lexsort example: descending on the 1st column, then ascending on the 2nd. The tricks with lexsort are that it sorts on rows (hence the .T) and gives priority to the last row (key).

In [120]: b=np.array([[1,2,1],[3,1,2],[1,1,3],[2,3,4],[3,2,5],[2,1,6]])
In [121]: b
Out[121]: 
array([[1, 2, 1],
       [3, 1, 2],
       [1, 1, 3],
       [2, 3, 4],
       [3, 2, 5],
       [2, 1, 6]])
In [122]: b[np.lexsort(([1,-1]*b[:,[1,0]]).T)]
Out[122]: 
array([[3, 1, 2],
       [3, 2, 5],
       [2, 1, 6],
       [2, 3, 4],
       [1, 1, 3],
       [1, 2, 1]])
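The sketch below wraps the same negation trick in a small hypothetical helper, `sort_rows`. It takes the columns most significant first, plus a per-column descending flag. Negation assumes a signed numeric dtype and would not order unsigned integers correctly, for example.

```python
import numpy as np

def sort_rows(a, cols, descending):
    """Hypothetical helper: sort rows by `cols` (most significant first).

    descending[i] selects descending order for cols[i]; descending keys are
    negated, so a signed numeric dtype is assumed.
    """
    keys = [-a[:, c] if desc else a[:, c] for c, desc in zip(cols, descending)]
    # lexsort treats the LAST key as primary, so pass the keys in reverse
    return a[np.lexsort(keys[::-1])]

b = np.array([[1, 2, 1], [3, 1, 2], [1, 1, 3], [2, 3, 4], [3, 2, 5], [2, 1, 6]])
print(sort_rows(b, [0, 1], [True, False]))  # descending col 0, then ascending col 1
```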

rubengavidia0x · Answer · 2022-03-04 23:09:08Z

Pandas Approach Just For Completeness:

import numpy as np
import pandas as pd

a = np.array([[9, 2, 3],
              [4, 5, 6],
              [7, 0, 5]])
a = pd.DataFrame(a)

a.sort_values(1, ascending=True).to_numpy()  # '1' means sort by the second column
array([[7, 0, 5],
       [9, 2, 3],
       [4, 5, 6]])

prl900 did the benchmark, comparing with the accepted answer:

%timeit pandas_df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop

%timeit numpy_table[numpy_table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop

Sefa · Answer · 2018-01-30 19:36:58Z

Here is another solution considering all columns (a more compact version of J.J's answer):

ar=np.array([[0, 0, 0, 1],
             [1, 0, 1, 0],
             [0, 1, 0, 0],
             [1, 0, 0, 1],
             [0, 0, 1, 0],
             [1, 1, 0, 0]])

Sort with lexsort,

ar[np.lexsort(([ar[:, i] for i in range(ar.shape[1]-1, -1, -1)]))]

Output:

array([[0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 1, 0, 0],
       [1, 0, 0, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 0]])
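The same keys can be built more briefly. `ar.T[::-1]` lists the columns from last to first, exactly like the list comprehension in the answer, so column 0 ends up as the primary key:

```python
import numpy as np

ar = np.array([[0, 0, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 0],
               [1, 0, 0, 1],
               [0, 0, 1, 0],
               [1, 1, 0, 0]])

# List-comprehension form from the answer: columns from last to first
listcomp = ar[np.lexsort([ar[:, i] for i in range(ar.shape[1] - 1, -1, -1)])]

# Equivalent keys via a reversed transpose
compact = ar[np.lexsort(ar.T[::-1])]
```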

Ehsan · Answer · 2020-04-27 04:59:01Z

0

It is an old question, but if you need to generalize this to arrays with more than two dimensions, here is a solution that can be easily generalized:

np.einsum('ij->ij', a[a[:,1].argsort(),:])

This is overkill for two dimensions, and a[a[:,1].argsort()] would be enough per @Steve's answer; however, that answer cannot be generalized to higher dimensions. You can find an example of a 3D array in this question.

Output:

[[7 0 5]
 [9 2 3]
 [4 5 6]]
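`np.einsum('ij->ij', x)` simply returns the same values as `x`, so the sorting itself is still done by the indexing. For a stack of 2-D arrays, a different technique, `np.take_along_axis`, applies one argsort per slice. The sketch below uses a made-up array of shape (2, 3, 3) and sorts the rows of each slice by column 1:

```python
import numpy as np

# Made-up stack of two 3x3 arrays
a3 = np.array([[[9, 2, 3], [4, 5, 6], [7, 0, 5]],
               [[1, 3, 0], [2, 1, 0], [3, 2, 0]]])

idx = np.argsort(a3[:, :, 1], axis=1)  # shape (2, 3): the row order for each slice
# take_along_axis needs indices with the same number of dimensions;
# the trailing axis of length 1 broadcasts across all columns
sorted3 = np.take_along_axis(a3, idx[:, :, np.newaxis], axis=1)
print(sorted3)
```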

answered Apr 27, 2020 at 4:59

Ehsan

umair ali · Answer · 2020-08-15 08:45:00Z

0

# sort the rows by the first column (index 0)

indexofsort = np.argsort(dataset[:, 0], axis=-1, kind='stable')
dataset = dataset[indexofsort, :]

answered Aug 15, 2020 at 8:45

umair ali

Arkady · Answer · 2021-01-31 14:58:57Z

0

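import numpy as np

def sort_np_array(x, column, flip=False):
    # reorder the rows by the values in the given column (ascending)
    x = x[np.argsort(x[:, column])]
    if flip:
        # reverse the row order for a descending sort
        x = np.flip(x, axis=0)
    return x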

Array in the original question:

a = np.array([[9, 2, 3],
              [4, 5, 6],
              [7, 0, 5]])

The result of the sort_np_array function as expected by the author of the question:

>>> sort_np_array(a, column=1, flip=False)
array([[7, 0, 5],
       [9, 2, 3],
       [4, 5, 6]])

edited Jan 31, 2021 at 14:58

answered Jan 30, 2021 at 20:53

Arkady

lhoupert · Answer · 2021-06-01 12:12:15Z

Thanks to this post: https://stackoverflow.com/a/5204280/13890678

I found a more "generic" answer using structured array. I think one advantage of this method is that the code is easier to read.

import numpy as np
a = np.array([[9, 2, 3],
              [4, 5, 6],
              [7, 0, 5]])

struct_a = np.rec.fromarrays(
    a.transpose(), names="col1, col2, col3", formats="i8, i8, i8"
)
struct_a.sort(order="col2")

print(struct_a)

[(7, 0, 5) (9, 2, 3) (4, 5, 6)]
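The same idea can be written with `numpy.lib.recfunctions`, which converts between a plain 2-D array and a structured array without spelling out the formats, and converts back again after sorting. The sketch below uses `np.sort` rather than the in-place `.sort()`, because the structured array may share memory with `a`:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.array([[9, 2, 3],
              [4, 5, 6],
              [7, 0, 5]])

# One record per row; the named fields take the array's own dtype
s = rfn.unstructured_to_structured(a, names=["col1", "col2", "col3"])
s_sorted = np.sort(s, order="col2")  # sorted copy; a is left untouched

# Back to a plain 2-D array
plain = rfn.structured_to_unstructured(s_sorted)
print(plain)
```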

marc_s · Answer · 2022-03-05 08:17:05Z

0

Simply use sorted(), with the column you want to sort by as the key.

import numpy as np

a = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
print(a)
a = a.tolist()
a = np.array(sorted(a, key=lambda a_entry: a_entry[0]))
print(a)

edited Mar 5, 2022 at 8:17

marc_s

answered Apr 19, 2020 at 17:13

Jerin Antony