I have a large 2D array with hundreds of columns. I would like to sort its rows lexicographically, i.e. by the first column, then by the second column, and so on until the last column. I imagine this should be easy, but I haven't been able to find a quick way to do it.
1 Answer
This is what numpy.lexsort is for, but the interface is awkward. Pass it a 2D array and it treats each row as a sort key, returning the indices that sort the columns. The last row is the primary key, the second-to-last row breaks ties, and so on up to the first row:
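>>> import numpy
>>> x = numpy.array([[0, 0, 0, 2, 3],
...                  [2, 3, 2, 3, 2],
...                  [3, 1, 3, 0, 0],
...                  [3, 1, 1, 3, 1]])
>>> x
array([[0, 0, 0, 2, 3],
       [2, 3, 2, 3, 2],
       [3, 1, 3, 0, 0],
       [3, 1, 1, 3, 1]])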
>>> numpy.lexsort(x)
array([4, 1, 2, 3, 0], dtype=int64)
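To make the key order concrete, here is a small check (using the same x as above) that numpy.lexsort(x) gives the same column order as a plain Python sort whose key for each column is read from the last row up to the first row. The hand-written version is only for illustration; it is much slower than lexsort on large arrays.

```python
import numpy as np

# The example array from the answer above.
x = np.array([[0, 0, 0, 2, 3],
              [2, 3, 2, 3, 2],
              [3, 1, 3, 0, 0],
              [3, 1, 1, 3, 1]])

# np.lexsort treats each row of x as one sort key and returns the
# order of the columns; the LAST row is the primary key.
order = np.lexsort(x)

# The same ordering written out by hand: sort column indices by a tuple
# of keys taken from the last row up to the first row.
by_hand = sorted(range(x.shape[1]), key=lambda j: tuple(x[::-1, j]))

print(order.tolist())  # [4, 1, 2, 3, 0]
print(by_hand)         # [4, 1, 2, 3, 0]
```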
If you want to sort the rows, with the first column as the primary key, you need to rotate the array before lexsorting it, so that the first column ends up as the last row (the primary key):
>>> x[numpy.lexsort(numpy.rot90(x))]
array([[0, 0, 0, 2, 3],
       [2, 3, 2, 3, 2],
       [3, 1, 1, 3, 1],
       [3, 1, 3, 0, 0]])
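A minimal sketch wrapping the rot90 trick in a reusable function and cross-checking it against Python's built-in tuple ordering, which is lexicographic by definition. The name sort_rows_lex is hypothetical, and the random test data (many columns, few distinct values, so lots of ties) is just an assumption chosen to exercise the tie-breaking.

```python
import numpy as np

def sort_rows_lex(a):
    """Return the rows of the 2-D array `a` in lexicographic order.

    Hypothetical helper name. np.rot90 turns column 0 into the last row,
    so np.lexsort uses it as the primary key, column 1 as the secondary
    key, and so on.
    """
    a = np.asarray(a)
    return a[np.lexsort(np.rot90(a))]

# Cross-check against sorting the rows as Python tuples, using random
# data with many columns and many repeated values (lots of ties).
rng = np.random.default_rng(0)
data = rng.integers(0, 3, size=(200, 50))
expected = sorted(map(tuple, data.tolist()))
result = sort_rows_lex(data)
print(list(map(tuple, result.tolist())) == expected)  # True
```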
3 Comments
grigor
Great, this seems to work! Next I need to do a searchsorted on the sorted array, but I'm not sure how. Given a 1D array, I want to find out whether it is one of the sorted 2D array's rows. Any suggestions would be appreciated.
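numpy.searchsorted only searches 1-D arrays, so it will not compare whole rows. One option, sketched here under the assumption that the rows are already in lexicographic order (e.g. from the lexsort/rot90 call above), is a hand-written binary search that compares rows lexicographically. The helpers row_less, search_sorted_row and contains_row are hypothetical names; search_sorted_row mimics the side="left" behaviour of searchsorted. Each probe costs one vectorized row comparison, and there are about log2(n) probes.

```python
import numpy as np

def row_less(a, b):
    """True if 1-D array `a` sorts strictly before `b` (same length assumed)."""
    diff = np.flatnonzero(a != b)  # positions where the two rows differ
    return diff.size > 0 and bool(a[diff[0]] < b[diff[0]])

def search_sorted_row(sorted_rows, target):
    """Leftmost index where `target` could be inserted into `sorted_rows`
    (rows already in lexicographic order) keeping that order.
    Hypothetical helper, analogous to np.searchsorted(..., side="left")."""
    target = np.asarray(target)
    lo, hi = 0, len(sorted_rows)
    while lo < hi:  # standard bisect-left loop
        mid = (lo + hi) // 2
        if row_less(sorted_rows[mid], target):
            lo = mid + 1
        else:
            hi = mid
    return lo

def contains_row(sorted_rows, target):
    """True if `target` equals some row of the sorted 2-D array."""
    i = search_sorted_row(sorted_rows, target)
    return i < len(sorted_rows) and np.array_equal(sorted_rows[i], target)

# Usage with the example array from the answer.
x = np.array([[0, 0, 0, 2, 3],
              [2, 3, 2, 3, 2],
              [3, 1, 3, 0, 0],
              [3, 1, 1, 3, 1]])
s = x[np.lexsort(np.rot90(x))]
print(search_sorted_row(s, [3, 1, 1, 3, 1]))  # 2
print(contains_row(s, [3, 1, 1, 3, 1]))       # True
print(contains_row(s, [3, 1, 2, 0, 0]))       # False
```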
user66081
@grigor: maybe [all(row == t) for row in x]
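The list comprehension gives one True/False flag per row. The same flags can be computed without a Python loop by broadcasting the 1-D target against every row and reducing along axis 1. This needs no sorting, but it examines every element, so on a large array that is queried many times a binary search over sorted rows may be cheaper. The target t below is an assumed example value.

```python
import numpy as np

x = np.array([[0, 0, 0, 2, 3],
              [2, 3, 2, 3, 2],
              [3, 1, 3, 0, 0],
              [3, 1, 1, 3, 1]])
t = np.array([3, 1, 3, 0, 0])  # assumed query row

# Broadcasting: (4, 5) == (5,) -> (4, 5) boolean array, then require
# every column to match to get one flag per row.
matches = (x == t).all(axis=1)
print(matches.tolist())                  # [False, False, True, False]
print(bool(matches.any()))               # True: t is one of the rows
print(np.flatnonzero(matches).tolist())  # [2]: indices of matching rows
```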
MrArsGravis
One could add that there's a more time-efficient way of getting the same result as with rot90, by using x[numpy.lexsort(x.T[::-1])]. According to timeit, this is about 25% faster than x[numpy.lexsort(numpy.rot90(x))] (tested for x.shape == (1000,5)).
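A quick way to confirm that both spellings build the same key array and the same sorted result, and to time them on your own machine. The random 1000×5 integer array is an assumption matching the shape mentioned in the comment; no particular speedup is claimed here, since it will vary with NumPy version and hardware.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=(1000, 5))

# rot90 (k=1) is a flip plus a transpose; x.T[::-1] does the same with
# plain slicing. Both return views, so neither copies the data.
print(np.array_equal(np.rot90(x), x.T[::-1]))  # True

a = x[np.lexsort(np.rot90(x))]
b = x[np.lexsort(x.T[::-1])]
print(np.array_equal(a, b))  # True

# Time both variants; the ratio depends on NumPy version and hardware.
t_rot = timeit.timeit(lambda: x[np.lexsort(np.rot90(x))], number=200)
t_T = timeit.timeit(lambda: x[np.lexsort(x.T[::-1])], number=200)
print(f"rot90: {t_rot:.4f} s   x.T[::-1]: {t_T:.4f} s")
```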