3

I have searched around and tried to find a solution to what seems to be a simple problem, but have come up with nothing. The problem is to sort a matrix based on its columns, progressively. So, if I have a numpy matrix like:

import numpy as np
X=np.matrix([[0,0,1,2],[0,0,1,1],[0,0,0,4],[0,0,0,3],[0,1,2,5]])
print(X)
[[0 0 1 2]
 [0 0 1 1]
 [0 0 0 4]
 [0 0 0 3]
 [0 1 2 5]]

I would like to sort it based on the first column, then the second, the third, and so on, to get a result like:

Xsorted=np.matrix([[0,0,0,3],[0,0,0,4],[0,0,1,1],[0,0,1,2],[0,1,2,5]])
print(Xsorted)
[[0 0 0 3]
 [0 0 0 4]
 [0 0 1 1]
 [0 0 1 2]
 [0 1 2 5]]

While I think it is possible to sort a matrix like this by naming the columns and all that, I would prefer to have a method for sorting that doesn't depend so much on how big the matrix is. I am using Python 3.4, if that is important.

Any help would be greatly appreciated!

2
  • Do you have to use numpy? Commented Jan 13, 2016 at 15:27
  • I don't care about using numpy, but I would like to keep the results in matrix form if at all possible, or at least be able to readily translate results to matrix form. Commented Jan 13, 2016 at 15:28

2 Answers

2

It's not going to be particularly fast, but you can always convert your rows to tuples, then use Python's sort:

np.matrix(sorted(map(tuple, X.A)))

You can also use np.lexsort, as suggested in this answer to a somewhat related question:

X[np.lexsort(X.T[::-1])]

The lexsort approach appears to be faster, though you should test with your actual data to make sure:

In [20]: X = np.matrix(np.random.randint(10, size=(100,100)))

In [21]: %timeit np.matrix(sorted(map(tuple, X.A)))
100 loops, best of 3: 2.23 ms per loop

In [22]: %timeit X[np.lexsort(X.T[::-1])]
1000 loops, best of 3: 1.22 ms per loop
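
For reference, here is a minimal sketch of the lexsort idea on a plain ndarray rather than np.matrix. The helper name sort_rows_lexicographically is made up for illustration. The reversal is needed because np.lexsort treats the last key it receives as the primary sort key.

```python
import numpy as np

def sort_rows_lexicographically(a):
    """Return the rows of 2-D array `a` sorted by column 0, then column 1, and so on."""
    a = np.asarray(a)
    # np.lexsort uses the LAST key as the primary key, so the columns are
    # passed in reverse order: a.T[::-1] is (last column, ..., first column).
    order = np.lexsort(a.T[::-1])
    return a[order]

X = np.array([[0, 0, 1, 2],
              [0, 0, 1, 1],
              [0, 0, 0, 4],
              [0, 0, 0, 3],
              [0, 1, 2, 5]])
X_sorted = sort_rows_lexicographically(X)
print(X_sorted)
```

Nothing here depends on the number of columns, so the same call should work for a matrix of any width.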

1

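You can also do this with pandas:

import pandas

data = [[0,0,1,2],[0,0,1,1],[0,0,0,4],[0,0,0,3],[0,1,2,5]]
x = pandas.DataFrame(data)
# Columns to sort by, in priority order.
# DataFrame.sort() was removed from pandas; sort_values() replaces it.
z = x.sort_values(by=[0,1,2,3])
# as_matrix() was also removed; to_numpy() returns the underlying ndarray.
output = z.to_numpy()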

output:

array([[0, 0, 0, 3],
       [0, 0, 0, 4],
       [0, 0, 1, 1],
       [0, 0, 1, 2],
       [0, 1, 2, 5]])
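
Since the question asks for something that does not depend on the size of the matrix, the column list can be taken from the frame itself instead of being typed out. This is a small sketch assuming a reasonably recent pandas (one that has sort_values and to_numpy). The names df and result are only illustrative.

```python
import pandas as pd

data = [[0, 0, 1, 2], [0, 0, 1, 1], [0, 0, 0, 4], [0, 0, 0, 3], [0, 1, 2, 5]]
df = pd.DataFrame(data)

# Sort by every column, left to right, whatever the width of the frame.
result = df.sort_values(by=list(df.columns)).to_numpy()
print(result)
```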

2 Comments

Added some explanation and output.
Everyone --thanks very much for the help! It turns out that all of these solutions will be useful for me at some point!
