5

Apologies if this has been answered elsewhere; I've tried searching, but haven't found anything that answers my question (or perhaps I have, but didn't understand it)...

I'm fairly new to Python (v2.6.2) and have a list of lists containing floating point values which looks something like the following (except the full thing has 2+ million entries for each list):

cat = [[152.123, 150.456, 151.789, ...], [4.123, 3.456, 1.789, ...], [20.123, 22.456, 21.789, ...]]

Now what I would like to do is sort all 3 of the lists by ascending order of the elements of the 3rd list, such that I get:

cat_sorted = [[152.123, 151.789, 150.456, ...], [4.123, 1.789, 3.456, ...], [20.123, 21.789, 22.456, ...]]

I've tried a few things, but they don't give me what I'm looking for (or perhaps I'm using them incorrectly). Is there a way to do what I am looking for and if so, what's the easiest & quickest (considering I have 3 x 2million entries)? Is there a way of sorting one list using another?

1
  • Just wondering what kind of problem it is and does python really suits this? I haven't seen any cases of using python for tasks with such amounts of data.. Commented Jan 4, 2013 at 17:38

6 Answers 6

8

This is going to be painful, but using default python you have 2 options:

  • decorate the 1st and 2nd lists with enumerate(), then sort these using the index to refer to values from the 3rd list:

    cat_sorted = [
        [e for i, e in sorted(enumerate(cat[0]), key=lambda p: cat[2][p[0]])],
        [e for i, e in sorted(enumerate(cat[1]), key=lambda p: cat[2][p[0]])],
        sorted(cat[2])
    ]
    

    although it may help to sort cat[2] in-place instead of using sorted(); you cannot get around using sorted() for the other two.

  • zip() the three lists together, then sort on the third element of this new list of lists, then zip() again to get back to the original structure:

    from operator import itemgetter
    cat_sorted = zip(*sorted(zip(*cat), key=itemgetter(2)))
    

Neither will be a performance buster, not with plain python lists of millions of numbers.

Sign up to request clarification or add additional context in comments.

2 Comments

Once I figured out what OP meant and how the description matched the sample input and output, my mind immediately jumped to the zip approach that you show. The problem description as given suggests that the data isn't really organized properly to begin with; zip hacks around that elegantly.
Brilliant. The second solution with the zip command works perfectly. Thanks for the help! :)
4

If you're willing to use an additional library, I suggest Python Pandas. It has a DataFrame object similar to R's data.frame and accepts a list of lists in the constructor, which will create a 3-column data array. Then you can easily use the built-in pandas.DataFrame.sort function to sort by the third column (ascending or descending).

There are many plain Python ways to do this, but given the size of your problem, using the optimized functions in Pandas is a better approach. And if you need any kind of aggregated statistics from your sorted data, then Pandas is a no-brainer for this.

1 Comment

+1 for using Pandas--that's what I was in the process of writing. The other answers are correct but for such large data sets a library like Pandas is what you really want.
2

The general approach I would take was to do a schwartzian transform on the whole thing.

Zip the three lists together into a list of tuples.

Sort the tuples using the third element as key.

iterate over the newly sorted list of tuples and fill in the three lists again.

Comments

1

For the sake of completion, a solution using numpy:

import numpy as np

cat = [[152.123, 150.456, 151.789],
        [4.123, 3.456, 1.789],
        [20.123, 22.456, 21.789]]

cat = np.array(cat) 
cat_sorted = cat[:, cat[2].argsort()]

print cat_sorted
[[ 152.123  151.789  150.456]
 [   4.123    1.789    3.456]
 [  20.123   21.789   22.456]]

Comments

0

Here is another way to do it based on the great answers by Martijn Pieters and pcalcao

def sort_by_last(ll):
    """
        >>> sort_by_last([[10, 20, 30], [3, 2, 1]])
        [[30, 20, 10], [1, 2, 3]]

        >>> sort_by_last([[10, 20, 30], [40, 50, 60], [3, 2, 1]])
        [[30, 20, 10], [60, 50, 40], [1, 2, 3]]

        >>> sort_by_last([[10, 20, 30], [40, 50, 60], [1, 1, 1]])
        [[10, 20, 30], [40, 50, 60], [1, 1, 1]]

        >>> sort_by_last([[10, 20, 30], [40, 50, 60], [1, 3, 1]])
        [[10, 30, 20], [40, 60, 50], [1, 1, 3]]

        >>> sort_by_last([[152.123, 150.456, 151.789], [4.123, 3.456, 1.789], [20.123, 22.456, 21.789]])
        [[152.123, 151.789, 150.456], [4.123, 1.789, 3.456], [20.123, 21.789, 22.456]]
    """
    return [sorted(x, key=lambda y: ll[-1][x.index(y)]) for x in ll]

The big string there is a docstring with doctest, to test the function copy it to a file and run it with python -m doctest -v <file>

1 Comment

The sting in here is the x.index() which will make the sort quite slow for large lists
0

Here, keys is a sorted list of indices.

keys = sorted(range(len(cat[2])), key=cat[2].__getitem__)
cat_sorted = [[cat[i][k] for k in keys] for i in range(3)]

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.