14

I want to maintain an ordered List<Integer> of size <= 10^6. Every time a new element will be added I will call Collections.sort() method to sort the new element in the list. As per my knowledge ArrayList is better performing than LinkedList. But since I will be calling sort() method quite often, I have come to understanding that linkedList will perform better when sorting the list and will be a better choice over ArrayList, since there is no shifting of elements as in case of ArrayList(uses array as underlying data structure). Any suggestions which will be more efficient.

6
  • You already answered your question, linkedlist is obviously more efficient. Alternative would be a binary tree like treeset... Commented Jan 27, 2015 at 6:45
  • @Smac89 I will have duplicate elements in my collection so can't go with a Set but something similar to Binary Search Tree would be great since it will perform better than Collections.sort(). Commented Jan 27, 2015 at 6:52
  • 3
    @Smac89 why do you think it is obviously more efficient? Commented Jan 27, 2015 at 6:53
  • @assylias Well OP did say...But since I will be calling sort() method quite often, I have come to understanding that linkedList will perform better when sorting the list and will be a better choice over ArrayList, since there is no shifting of elements as in case of ArrayList(uses array as underlying data structure)., and ofc because, merge sorting a linked list is O(nlogn) operation if done right Commented Jan 27, 2015 at 6:56
  • 2
    @Smac89 The results I get don't seem to agree with your statement. Also note that sorting a LinkedList in Java is slower than sorting an ArrayList, because the former implies an additional copy of the items to an array. Commented Jan 27, 2015 at 10:13

5 Answers 5

18

You could use Collections#binarySearch on the sorted list to find the correct insertion point. ArrayList would probably perform better than a LinkedList, especially for larg-ish sizes, but that is easy to test.

I ran a micro benchmark of various methods: using a sort after each insertion or a binarySearch to insert in the right place, both with ArrayList (AL) and LinkedList (LL). I also added Commons TreeList and guava's TreeMultiset.

Conclusions

  • the best algo among those tested is using TreeMultiset, but it is not a list strictly speaking - the next best option is to use an ArrayList + binarySearch
  • ArrayList performs better than LinkedList in all situations and the latter took several minutes to complete with 100,000 elements (ArrayList took less than one second).

Code of the best performer, for reference:

@Benchmark public ArrayList<Integer> binarySearchAL() {
  ArrayList<Integer> list = new ArrayList<> ();

  Random r = new Random();
  for (int i = 0; i < n; i++) {
    int num = r.nextInt();
    int index = Collections.binarySearch(list, num);
    if (index >= 0) list.add(index, num);
    else list.add(-index - 1, num);
    current = list.get(0); //O(1), to make sure the sort is not optimised away
  }
  return list;
}

Full code on bitbucket.

Full results

The "Benchmark" column contains the name of the method under test (baseLine just fills a list without sorting it, the other methods have explicit names: AL=ArrayList, LL=LinkedList,TL=Commons TreeList,treeMultiSet=guava), (n) is the size of the list, Score is the time taken in milliseconds.

Benchmark                            (n)  Mode  Samples     Score     Error  Units
c.a.p.SO28164665.baseLine            100  avgt       10     0.002 ±   0.000  ms/op
c.a.p.SO28164665.baseLine           1000  avgt       10     0.017 ±   0.001  ms/op
c.a.p.SO28164665.baseLine           5000  avgt       10     0.086 ±   0.002  ms/op
c.a.p.SO28164665.baseLine          10000  avgt       10     0.175 ±   0.007  ms/op
c.a.p.SO28164665.binarySearchAL      100  avgt       10     0.014 ±   0.001  ms/op
c.a.p.SO28164665.binarySearchAL     1000  avgt       10     0.226 ±   0.006  ms/op
c.a.p.SO28164665.binarySearchAL     5000  avgt       10     2.413 ±   0.125  ms/op
c.a.p.SO28164665.binarySearchAL    10000  avgt       10     8.478 ±   0.523  ms/op
c.a.p.SO28164665.binarySearchLL      100  avgt       10     0.031 ±   0.000  ms/op
c.a.p.SO28164665.binarySearchLL     1000  avgt       10     3.876 ±   0.100  ms/op
c.a.p.SO28164665.binarySearchLL     5000  avgt       10   263.717 ±   6.852  ms/op
c.a.p.SO28164665.binarySearchLL    10000  avgt       10   843.436 ±  33.265  ms/op
c.a.p.SO28164665.sortAL              100  avgt       10     0.051 ±   0.002  ms/op
c.a.p.SO28164665.sortAL             1000  avgt       10     3.381 ±   0.189  ms/op
c.a.p.SO28164665.sortAL             5000  avgt       10   118.882 ±  22.030  ms/op
c.a.p.SO28164665.sortAL            10000  avgt       10   511.668 ± 171.453  ms/op
c.a.p.SO28164665.sortLL              100  avgt       10     0.082 ±   0.002  ms/op
c.a.p.SO28164665.sortLL             1000  avgt       10    13.045 ±   0.460  ms/op
c.a.p.SO28164665.sortLL             5000  avgt       10   642.593 ± 188.044  ms/op
c.a.p.SO28164665.sortLL            10000  avgt       10  1182.698 ± 159.468  ms/op
c.a.p.SO28164665.binarySearchTL      100  avgt       10    0.056 ±  0.002  ms/op
c.a.p.SO28164665.binarySearchTL     1000  avgt       10    1.083 ±  0.052  ms/op
c.a.p.SO28164665.binarySearchTL     5000  avgt       10    8.246 ±  0.329  ms/op
c.a.p.SO28164665.binarySearchTL    10000  avgt       10  735.192 ± 56.071  ms/op
c.a.p.SO28164665.treeMultiSet        100  avgt       10    0.021 ±  0.001  ms/op
c.a.p.SO28164665.treeMultiSet       1000  avgt       10    0.288 ±  0.008  ms/op
c.a.p.SO28164665.treeMultiSet       5000  avgt       10    1.809 ±  0.061  ms/op
c.a.p.SO28164665.treeMultiSet      10000  avgt       10    4.283 ±  0.214  ms/op

For 100k items:

c.a.p.SO28164665.binarySearchAL    100000  avgt        6  890.585 ± 68.730  ms/op
c.a.p.SO28164665.treeMultiSet      100000  avgt        6  105.273 ±  9.309  ms/op
Sign up to request clarification or add additional context in comments.

13 Comments

+1 for binary search :).but if the OP uses an ArrayList, shifting after insertion would take time. A LinkedList would be more efficient in this case.
I don't know, you may be right. binarySearch is probably slower on LinkedList but insertion is faster...
@TheLostMind Just ran some tests, LinkedList performance is appalling for larg-ish samples (several minutes for 100k items), For smaller sizes (< 1000) it performs ok in absolute terms but still slower than ArrayList.
@TheLostMind It doesn't take so much time once you have jmh setup - probably 15 mins to write the code and then the test runs in background for 10 mins or so. Admittedly the first time I used it it took me a lot more than 15 mins to write my first real life test :-)
Wow, TreeMultiSet really shows all that Guava optimisation when you hit large numbers of items! Nine times faster is nothing to sniff at.
|
6

Since java does not have built in multiset, which is the perfect data structure for your situation, I will suggest using the TreeMultiset found in the guava library.

Multisets allow for duplicate elements, and a tree multiset will also add the benefit of keeping your collection sorted.

1 Comment

TreeMultiset is indeed the best option performance wise even if it not a list strictly speaking.
2

Calling sort() on a LinkedList is devastating on the performance, due to the default implementation of List.sort() converting the List to an array for sorting. There are very few cases in which it makes sense to use a LinkedList, even though it may seem like it should be effective.

If you wish to have the collection always sorted, you really should use an ordered collection like a TreeSet or maybe even a PriorityQueue. It will provide cleaner code (as well as faster sorting), since you don't have to worry about calling sort() yourself all the time.

6 Comments

I am expecting duplicate elements in my list so can't go with Set and PriorityQueue is only partially sorted, so that don't help my case either.
Collections.sort() converts the list to an array and then sorts it. So effectively you will have O(nLogn) complexity.. and thats the best you could get.
@MeenaChaudhary Then ArrayList is your most sensible bet. LinkedList performs really badly (like it does in almost every situation).
Merge sort doesn't require random access and can work on linked list in O(n log n) time the same as array. That is what std::list::sort do in C++ anyway. Is LinkedList.sort not a specialized merge sort?
On OpenJDK, at no point during the execution of Collections.sort is the get() method called. It only uses toArray() and listIterator().
|
1

You should consider to use data structures that are designed to maintain order if the sorting is your main performance consideration.

Using the normal Java base classes you could use either of these:

PriorityQueue (in case you want to retain duplicates)
TreeSet (filter duplicates)

In any case it will be easiest to just prototype all versions and run some benchmarks + profiling.

Comments

1

Under Oracle Java / OpenJDK 7 or higher, the asymptotic performance of both would be similar. Collections.sort loads the list into an array, sorts the array, and loads the array back into the list by iterating through it (using a ListIterator), replacing its elements.

In both cases, this is an array sort on a mostly-sorted array (which is O(n) in OpenJDK 7 and higher, since it uses timsort), plus two list iterations (which are O(n) in both cases - although I'd expect LinkedList to have a worse constant term). So overall, it's an O(n) process, but likely to be slower for LinkedList.

If you're bulk inserting elements, the bulk insert will be O(n^2) overall, which is slower than inserting them all and sorting, or following Smac89's suggestion of using a TreeMultiset (both would be O(n log(n))).

And just for fun, here's a truly awful way to abuse TreeSet to allow it to store duplicate elements:

public class AwfulComparator<E extends Comparable<E>> implements Comparator<E> {
    public int compare(E o1, E o2) {
        int compared = o1.compareTo(o2);
        return (compared == 0)?1:compared; // Never compare equal
    }
}

new TreeSet<String>(new AwfulComparator<>());

4 Comments

I'm curious, do you know the rationale why Collections.sort doesn't merge sort when it encounters a LinkedList? Tested and considered slower than what it actually does using an array (especially with benefits of Timsort), or too much like hard work to implement?
@SteveJessop My guess is that they wanted to keep the code clean - that kind of "if instanceof" test is a code smell. I'm also not sure if it would be much faster. For mergesort to work, you need to partition the list into 2 sublists first, and that has overhead with LinkedLists. It's probably not just to get the benefits of Timsort though - the code's been this way since OpenJDK 6, which used mergesort.
"that has overhead with LinkedLists" -- well, not if you pull the guts out of your LinkedList implementation. You could for example work from both ends towards the middle, you don't have to iterate to the middle first. But a secret native method somewhere that knows how to manipulate LinkedList nodes directly might reasonably be considered cheating. I was just wondering if Sun/Oracle ever said anything about it, since there's more than one plausible reason.
Oh, I see. One Collections.sort implementation I've just found, also copies ArrayList into a new array before sorting, then copies back to the original. So this copying isn't an overhead just for collections that don't have random-access (as I first thought), it's to simplify everything to Arrays.sort. Therefore it's about equally worthwhile for all collection types.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.