I want to maintain an ordered List<Integer> of size <= 10^6. Every time a new element will be added I will call Collections.sort() method to sort the new element in the list. As per my knowledge ArrayList is better performing than LinkedList. But since I will be calling sort() method quite often, I have come to understanding that linkedList will perform better when sorting the list and will be a better choice over ArrayList, since there is no shifting of elements as in case of ArrayList(uses array as underlying data structure). Any suggestions which will be more efficient.
5 Answers
You could use Collections#binarySearch on the sorted list to find the correct insertion point. ArrayList would probably perform better than a LinkedList, especially for larg-ish sizes, but that is easy to test.
I ran a micro benchmark of various methods: using a sort after each insertion or a binarySearch to insert in the right place, both with ArrayList (AL) and LinkedList (LL). I also added Commons TreeList and guava's TreeMultiset.
Conclusions
- the best algo among those tested is using
TreeMultiset, but it is not a list strictly speaking - the next best option is to use anArrayList+ binarySearch - ArrayList performs better than LinkedList in all situations and the latter took several minutes to complete with 100,000 elements (ArrayList took less than one second).
Code of the best performer, for reference:
@Benchmark public ArrayList<Integer> binarySearchAL() {
ArrayList<Integer> list = new ArrayList<> ();
Random r = new Random();
for (int i = 0; i < n; i++) {
int num = r.nextInt();
int index = Collections.binarySearch(list, num);
if (index >= 0) list.add(index, num);
else list.add(-index - 1, num);
current = list.get(0); //O(1), to make sure the sort is not optimised away
}
return list;
}
Full code on bitbucket.
Full results
The "Benchmark" column contains the name of the method under test (baseLine just fills a list without sorting it, the other methods have explicit names: AL=ArrayList, LL=LinkedList,TL=Commons TreeList,treeMultiSet=guava), (n) is the size of the list, Score is the time taken in milliseconds.
Benchmark (n) Mode Samples Score Error Units
c.a.p.SO28164665.baseLine 100 avgt 10 0.002 ± 0.000 ms/op
c.a.p.SO28164665.baseLine 1000 avgt 10 0.017 ± 0.001 ms/op
c.a.p.SO28164665.baseLine 5000 avgt 10 0.086 ± 0.002 ms/op
c.a.p.SO28164665.baseLine 10000 avgt 10 0.175 ± 0.007 ms/op
c.a.p.SO28164665.binarySearchAL 100 avgt 10 0.014 ± 0.001 ms/op
c.a.p.SO28164665.binarySearchAL 1000 avgt 10 0.226 ± 0.006 ms/op
c.a.p.SO28164665.binarySearchAL 5000 avgt 10 2.413 ± 0.125 ms/op
c.a.p.SO28164665.binarySearchAL 10000 avgt 10 8.478 ± 0.523 ms/op
c.a.p.SO28164665.binarySearchLL 100 avgt 10 0.031 ± 0.000 ms/op
c.a.p.SO28164665.binarySearchLL 1000 avgt 10 3.876 ± 0.100 ms/op
c.a.p.SO28164665.binarySearchLL 5000 avgt 10 263.717 ± 6.852 ms/op
c.a.p.SO28164665.binarySearchLL 10000 avgt 10 843.436 ± 33.265 ms/op
c.a.p.SO28164665.sortAL 100 avgt 10 0.051 ± 0.002 ms/op
c.a.p.SO28164665.sortAL 1000 avgt 10 3.381 ± 0.189 ms/op
c.a.p.SO28164665.sortAL 5000 avgt 10 118.882 ± 22.030 ms/op
c.a.p.SO28164665.sortAL 10000 avgt 10 511.668 ± 171.453 ms/op
c.a.p.SO28164665.sortLL 100 avgt 10 0.082 ± 0.002 ms/op
c.a.p.SO28164665.sortLL 1000 avgt 10 13.045 ± 0.460 ms/op
c.a.p.SO28164665.sortLL 5000 avgt 10 642.593 ± 188.044 ms/op
c.a.p.SO28164665.sortLL 10000 avgt 10 1182.698 ± 159.468 ms/op
c.a.p.SO28164665.binarySearchTL 100 avgt 10 0.056 ± 0.002 ms/op
c.a.p.SO28164665.binarySearchTL 1000 avgt 10 1.083 ± 0.052 ms/op
c.a.p.SO28164665.binarySearchTL 5000 avgt 10 8.246 ± 0.329 ms/op
c.a.p.SO28164665.binarySearchTL 10000 avgt 10 735.192 ± 56.071 ms/op
c.a.p.SO28164665.treeMultiSet 100 avgt 10 0.021 ± 0.001 ms/op
c.a.p.SO28164665.treeMultiSet 1000 avgt 10 0.288 ± 0.008 ms/op
c.a.p.SO28164665.treeMultiSet 5000 avgt 10 1.809 ± 0.061 ms/op
c.a.p.SO28164665.treeMultiSet 10000 avgt 10 4.283 ± 0.214 ms/op
For 100k items:
c.a.p.SO28164665.binarySearchAL 100000 avgt 6 890.585 ± 68.730 ms/op
c.a.p.SO28164665.treeMultiSet 100000 avgt 6 105.273 ± 9.309 ms/op
13 Comments
ArrayList, shifting after insertion would take time. A LinkedList would be more efficient in this case.TreeMultiSet really shows all that Guava optimisation when you hit large numbers of items! Nine times faster is nothing to sniff at.Since java does not have built in multiset, which is the perfect data structure for your situation, I will suggest using the TreeMultiset found in the guava library.
Multisets allow for duplicate elements, and a tree multiset will also add the benefit of keeping your collection sorted.
1 Comment
TreeMultiset is indeed the best option performance wise even if it not a list strictly speaking.Calling sort() on a LinkedList is devastating on the performance, due to the default implementation of List.sort() converting the List to an array for sorting. There are very few cases in which it makes sense to use a LinkedList, even though it may seem like it should be effective.
If you wish to have the collection always sorted, you really should use an ordered collection like a TreeSet or maybe even a PriorityQueue. It will provide cleaner code (as well as faster sorting), since you don't have to worry about calling sort() yourself all the time.
6 Comments
Set and PriorityQueue is only partially sorted, so that don't help my case either.Collections.sort() converts the list to an array and then sorts it. So effectively you will have O(nLogn) complexity.. and thats the best you could get.ArrayList is your most sensible bet. LinkedList performs really badly (like it does in almost every situation).O(n log n) time the same as array. That is what std::list::sort do in C++ anyway. Is LinkedList.sort not a specialized merge sort?Collections.sort is the get() method called. It only uses toArray() and listIterator().You should consider to use data structures that are designed to maintain order if the sorting is your main performance consideration.
Using the normal Java base classes you could use either of these:
PriorityQueue (in case you want to retain duplicates)
TreeSet (filter duplicates)
In any case it will be easiest to just prototype all versions and run some benchmarks + profiling.
Comments
Under Oracle Java / OpenJDK 7 or higher, the asymptotic performance of both would be similar. Collections.sort loads the list into an array, sorts the array, and loads the array back into the list by iterating through it (using a ListIterator), replacing its elements.
In both cases, this is an array sort on a mostly-sorted array (which is O(n) in OpenJDK 7 and higher, since it uses timsort), plus two list iterations (which are O(n) in both cases - although I'd expect LinkedList to have a worse constant term). So overall, it's an O(n) process, but likely to be slower for LinkedList.
If you're bulk inserting elements, the bulk insert will be O(n^2) overall, which is slower than inserting them all and sorting, or following Smac89's suggestion of using a TreeMultiset (both would be O(n log(n))).
And just for fun, here's a truly awful way to abuse TreeSet to allow it to store duplicate elements:
public class AwfulComparator<E extends Comparable<E>> implements Comparator<E> {
public int compare(E o1, E o2) {
int compared = o1.compareTo(o2);
return (compared == 0)?1:compared; // Never compare equal
}
}
new TreeSet<String>(new AwfulComparator<>());
4 Comments
Collections.sort doesn't merge sort when it encounters a LinkedList? Tested and considered slower than what it actually does using an array (especially with benefits of Timsort), or too much like hard work to implement?LinkedLists. It's probably not just to get the benefits of Timsort though - the code's been this way since OpenJDK 6, which used mergesort.Collections.sort implementation I've just found, also copies ArrayList into a new array before sorting, then copies back to the original. So this copying isn't an overhead just for collections that don't have random-access (as I first thought), it's to simplify everything to Arrays.sort. Therefore it's about equally worthwhile for all collection types.
Setbut something similar toBinary Search Treewould be great since it will perform better thanCollections.sort().But since I will be calling sort() method quite often, I have come to understanding that linkedList will perform better when sorting the list and will be a better choice over ArrayList, since there is no shifting of elements as in case of ArrayList(uses array as underlying data structure)., and ofc because, merge sorting a linked list is O(nlogn) operation if done right