9

A lot of people have said that ArrayList.removeAll is really slow with large size arrays.

This article provides two optimized solutions to the ArrayList.removeAll speed, but requires implementing them in the class itself, and cannot be used externally as a fix.

Is there any way to apply this sort of fix short of copying the ArrayList source code and using my own version of it?

Edit: I suppose I should add my need for this, as there is probably a way to do what I want without ArrayList.removeAll.

I have two lists of around 70,000 longs each. They are almost identical, but one list has a few more numbers that the second list doesn't have, and I want to find them. The only way I know to find them is to do first.removeAll(second) to find the difference. Is there another way?

3
  • you can submit a bugreport and hope it gets addressed before the next release Commented Jul 24, 2011 at 2:24
  • It has already been submitted as a bug, and will be fixed, but I need a fix that will work now, and in non-updated vms. Commented Jul 24, 2011 at 2:25
  • make your own LongArray then. this will even be more efficient then a ArrayList<Long> as you won't need to (un)box the values Commented Jul 24, 2011 at 2:36

2 Answers 2

9

What about using a data structure that has a much better removal time, such as HashSet or TreeSet? So the big reason to use an arraylist is due to the fast access time O(1) to access records. But if you are trying to set difference then maybe you should use sets. Just a thought.

Sign up to request clarification or add additional context in comments.

4 Comments

I guess I spent to much time trying to fix ArrayList rather than just using HashSet, which took no time at all to do what I wanted it to do. Thanks :D
It makes sense why ArrayList is slow. When you remove an item in the middle of an array, you need to copy all the values after it to fill in the gap.
@Sam Barnum Yes, but done right you need to copy each element at most once. Under the (quite realistic) assumption that lookup in the other collection is O(1), you get O(n) instead of O(n*n).
Sets if the passed collection is smaller than the current what does is check that every element of the set is in the passed collection. If the passed collection is a list that means iterating through it.
1

You can create a subclass of ArrayList to optimize that method (and possibly others).

2 Comments

I considered that, but the optimized methods have to have access to private fields, so I can't just subclass it.
I'm convinced that you could devise a reasonably efficient implementation using just existing ArrayList methods in your new optimized removeAll method (if you copy smartly). The performance problem is most probably that removeAll calls remove in a loop, resulting in potentially O(n^2) runtime

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.