13

I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are (and the number won't change), and was wondering whether Array would have a notable performance advantage over HashSet for storing/retrieving said elements.

On paper, Array guarantees constant time insertion (since I know the size ahead of time) and retrieval, but the code for HashSet looks much cleaner and adds some flexibility, so I'm wondering if I'm losing anything performance-wise using it, at least, theoretically.

5
  • 3
    Is your dataset sparse or dense? Commented Sep 9, 2013 at 20:57
  • 1
    HashSet is designed to have expected constant time add, contains and remove operations, meaning that the time won't change much regardless of how many elements are in the set. Arrays have linear operations for all of these, but lower overhead. This means that arrays will generally be better for small sets. I did some tests on my machine not long ago with an ArraySet implementation, and found that it was generally better up to about 150 elements to use the Array rather than Hash (but it depends a bit on the implementation and on the operations: Iterating was much faster for instance). Commented Sep 9, 2013 at 21:30
  • There are million opinions on this., javacodegeeks.com/2010/08/… and ibm.com/developerworks/library/j-jtp02183 Commented Sep 9, 2013 at 22:25
  • Depending on how many items you have EnumSet or something like it might be an option. Commented Sep 18, 2013 at 10:13
  • Take a look at stackoverflow.com/questions/10196343/… Commented Jul 7, 2014 at 7:41

4 Answers 4

24

Depends on your data;

HashSet gives you an O(1) contains() method but doesn't preserve order.

ArrayList contains() is O(n) but you can control the order of the entries.

Array if you need to insert anything in between, worst case can be O(n), since you will have to move the data down and make room for the insertion. In Set, you can directly use SortedSet which too has O(n) too but with flexible operations.

I believe Set is more flexible.

Sign up to request clarification or add additional context in comments.

2 Comments

But TreeSet (the implementation of SortedSet) is log(n) insertion/lookup...
@OliCharlesworth Tx. Was highlighting the point on flexibility on Sets than Array.
3

The choice greatly depends on what do you want to do with it.

If it is what mentioned in your question:

I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are

If this is what you need to do, the you need neither of them. There is a size() method in Collection for which you can get the size of it, which mean how many of them there are in the collection.

If what you mean for "collection of object" is not really a collection, and you need to choose a type of collection to store your objects for further processing, then you need to know, for different kind of collections, there are different capabilities and characteristic.

First, I believe to have a fair comparison, you should consider using ArrayList instead Array, for which you don't need to deal with the reallocation.

Then it become the choice of ArrayList vs HashSet, which is quite straight-forward:

Do you need a List or Set? They are for different purpose: Lists provide you indexed access, and iteration is in order of index. While Sets are mainly for you to keep a distinct set of data, and given its nature, you won't have indexed access.

After you made your decision of List or Set to use, then it is a choice of List/Set implementation, normally for Lists, you choose from ArrayList and LinkedList, while for Sets, you choose between HashSet and TreeSet.

All the choice depends on what you would want to do with that collection of data. They performs differently on different action.

For example, an indexed access in ArrayList is O(1), in HashSet (though not meaningful) is O(n), (just for your interest, in LinkedList is O(n), in TreeSet is O(nlogn) )

For adding new element, both ArrayList and HashSet is O(1) operation. Inserting in the middle is O(n) for ArrayList, while it doesn't make sense in HashSet. Both will suffer from reallocation, and both of them need O(n) for the reallocation (HashSet is normally slower in reallocation, because it involve calculation of hash for each element again).

To find if certain element exists in the collection, ArrayList is O(n) and HashSet is O(1).

There are still lots of operations you can do, so it is quite meaningless to discuss for performance without knowing what you want to do.

Comments

0

theoretically, and as SCJP6 Study guide says :D

arrays are faster than collections, and as said, most of the collections depend mainly on arrays (Maps are not considered Collection, but they are included in the Collections framework)

if you guarantee that the size of your elements wont change, why get stuck in Objects built on Objects (Collections built on Arrays) while you can use the root objects directly (arrays)

2 Comments

Because if you need O(1) lookup (contains) you will need to write a lot of non trivial code. In which case the question becomes: why reinvent the wheel?
If suppose I need to store 5 string constants and parse the same in one of the loop, I think Arrays are more suitable as per above comment. Please let me know
0

It looks like you will want an HashMap that maps id's to counts. Particularly,

HashMap<Integer,Integer> counts=new HashMap<Integer,Integer>();
counts.put(uniqueID,counts.get(uniqueID)+1);

This way, you get amortized O(1) adds, contains and retrievals. Essentially, an array with unique id's associated with each object IS a HashMap. By using the HashMap, you get the added bonus of not having to manage the size of the array, not having to map the keys to an array index yourself AND constant access time.

1 Comment

Or a HashSet, if the objects he uses have a hashCode method that returns their unique identifier. Note that this changes very little in practice, since HashSet uses an instance of HashMap internally...

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.