
Looking through the implementation of the Java HashMap here: http://www.docjar.com/html/api/java/util/HashMap.java.html, I noticed the following:

The internal data structure is an array, where each index stores a reference to the first entry of a linked list. The array index is derived from the key's hashcode, and the linked list represents the bucket for that hashcode. What I found interesting is the method indexFor(int h, int length), which, for a given key, determines which bucket in the array to look in. But the implementation, return h & (length - 1), looks odd to me: for an indeterminate number of hashcodes that do not line up with a given array index, the method will return 0. So no matter what unique hashcode you implement for your object, bucket 0 of the array will most likely fill up with objects, and you lose the benefit a unique hashcode is supposed to give you, namely faster data access.
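For reference, the method in question is essentially a one-liner; roughly (a sketch based on the linked source):

    static int indexFor(int h, int length) {
        // length is always a power of two, so length - 1 is a mask of low 1 bits
        return h & (length - 1);
    }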

Am I missing something?

Cristian

2 Answers


You are missing the following Javadoc from the HashMap source code:

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry<K,V>[] table;

This means that table.length - 1 will always be a sequence of 1 bits in binary, so h & (table.length - 1) simply keeps the low-order bits of the hash and spreads entries across all buckets rather than piling them into bucket 0.
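To make that concrete, here is a minimal sketch (with a hypothetical table length of 16) showing how the mask selects the low bits of the hash instead of mapping everything to bucket 0:

    public class MaskDemo {
        public static void main(String[] args) {
            int length = 16;            // power of two, as the Javadoc requires
            int mask = length - 1;      // binary 1111 -- all low bits set
            for (int h : new int[] {7, 23, 100, 12345}) {
                // only the low 4 bits of the hash decide the bucket,
                // so hashes are distributed across all 16 slots
                System.out.println(h + " -> bucket " + (h & mask));
            }
        }
    }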


2 Comments

I have another question, this time about the Hashtable class. In this implementation, docjar.com/html/api/java/util/Hashtable.java.html, the bucket index in the array is calculated as key.hashcode % array.length, and here array.length is not a power of two. Let's say we start with an array of length 11 and insert a key with hashcode 1, so the array index is 1, and then insert another key with hashcode 12, so the array index is also 1. Wouldn't this mean that the objects for these two keys, which have different hashcodes, are stored in the same bucket?
Correct, though if you specify a good initial capacity (e.g. 15) the resizing formula oldCapacity*2 + 1 will keep it good. A prime number is the typical recommendation in these cases and works reasonably well. If you specify a poor initial size, it will get better with the resizes. In the end it is not perfect; no data structure ever is. If you care that much about efficiency, I suggest you check labs.carrotsearch.com/hppc.html - it gives you enough rope to do whatever you want :-)
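A small sketch of the modulo indexing discussed in these comments (the 0x7FFFFFFF mask, which keeps the index non-negative, mirrors what the linked Hashtable source does before taking the remainder; the length of 11 is just the example from the comment above):

    public class ModuloDemo {
        public static void main(String[] args) {
            int length = 11;  // not a power of two, as in the Hashtable example
            for (int hash : new int[] {1, 12}) {
                int index = (hash & 0x7FFFFFFF) % length;
                // 1 % 11 == 1 and 12 % 11 == 1, so both keys land in bucket 1
                System.out.println("hashcode " + hash + " -> bucket " + index);
            }
        }
    }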

I can't quite understand what you believe the problem to be.

The expression h & (length - 1) is a simple way to calculate h % n where n is a power of two. There isn't, in my understanding, any reason why h % n should produce an unnaturally large number of zeros.
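A quick sketch that checks this equivalence (Math.floorMod is used for the comparison because Java's % operator can return a negative value for a negative hash, while the mask never does):

    public class EquivalenceCheck {
        public static void main(String[] args) {
            int n = 16;  // any power of two works
            for (int h : new int[] {0, 5, 31, -7, Integer.MIN_VALUE}) {
                int masked = h & (n - 1);
                // for a power-of-two n, masking equals the mathematical h mod n
                System.out.println(h + ": mask = " + masked
                        + ", floorMod = " + Math.floorMod(h, n));
            }
        }
    }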

