5

One of the advantages of String immutability is hashcode caching for faster access.

  • In this case how cache is handled for string which has same hashcode?
  • Does it really improve the performance in this case?
1
  • "In this case how cache is handled for string which has same hashcode" - it caches at the level of string object itself, not some external cache Commented May 14, 2012 at 3:00

5 Answers 5

8

In this case how cache is handled for string which has same hashcode?

What is being cached is the hashcode of the string. It is cached in a private int field in the string itself. It doesn't make any difference that different Strings may have the same hashcode ... because the hashcode is stored in the respective String objects.

(The thing that matters most is that two strings that have the same sequence of characters (and hence are equal) have the same hashcode value. And that is guaranteed because the hashcode algorithm for Java strings is standardized ... and has this property.)

Does it really improve the performance in this case?

On average, yes, and more so as the string lengths get larger.

A decent string hashcode algorithm needs to look at every character in the string ... otherwise similar strings can end up systematically mapping to the same hashcode (and that is BAD). Avoiding lookiing at those N characters multiple times is a big win.

The only significant cases where caching wouldn't help would be:

  • when most string hashcodes are used once only, or
  • when most strings are really short.

(There's one more really obscure case. If a String hashes to 0, then caching will be ineffective. This is because the String class uses 0 in the cache field to say that the hashcode hasn't been cached.)

Sign up to request clarification or add additional context in comments.

6 Comments

Minor note: Afaik, String equals first, after looking for the hashcode, looks for equal String length, and just then for each character.
That's not correct for Oracle Java 7, which doesn't use the hashcode value at all in equals(Object). I don't know about other / older versions of Java though ...
Yes, that's right for Java-6 too: It doesn't use hashcode, but it checks a) reference identity, b) type identity, c) size, d) each character.
FWIW - the most likely reason that the cached hashcode value is not used is that most of the time a string's hashcode has not been calculated when you call equals ... and wouldn't be used subsequently. The Sun / Oracle Java guys do a LOT of code analysis and testing to decide what optimizations are actually going to effective in real life.
Are you certain about that "hashes to zero" comment, that there is some small number of Strings whose hashcode must always be recalculated every time? I haven't looked at String's implementation, but some similar classes that I've looked at simply have code that says "Did it come out to be zero? Then the hashcode is 1, to avoid recalculating it every time." Unless your HashSet has close to Integer.MAX_VALUE buckets, it doesn't result in additional hash bucket collisions.
|
4

In this case how cache is handled for string which has same hashcode?

I don't understand the first part of your question. Cache is handled for all Strings the same, whether hashcodes are the same or not (since two different Strings can theoretically have the same hashCode, and so if hashCodes are equal, it doesn't mean that the Strings are equal). But if the same String object is used, the hashCode does not have to be recalculated since it is cached.

Does it really improve the performance?

Unequivocally YES

Comments

2

The cache is just an int field within the String object. Multiple Strings can have the same hashcode without issue.

It significantly helps performance because:

  • Calculating a hashcode is much more expensive than reading a single int field
  • If you calculate the hashcode of a string once, it is likely that you will want to calculate the hashcode of the String many more times (e.g. it is being used in a hashmap key)

If you are interested, worth taking a look at the source:

http://www.docjar.com/html/api/java/lang/String.java.html

Comments

1

In most cases, the hashCode is not calculated until you attempt to put the string to a HashMap. The map then caches it in Map.Entry anyway to speed up comparisons and rehashing.

2 Comments

So, in fact since almost all string aren’t keys in hashmap the cache field is useless and tooks addictional 8 Bytes per string instance
4 actually. Java hash codes are 32-bit.
-1

For the first one, it depends on your hash strategy. For example, if you add all ascii code of letters of a word together for this letter's hash code(65 for a and 97 for A), in this situation, the word "abc" and "bca" has the same hash code.

For the second one, it also depends on your hash strategy but in most situation, the answer is YES.

5 Comments

Not sure how the second one "could depend". Regardless of the hashing strategy, if since caching means that it is only calculated once for a given String object, it is always faster.
"caching means that it is only calculated once", if so, I agree with you. English is not my mother language so...caching means only once?
Well - the hash algorithm is implemented in the class String and not object to some strategy of Siva. It's fixed.
@wyp I don't know what else caching would mean in this context.
-1, that's not how Java String.hashCode() works - ignoring rearrangements would be an obviously poor hash function.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.