Java String hashcode caching

Question

One of the advantages of String immutability is hashcode caching for faster access.

In this case how cache is handled for string which has same hashcode?
Does it really improve the performance in this case?

"In this case how cache is handled for string which has same hashcode" - it caches at the level of string object itself, not some external cache — Petro Semeniuk
– Petro Semeniuk, Commented May 14, 2012 at 3:00

Stephen C · Accepted Answer · 2012-05-14 09:27:06Z

8

In this case how cache is handled for string which has same hashcode?

What is being cached is the hashcode of the string. It is cached in a private int field in the string itself. It doesn't make any difference that different Strings may have the same hashcode ... because the hashcode is stored in the respective String objects.

(The thing that matters most is that two strings that have the same sequence of characters (and hence are equal) have the same hashcode value. And that is guaranteed because the hashcode algorithm for Java strings is standardized ... and has this property.)

Does it really improve the performance in this case?

On average, yes, and more so as the string lengths get larger.

A decent string hashcode algorithm needs to look at every character in the string ... otherwise similar strings can end up systematically mapping to the same hashcode (and that is BAD). Avoiding lookiing at those N characters multiple times is a big win.

The only significant cases where caching wouldn't help would be:

when most string hashcodes are used once only, or
when most strings are really short.

(There's one more really obscure case. If a String hashes to 0, then caching will be ineffective. This is because the String class uses 0 in the cache field to say that the hashcode hasn't been cached.)

edited May 14, 2012 at 9:27

answered May 14, 2012 at 2:56

Stephen C

723k95 gold badges849 silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

user unknown Over a year ago

Minor note: Afaik, String equals first, after looking for the hashcode, looks for equal String length, and just then for each character.

Stephen C Over a year ago

That's not correct for Oracle Java 7, which doesn't use the hashcode value at all in equals(Object). I don't know about other / older versions of Java though ...

user unknown Over a year ago

Yes, that's right for Java-6 too: It doesn't use hashcode, but it checks a) reference identity, b) type identity, c) size, d) each character.

Stephen C Over a year ago

FWIW - the most likely reason that the cached hashcode value is not used is that most of the time a string's hashcode has not been calculated when you call equals ... and wouldn't be used subsequently. The Sun / Oracle Java guys do a LOT of code analysis and testing to decide what optimizations are actually going to effective in real life.

Theodore Murdock Over a year ago

Are you certain about that "hashes to zero" comment, that there is some small number of Strings whose hashcode must always be recalculated every time? I haven't looked at String's implementation, but some similar classes that I've looked at simply have code that says "Did it come out to be zero? Then the hashcode is 1, to avoid recalculating it every time." Unless your HashSet has close to Integer.MAX_VALUE buckets, it doesn't result in additional hash bucket collisions.

|

Hovercraft Full Of Eels · Accepted Answer · 2012-05-14 02:44:18Z

4

In this case how cache is handled for string which has same hashcode?

I don't understand the first part of your question. Cache is handled for all Strings the same, whether hashcodes are the same or not (since two different Strings can theoretically have the same hashCode, and so if hashCodes are equal, it doesn't mean that the Strings are equal). But if the same String object is used, the hashCode does not have to be recalculated since it is cached.

Does it really improve the performance?

Unequivocally YES

edited May 14, 2012 at 2:44

answered May 14, 2012 at 2:39

Hovercraft Full Of Eels

286k25 gold badges267 silver badges391 bronze badges

Comments

mikera · Accepted Answer · 2012-05-14 02:48:33Z

The cache is just an int field within the String object. Multiple Strings can have the same hashcode without issue.

It significantly helps performance because:

Calculating a hashcode is much more expensive than reading a single int field
If you calculate the hashcode of a string once, it is likely that you will want to calculate the hashcode of the String many more times (e.g. it is being used in a hashmap key)

If you are interested, worth taking a look at the source:

http://www.docjar.com/html/api/java/lang/String.java.html

Zdeněk Pavlas · Accepted Answer · 2016-02-11 08:34:05Z

1

In most cases, the hashCode is not calculated until you attempt to put the string to a HashMap. The map then caches it in Map.Entry anyway to speed up comparisons and rehashing.

edited Feb 11, 2016 at 8:34

answered Feb 10, 2016 at 22:02

Zdeněk Pavlas

3572 silver badges5 bronze badges

2 Comments

Sergey Ponomarev Over a year ago

So, in fact since almost all string aren’t keys in hashmap the cache field is useless and tooks addictional 8 Bytes per string instance

Zdeněk Pavlas Over a year ago

4 actually. Java hash codes are 32-bit.

wyp · Accepted Answer · 2012-05-14 02:44:11Z

-1

For the first one, it depends on your hash strategy. For example, if you add all ascii code of letters of a word together for this letter's hash code(65 for a and 97 for A), in this situation, the word "abc" and "bca" has the same hash code.

For the second one, it also depends on your hash strategy but in most situation, the answer is YES.

answered May 14, 2012 at 2:44

wyp

8793 gold badges11 silver badges24 bronze badges

5 Comments

Hovercraft Full Of Eels Over a year ago

Not sure how the second one "could depend". Regardless of the hashing strategy, if since caching means that it is only calculated once for a given String object, it is always faster.

wyp Over a year ago

"caching means that it is only calculated once", if so, I agree with you. English is not my mother language so...caching means only once?

user unknown Over a year ago

Well - the hash algorithm is implemented in the class String and not object to some strategy of Siva. It's fixed.

user207421 Over a year ago

@wyp I don't know what else caching would mean in this context.

poolie Over a year ago

-1, that's not how Java String.hashCode() works - ignoring rearrangements would be an obviously poor hash function.

Collectives™ on Stack Overflow

Java String hashcode caching

5 Answers 5

6 Comments

Comments

Comments

2 Comments

5 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

6 Comments

Comments

Comments

2 Comments

5 Comments

Linked

Related