Timeline for Hash map without collision check
Current License: CC BY-SA 3.0
9 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Mar 7, 2018 at 11:31 | comment added | Jules | | @amon - when selecting a hash function and a hash table size, collision resistance is better if the constant factors in the hash function and the size of the table are coprime. An easy way to achieve this is to choose a prime table size. Java chose the other way: it recommends using prime numbers as factors when producing hashes -- although this isn't enforced, and the default implementation of Object doesn't do it, most other types in the JDK with a defined hashCode method do. |
| Mar 7, 2018 at 11:14 | comment added | rwong | | The problem that Berin Loritsch mentions is that, for user-defined types, the hash functions implemented by a "typical / median" programmer are usually very bad. When that inadequacy is combined with a "nice" bucket count, the result is subtle performance bugs, sometimes causing unexpected failures. That is why it is better to pack all of the relevant fields of the user-defined type into an array and use a well-written hash function on that array. Unfortunately this is widely perceived as too slow, and it might indeed be. |
| Mar 5, 2018 at 19:37 | comment added | amon | | Btw, the OpenJDK HashMap implementation enforces power-of-two table sizes, and Oracle documents the default capacity as the very non-prime 16. |
| Mar 5, 2018 at 19:37 | comment added | amon | | @BerinLoritsch You're not wrong, but this doesn't hold universally true either. Java, Python, … are unique because every object is hashable, often with a suboptimal hash function. There, a prime number leads to better bucket distribution. But with an evenly distributed hash function over a 2^n sized hash space, a power-of-two sized hash table and dynamic resizing by a factor of 2 is strictly superior. This guarantees amortized O(1) inserts and avoids memory fragmentation. If the bucket count doesn't divide the hash space evenly, you'll get a bias towards the lower-numbered buckets. |
| Mar 5, 2018 at 19:15 | comment added | Berin Loritsch | | NOTE: when the number of buckets is an even power of two, you end up using far fewer buckets than you planned. A lot of that is due to the default implementation of getting a hash code for an object in many languages (which is the memory address of the object). That's why most hashmap implementations use a prime number for the number of buckets. Trust me, lesson learned a long time ago when I created a custom concurrency-safe hashmap in Java. Peer review and actual usage reports showed me the error of my ways. |
| Mar 5, 2018 at 17:47 | comment added | amon | | @Jules oh, you're right, I misread the question -- it seems to be about using the hash as the key, not as the hash table's usual hash function (which can then be simplified to the identity function). |
| Mar 5, 2018 at 17:17 | comment added | Jules | | The point of doing something like this is to use the hash as a proxy for the actual key, so if the key is large and/or expensive to access (e.g. because it's stored on disk while your hash index is stored in RAM), you can compare the hashes as a proxy for comparing the keys themselves. In this circumstance, storing the full hash instead of the key itself in the table is a useful structure. It's rare, but reasons to do it exist. |
| Mar 4, 2018 at 23:53 | history edited | amon | CC BY-SA 3.0 | I know my powers of two, I swear! |
| Mar 4, 2018 at 23:31 | history answered | amon | CC BY-SA 3.0 | |
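
As a footnote to the comments above from Berin Loritsch, amon and Jules about prime versus power-of-two bucket counts, here is a minimal sketch of the two bucket-indexing strategies they contrast. The class and method names are illustrative, not taken from the JDK; only the high-bit folding step mirrors what OpenJDK's HashMap does before masking.

```java
// Sketch of the two bucket-index strategies discussed in the comments above.
// Class and method names are illustrative only; they do not come from the JDK.
public final class BucketIndexDemo {

    // Power-of-two table: the bucket is just the low bits of the hash.
    // Fast (a single AND), but it ignores the high bits, so hashes that differ
    // only in the high bits collide. OpenJDK's HashMap compensates by
    // XOR-folding the high bits into the low bits before masking.
    static int powerOfTwoBucket(int hash, int tableSize) {  // tableSize must be 2^n
        int spread = hash ^ (hash >>> 16);                  // fold high bits down
        return spread & (tableSize - 1);
    }

    // Prime-sized table: the bucket is the hash modulo a prime. Slower (a
    // division), but a prime table size is coprime to any constant factor in
    // the hash function, which pulls all bits into the bucket index.
    static int primeBucket(int hash, int primeTableSize) {
        return Math.floorMod(hash, primeTableSize);         // floorMod handles negative hashes
    }

    public static void main(String[] args) {
        // Hashes that differ only in the high bits: a plain mask maps them all
        // to bucket 0, while spread-then-mask and the prime modulus do not.
        int[] hashes = {0x10000, 0x20000, 0x30000};
        for (int h : hashes) {
            System.out.printf("hash=%#x  masked=%d  spread+masked=%d  mod 31=%d%n",
                    h, h & 15, powerOfTwoBucket(h, 16), primeBucket(h, 31));
        }
    }
}
```

The trade-off is the one the comments describe: the mask costs a single AND but only looks at the low bits of the hash, while the prime modulus costs a division but is coprime to any constant factor the hash function uses.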
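And a hedged sketch of the structure Jules describes in the Mar 5, 2018 at 17:17 comment: storing a wide hash in place of the key and skipping the collision check entirely. Everything here (the class name, the wideHash helper, the 64-bit polynomial hash) is a hypothetical stand-in for illustration, not an existing API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a "map" that stores a wide hash of the key instead
// of the key itself, as described in the comment by Jules. Two keys with the
// same 64-bit hash are silently treated as the same key, so there is no
// collision check -- the caller accepts that small (but nonzero) risk.
public final class HashKeyedMap<V> {

    private final Map<Long, V> table = new HashMap<>();

    // Stand-in for a wide, high-quality hash (e.g. SipHash or a truncated
    // cryptographic digest). A 64-bit polynomial hash is used here just to
    // keep the sketch self-contained; real uses would hash the raw key bytes.
    private static long wideHash(String key) {
        long h = 1125899906842597L;                 // arbitrary odd seed
        for (int i = 0; i < key.length(); i++) {
            h = 31 * h + key.charAt(i);
        }
        return h;
    }

    public void put(String key, V value) {
        table.put(wideHash(key), value);            // the key itself is never stored
    }

    public V get(String key) {
        return table.get(wideHash(key));            // no equals() check on the original key
    }

    public static void main(String[] args) {
        HashKeyedMap<String> m = new HashKeyedMap<>();
        m.put("/var/data/block-0001", "offset 0");
        m.put("/var/data/block-0002", "offset 4096");
        System.out.println(m.get("/var/data/block-0001"));  // prints: offset 0
    }
}
```

Two keys that happen to share the same 64-bit hash would silently alias each other; accepting that risk appears to be the "without collision check" idea the comments are discussing.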