For $n$ elements, a two-level hash table consists of a top-level hash table with $O(n)$ slots whose entries are themselves hash tables (the bottom level). Each table uses a hash function sampled from a 2-universal family. To hash an element $x$, we first hash it to an entry (i.e., a bottom-level hash table) of the top-level table, then hash it to its final slot in that bottom-level table. If each bottom-level table has size quadratic in the number of items that hash to it, then the total space is $O(n)$ in expectation. Finally, we impose a no-collision condition on the bottom-level tables: if a collision occurs, we sample a fresh hash function from the family and rebuild that table. (With a table of size $m^2$ holding $m$ items and a 2-universal family, the expected number of collisions is at most $\binom{m}{2}/m^2 < 1/2$, so each rebuild attempt succeeds with probability greater than $1/2$.)
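To make the construction concrete, here is a minimal sketch of the static structure, assuming integer keys below a fixed prime and hash functions of the form $h(x) = ((ax + b) \bmod p) \bmod m$ from a 2-universal family. The names `random_hash`, `build_bucket`, and `build_two_level` are illustrative, not from any particular library, and the check that the bucket sizes sum to $O(n)$ is omitted for brevity.

```python
import random

P = 2_147_483_647  # the Mersenne prime 2^31 - 1; keys assumed to lie in [0, P)

def random_hash(m):
    """Sample h(x) = ((a*x + b) % P) % m from a 2-universal family."""
    a = random.randrange(1, P)
    b = random.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

def build_bucket(items):
    """Build a collision-free bottom-level table of size len(items)^2,
    resampling the hash function whenever a collision occurs."""
    m = max(1, len(items)) ** 2
    while True:
        h = random_hash(m)
        table = [None] * m
        for x in items:
            i = h(x)
            if table[i] is not None:   # collision: resample and retry
                break
            table[i] = x
        else:                          # no collision: table is perfect
            return h, table

def build_two_level(keys):
    """Static construction: O(n) expected total space."""
    n = max(1, len(keys))
    top_h = random_hash(n)             # top level has O(n) slots
    buckets = [[] for _ in range(n)]
    for x in keys:
        buckets[top_h(x)].append(x)
    # A full implementation would also resample top_h until
    # sum(len(b)**2 for b in buckets) is O(n); omitted here.
    bottom = [build_bucket(b) for b in buckets]
    return top_h, bottom

def lookup(structure, x):
    """Two hash evaluations: worst-case O(1)."""
    top_h, bottom = structure
    h, table = bottom[top_h(x)]
    return table[h(x)] == x

if __name__ == "__main__":
    keys = random.sample(range(1_000_000), 1000)
    s = build_two_level(keys)
    assert all(lookup(s, k) for k in keys)
    assert not lookup(s, 1_000_001)    # absent key: stored values never match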
If we are careful about when to grow and shrink the table (e.g., rebuilding globally when the element count doubles or falls below a quarter of its value at the last rebuild, as in the sketch below), then we can have expected amortized $O(1)$ inserts/deletes and worst-case $O(1)$ lookups. This seems good in theory for workloads that are heavily biased towards lookups, but how does its performance scale compared to a typical one-level hash table? Are there any use cases for this data structure in practice?
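For reference, here is a hedged sketch of one such rebuild policy, reusing `random_hash` and `build_bucket` from the sketch above. The grow/shrink thresholds ($2\times$ and $1/4$) are illustrative choices, and for simplicity the affected bucket is rebuilt on every update rather than only on collision; since the expected bucket size is $O(1)$, updates are still expected $O(1)$.

```python
class DynamicTwoLevel:
    """Sketch of a global-rebuild policy: rebuild everything when the
    element count drifts a constant factor from its value at the last
    rebuild, so the O(n) rebuild cost amortizes to O(1) per update."""

    def __init__(self, keys=()):
        self.keys = set(keys)
        self._rebuild()

    def _rebuild(self):
        self.n0 = max(1, len(self.keys))        # count at last rebuild
        self.top_h = random_hash(2 * self.n0)   # slack for future inserts
        self.members = [[] for _ in range(2 * self.n0)]
        for x in self.keys:
            self.members[self.top_h(x)].append(x)
        self.bottom = [build_bucket(b) for b in self.members]

    def insert(self, x):
        if x in self.keys:
            return
        self.keys.add(x)
        if len(self.keys) > 2 * self.n0:        # grown past 2x: rebuild all
            self._rebuild()
            return
        i = self.top_h(x)
        self.members[i].append(x)
        self.bottom[i] = build_bucket(self.members[i])  # rebuild one bucket

    def delete(self, x):
        if x not in self.keys:
            return
        self.keys.remove(x)
        if 4 * len(self.keys) < self.n0:        # shrunk below 1/4: rebuild all
            self._rebuild()
            return
        i = self.top_h(x)
        self.members[i].remove(x)
        self.bottom[i] = build_bucket(self.members[i])

    def lookup(self, x):
        h, table = self.bottom[self.top_h(x)]
        return table[h(x)] == x
```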