Introduction
Ever wonder how your computer keeps track of everything without losing its mind? Well, a big round of applause goes to Maps! In this post, we're going to peek behind the curtain and discover what makes them tick, why they're so incredibly handy, and how they manage to stay so organized, even when things get a little chaotic. We've distilled the wisdom of "Data Structures and Algorithms in Java™" down to its most digestible (and least clunky) form, just for you. So, read on, enjoy, and then tell us what you think – we're all ears!
Map
Let's talk about Maps, the ultimate VIP section for your data!
Imagine you have a super-exclusive club, and every guest (that's your value) needs a special, one-of-a-kind invitation (that's your key) to get in. You can't just waltz in; you need the key.
A Map is basically that club's bouncer and coat check, rolled into one super-efficient, digital package. It's an Abstract Data Type (ADT) designed to:
- Efficiently store: Think of it as a meticulously organized closet where everything has its designated spot.
- Retrieve values: Want your coat back? Just show your coat-check ticket (the key), and poof! Your coat (the value) is in your hands.
- Use a uniquely identifying search key for each value: This is the crucial part. Each key is like a fingerprint – no two are alike! If two keys were the same, how would the bouncer know which "value" to give you?
So, in essence, a Map just stores key-value pairs (k, v), which we lovingly call entries. 'k' is your secret handshake, your golden ticket, your "open sesame." And 'v' is the treasure it unlocks.
It's like a super-smart dictionary, but instead of words and definitions, it's whatever awesome stuff you want to link together! Need to find your friend Bob's phone number? Just look up "Bob" (the key) and BAM! There's his number (the value).
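To make that concrete before we build our own, here's a tiny taste using Java's built-in java.util.HashMap (Bob's number is made up, of course):

    import java.util.HashMap;

    public class PhoneBookDemo {
        public static void main(String[] args) {
            HashMap<String, String> phoneBook = new HashMap<>();
            phoneBook.put("Bob", "555-0134");   // key "Bob" -> Bob's (made-up) number
            phoneBook.put("Alice", "555-0199");

            System.out.println(phoneBook.get("Bob"));   // 555-0134
            System.out.println(phoneBook.get("Carol")); // null, no such key
        }
    }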
The Map Interface
Let us define the interface for our map. Fire up your favorite text editor, and let's craft the blueprint for our magnificent, dandy map. It's time to give it a personality and some rules.
Go ahead and create a file named Map.java, and then write in the code below.
package dsa.maps;
import java.util.Map.Entry;
public interface Map<K, V> {
int size();
boolean isEmpty();
V get(K key);
V put(K key, V value);
V remove(K key);
Iterable<K> keySet();
Iterable<V> values();
Iterable<Entry<K, V>> entrySet();
}
The AbstractMap.java class
Now our map M has been given a personality. We've established the ground rules for its existence. Next, we have to turn our attention to something important: the sacred vows M must uphold, the operations it is obligated to perform.
Behold, the glorious operations M gracefully executes:
keySet(): Returns an iterable collection containing all the keys stored in M.
values(): Returns an iterable collection containing all the values of entries stored in M (with repetition if multiple keys map to the same value).
size(): Returns the number of entries in M.
isEmpty(): Returns a boolean indicating whether M is empty.
To make our map M real and useful, we must implement these operations. So go ahead and create an AbstractMap.java file: an abstract class that implements the grumpy Map interface, but instead of doing all the work by itself, it strategically leaves some methods abstract (shoving the responsibility onto its subclasses) while cunningly implementing others by relying on those methods.
Before we start our implementation, I have to warn you of a few things: keySet() and values() heavily depend on the Iterator and Iterable interfaces, which will not be discussed in this post for obvious reasons. So go ahead and do your research about them. And we will use Java's Entry interface from java.util.Map.Entry for our MapEntry class.
Think of MapEntry as the Node of a linked list: it is a simple class that holds the key and value (entry) of a map.
Go to AbstractMap.java and add the following code to it.
package dsa.maps;
import java.util.Iterator;
import java.util.Map.Entry;
public abstract class AbstractMap<K, V> implements Map<K, V> {
protected int size = 0;
protected static class MapEntry<K, V> implements Entry<K, V> {
private K key;
private V value;
public MapEntry(K key, V value) {
this.key = key;
this.value = value;
}
@Override
public K getKey() {
return key;
}
@Override
public V getValue() {
return value;
}
@Override
public V setValue(V value) {
V oldValue = this.value;
this.value = value;
return oldValue;
}
}
// Implementation of Iterable for keys
private class KeyIterate implements Iterator<K> {
private Iterator<Entry<K, V>> entries = entrySet().iterator();
@Override
public boolean hasNext() {
return entries.hasNext();
}
@Override
public K next() {
return entries.next().getKey();
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
}
private class KeyIterable implements Iterable<K> {
@Override
public Iterator<K> iterator() {
return new KeyIterate();
}
}
@Override
public Iterable<K> keySet() {
return new KeyIterable();
}
// Implementation of Iterable for values
private class ValueIterate implements Iterator<V> {
private Iterator<Entry<K, V>> entries = entrySet().iterator();
@Override
public boolean hasNext() {
return entries.hasNext();
}
@Override
public V next() {
return entries.next().getValue();
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
}
private class ValueIterable implements Iterable<V> {
@Override
public Iterator<V> iterator() {
return new ValueIterate();
}
}
@Override
public Iterable<V> values() {
return new ValueIterable();
}
// Implementation of size and isEmpty methods
@Override
public int size() {
return size;
}
@Override
public boolean isEmpty() {
return size == 0;
}
}
Hash Tables
Now let's talk about the Hash Table (also known as a Hash Map) – the superhero of Maps, but with a wonderfully weird superpower!
So, a Hash Table is basically a map M, but instead of just neatly filing things alphabetically or in some predictable order, it hires this Hash Function to:
- Take your "key"
- And, with a dramatic POOF and maybe a little wiggle, transform it into a corresponding index in a table, from 0 to N - 1, where N is the length of the table. Look at the figure below: a table of length 11 containing the entries (1, D), (3, Z), (6, C), and (7, Q).
As a result, we will conceptualize our table as a bucket array, in which each bucket may manage a collection of entries that are sent to a specific index by the hash function.
Hash Function
The Hash Function (let's call it 'h') has an ultimate mission that is simple yet crucial.
h's job: to take any crazy, unique key k you throw at it and, via h(k), spit out a perfectly non-random-at-all-but-appears-random whole number. This number must be between 0 and N - 1. Think of N as the total number of perfectly sized, pre-labeled storage bins M has available. So, h's only goal is to turn your weird key into a valid bin number, ensuring your data has a designated, immediately accessible spot in one of M's N buckets.
If two or more keys have the same hash value, then two different entries will be mapped to the same bucket in M. In this case, we say that the dreaded collision has occurred. A hash function is labeled "good" if it maps the keys in our map so as to sufficiently minimize these dreaded collisions.
A hash function h(k) consists of two portions: a hash code that maps a key k to an integer, and a compression function that maps the hash code to an integer within a range of indices, [0, N - 1].
Hash Codes
The first action that a hash function performs is to generate an integer i for a key k; this integer need not be in the range [0, N - 1], and may even be negative – shocking! We want the set of hash codes assigned to our keys to avoid the dreaded collisions as much as possible.
Here are the classic techniques for building an efficient hash code (we are not going to use them in this post; instead, we will use Java's own hashCode() method to get our beloved hash codes).
- Polynomial Hash Code.
A polynomial hash code uses multiplication by different powers of a constant as a way to spread out the influence of each component across the resulting hash code.
Basically, it uses multiplication and addition. For a given key k with a length of n, it follows this expression:
hash(k) = k[0] + k[1]·a + k[2]·a² + ⋯ + k[n − 1]·a^(n − 1)
Here a is a non-zero integer different from 1; it is highly recommended to be one of the numbers 33, 37, 39, or 41.
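Just to make the formula concrete, here is a minimal sketch of a polynomial hash code for a string, assuming a = 33 (this helper is for illustration only and isn't used later in the post):

    // Polynomial hash code sketch: computes k[0] + k[1]*a + ... + k[n-1]*a^(n-1),
    // letting int overflow simply wrap around, as is usual for hash codes.
    static int polynomialHashCode(String s) {
        final int a = 33; // any of the recommended multipliers 33, 37, 39, 41 works
        int h = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            h = a * h + s.charAt(i); // Horner's rule, walking from the last character back
        }
        return h;
    }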
- Cyclic-Shift Hash Code.
A cyclic-shift hash code is a hash code generation method that involves shifting the bits of a key k in a cyclic manner.
This process uses bitwise shifting, including both left and right rotations, where bits that fall off one end re-enter at the other. For example:
The cyclic rotation of the binary number 1110 0000 by 4 bits results in 0000 1110.
An implementation of a cyclic-shift hash code computation for a character string in Java appears as follows:
static int hashCode(String s) {
int h = 0;
for (int i = 0; i < s.length(); i++) {
h = (h << 5) | (h >>> 27); // 5-bit cyclic rotation of the running value
h += (int) s.charAt(i); // add in the next character
}
return h;
}
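As a quick sanity check (assuming this main lives in the same class as the hashCode method above), note that the result is order-dependent: a plain character sum would give anagrams identical codes, while the cyclic shift tells them apart:

    public static void main(String[] args) {
        // All three words share the same letters, so a plain sum would collide;
        // the cyclic-shift code produces three different values.
        System.out.println(hashCode("stop"));
        System.out.println(hashCode("pots"));
        System.out.println(hashCode("tops"));
    }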
It is expected that equivalent keys have the same hash code, so that they are guaranteed to map to the same bucket.
Compression Function
So, our eccentric Hash Function h is a brilliant artist, spitting out "hash codes" – big, sometimes even negative, numbers.
The problem? Our Map M only has a limited number of neatly numbered buckets (from 0 to N-1). h's wild splatters are usually way outside that neat little range.
That's where the Compression Function steps in! It's like M's tidy-up crew. Their job is to take h's grand, but often unruly, number and expertly squish it, fold it, or just generally re-map it into one of those perfectly valid [0, N-1] bucket spots.
- The Division Method
In the division method, we map a key k's hash code integer i into one of N slots by taking the remainder of i divided by N:
h(k) = i mod N
And here are the rules we have to follow for 'good' hash compression:
- N should not be a power of 2. If N = 2^p, then h(k) is just the lowest p order bits of k. Example: let N = 16 = 2^4 (p = 4); if k = 42 (0010 1010), then 42 mod 16 = 10 (1010), and 1010 is just the last 4 bits of (0010 1010).
- The size of N should be a prime number not too close to an exact power of 2.
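As a tiny standalone sketch (a hypothetical helper; the post itself uses the MAD method below), the division method boils down to one line, with Math.abs guarding against negative hash codes:

    // Division-method compression: squeezes any hash code into a bucket index in [0, N - 1].
    static int divisionCompress(int hashCode, int N) {
        return Math.abs(hashCode) % N;
    }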
- The MAD Method
The MAD (Multiply-Add-and-Divide) Method is like a super-smart pattern-disruptor. It takes that incoming number i (the hash code integer) and gives it a sophisticated, mathematical shake-up to scramble any hidden patterns before assigning it a bucket, by turning it into:
[ (a×i + b) mod p ] mod N
Where:
- a and b are just some random, friendly numbers M picks to add a bit of chaos.
- N is the total number of buckets.
- p is a prime number larger than N.
So, the MAD method ensures those hash codes are really spread out, giving each key its own happy, less-collided-with spot in M's storage!
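For example (with made-up constants), take p = 109345121, a = 3, b = 7, and N = 17: a hash code i = 42 compresses to [(3·42 + 7) mod 109345121] mod 17 = 133 mod 17 = 14, a perfectly valid bucket index.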
In this post we will use the MAD method as a compression function for our magnificent map.
Now let's implement the hash table with a hash code and a compression function; we'll also implement basic hash map operations like get, remove, and put. Alright, enough theory! M's ready to get its hands dirty. We're about to give our hash map the key-smashing "hash code," the number-wrangling "compression function," and teach it to perform its essential magic: get, put, and remove.
Go ahead and create an AbstractHashMap.java file and write the following code inside it.
package dsa.maps.hashMap;
import java.util.ArrayList;
import java.util.Random;
import java.util.Map.Entry;
import dsa.maps.AbstractMap;
public abstract class AbstractHashMap<K, V> extends AbstractMap<K, V> {
protected int n = 0; // number of entries in the map
protected int capacity; // size of the hash table
private int prime; // prime factor for the hash function
private long scale, shift; // scaling and shifting factor for the hash function
public AbstractHashMap(int capacity, int prime) {
this.capacity = capacity;
this.prime = prime;
Random rand = new Random();
this.scale = rand.nextInt(prime - 1) + 1;
this.shift = rand.nextInt(prime);
createTable();
}
public AbstractHashMap(int capacity) {
this(capacity, 109345121);
}
public AbstractHashMap() {
this(17);
}
// Public methods
@Override
public int size() {
return n;
}
@Override
public V get(K key) {
return bucketGet(hashValue(key), key);
}
@Override
public V remove(K key) {
return bucketRemove(hashValue(key), key);
}
@Override
public V put(K key, V value) {
V answer = bucketPut(hashValue(key), key, value);
if (n > capacity / 2) { // keep the load factor below 0.5
resize(2 * capacity - 1); // grow the table (ideally, the new capacity should be prime)
}
return answer;
}
// Private utility method
private int hashValue(K key) {
int code = key.hashCode();
return (int) ((Math.abs(code * scale + shift) % prime) % capacity);
}
private void resize(int newCapacity) {
ArrayList<Entry<K, V>> buffer = new ArrayList<>(n);
for (Entry<K, V> entry : entrySet()) {
buffer.add(entry);
}
capacity = newCapacity;
createTable();
n = 0;
for (Entry<K, V> entry : buffer) {
put(entry.getKey(), entry.getValue());
}
}
// Protected methods to be implemented by subclasses
protected abstract void createTable();
protected abstract V bucketGet(int h, K key);
protected abstract V bucketPut(int h, K key, V value);
protected abstract V bucketRemove(int h, K key);
}
Here, the hashValue(K key) method is our hash function (MAD compression included). We also have a resize(int newCapacity) method that resizes the table to the new capacity (ideally a prime) and reinserts every entry. If you look closely, this method is called only when we pass a load factor threshold (we will discuss this in the next section).
We pass 17, a prime number, as the default capacity of the table.
Collision Handling
The Load Factor is just M's way of checking how crowded its storage facility is! It's simply the number of items n M is currently holding divided by the total number of bins N M actually has. Think of it as: "How full are M's shelves?" If it gets too high, M might start to feel a bit cramped and slow down, leading to more dreaded collisions! It is also used to decide whether the table needs to be resized.
Load factor (α) = n / N
So, we need to set a certain load factor threshold for our map; once we cross it, we resize the hash table and rehash it, meaning we reinsert all entries into the newly resized table. In our code above, we set our load factor threshold to 0.5 in order to keep the dreaded collisions at bay.
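For example, with the default capacity of 17, inserting the 9th entry makes α = 9/17 ≈ 0.53; that crosses the n > capacity / 2 check inside put, so the table resizes to 2 · 17 − 1 = 33 and every entry is rehashed.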
Separate Chaining
Instead of just one item per bucket, M says, "No problem! Each bucket A[j] will just get its own little sub-container – like a tiny, extra-organized backpack!"
So, if multiple keys hash to bucket j, they all happily chill together in that bucket's personal backpack. When M needs something from bucket j, it just checks that bucket's little sub-container. It's like having a shared locker, but everyone gets their own neatly organized drawer inside.
Now that we have gotten the theory of separate chaining out of the way, let's implement it! So, go create the ChainHashMap.java file and add the following code. Here, we will implement the methods that AbstractHashMap delegated to its subclasses:
createTable(): This method should create an initially empty table having size equal to the designated capacity instance variable.
bucketGet(h, k): This method should mimic the semantics of the public get method, but for a key k that is known to hash to bucket h.
bucketPut(h, k, v): This method should mimic the semantics of the public put method, but for a key k that is known to hash to bucket h.
bucketRemove(h, k): This method should mimic the semantics of the public remove method, but for a key k known to hash to bucket h.
entrySet(): This standard map method iterates through all entries of the map.
We will implement these same methods again in the ProbeHashMap class in the next section. Enough talk! Go to the ChainHashMap.java file and add this code.
package dsa.maps.hashMap;
import java.util.ArrayList;
import java.util.Map.Entry;
public class ChainHashMap<K, V> extends AbstractHashMap<K, V> {
private ArrayList<MapEntry<K, V>>[] table;
public ChainHashMap() {
super();
}
public ChainHashMap(int capacity) {
super(capacity);
}
public ChainHashMap(int capacity, int prime) {
super(capacity, prime);
}
private int findIndex(K key, ArrayList<MapEntry<K, V>> bucket) {
for (int i = 0; i < bucket.size(); i++) {
if (key.equals(bucket.get(i).getKey())) {
return i;
}
}
return -1;
}
@Override
public Iterable<Entry<K, V>> entrySet() {
ArrayList<Entry<K, V>> entries = new ArrayList<>();
for (ArrayList<MapEntry<K, V>> bucket : table) {
if (bucket != null) {
for (MapEntry<K, V> entry : bucket) {
entries.add(entry);
}
}
}
return entries;
}
@Override
protected void createTable() {
table = (ArrayList<MapEntry<K, V>>[]) new ArrayList[capacity];
}
@Override
protected V bucketGet(int h, K key) {
ArrayList<MapEntry<K, V>> bucket = table[h];
if (bucket == null) {
return null; // no bucket at this index, so the key is not present
}
int index = findIndex(key, bucket);
if (index == -1) {
return null;
}
return bucket.get(index).getValue();
}
@Override
protected V bucketPut(int h, K key, V value) {
ArrayList<MapEntry<K, V>> bucket = table[h];
if (bucket == null) {
bucket = new ArrayList<MapEntry<K, V>>();
table[h] = bucket;
}
int index = findIndex(key, bucket);
if (index != -1) {
return bucket.get(index).setValue(value); // key already exists: update it and return the old value
}
bucket.add(new MapEntry<K, V>(key, value)); // new key: append a fresh entry
n++;
return value;
}
@Override
protected V bucketRemove(int h, K key) {
ArrayList<MapEntry<K, V>> bucket = table[h];
if (bucket == null) {
return null; // no bucket at this index, so the key is not present
}
int index = findIndex(key, bucket);
if (index == -1) {
return null;
}
V temp = bucket.get(index).getValue();
int size = bucket.size();
if (index != size - 1) {
bucket.set(index, bucket.get(size - 1)); // move the last entry into the removed slot
}
bucket.remove(size - 1); // drop the now-redundant last slot
n--;
return temp;
}
}
Our implementation uses a table, which is an array of ArrayLists containing MapEntry<K, V> objects. Additionally, we use the findIndex(key, bucket) helper method to locate and return the index of a matched key within a specific bucket found at a given index in the table.
When an entry is removed from map M via bucketRemove(int h, K key), we first obtain the ArrayList L at the index derived from key k. We then find the index of key k within list L. Subsequently, we copy the last element of L into the slot where k resided, remove the now-redundant last element, and decrement n.
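For example, if a bucket holds the entries (A, 1), (B, 2), (C, 3) and we remove A, then (C, 3) is copied into A's old slot, the last slot is dropped, and the bucket becomes (C, 3), (B, 2).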
Open Addressing
Okay, so if "Separate Chaining" was like giving each bucket its own little backpack, "Open Addressing" is totally different!
Imagine M (our Map) has a very strict policy, where every single item must live directly inside one of M's main buckets.
If M goes to put something in a bucket and finds it's already occupied (the dreaded collision!), it doesn't just make a new list. Oh no. It grumpily starts searching for the next available empty slot right there in the main table.
And when you ask M to get something? It zips to the expected bucket. If it's not there, M then systematically checks other nearby slots until it finds your item or declares, "Nope, not here!"
Open addressing requires that the load factor is always at most 1. For this post, we're going to dive headfirst into the Linear Probing method. Why? Because it's straightforward, gets the job done, and frankly, the others (Double Hashing, Quadratic Probing – meh!) are just too much effort for this discussion. Obvious reasons, you know. We like simplicity here!
Linear Probing
Imagine M tries to put a new item (k, v) into its assigned bucket A[j]. But oops! Someone's already sitting there.
Instead of panicking, M just says, "Excuse me, is A[(j+1) mod N] open?" (That's the very next bucket in line, wrapping around to the beginning if it hits the end.)
If that one's also taken, M just keeps politely shuffling, one step at a time: "How about A[(j+2) mod N]? ...No? Okay, A[(j+3) mod N]?" And so on, until it finds an empty seat.
It's like searching for a parking spot in a crowded lot: you go to your assigned row, and if it's full, you just slowly inch forward, checking the very next spot until you find an open one! Simple, effective, and gets the job done.
But we face a serious issue when we remove an element from the table. Imagine you remove the car from spot #5 (look at the figure above). Now that spot is empty. Later, M goes looking for car 16, which sits in spot #7, so it starts probing from spot #4. But when M gets to the now-empty spot #5, it thinks, "Oh, no car here – 16 must not be in the lot!" and gives up.
The solution? A special little "Ghost Car" or "Defunct Sentinel".
When M removes a car, it doesn't leave the spot truly empty. It puts a little "Ghost Car" (or a "Defunct" sign) there instead.
- For get: M will politely skip over these Ghost Cars, knowing they're just placeholders, and keep searching until it finds the real car or a truly empty spot.
- For put: If M encounters a Ghost Car while looking for a spot, it remembers, "Aha! This is a perfectly good, reusable spot for a new car!" and might just park there.
So, the "Defunct Sentinel" is M's clever way of making sure it never gets confused by a prematurely empty parking spot, ensuring smooth sailing (and finding!) even after removals.
Since we've finished all the preliminary discussions about linear probing, it's time for us to roll up our sleeves and get to the juicy part: implementing it! Go ahead and create the ProbeHashMap.java file, then add the following code.
package dsa.maps.hashMap;
import java.util.ArrayList;
import java.util.Map.Entry;
public class ProbeHashMap<K, V> extends AbstractHashMap<K, V> {
private MapEntry<K, V>[] table;
private MapEntry<K, V> DELETED = new MapEntry<>(null, null); // Marker for deleted entries
public ProbeHashMap() {
super();
}
public ProbeHashMap(int capacity) {
super(capacity);
}
public ProbeHashMap(int capacity, int prime) {
super(capacity, prime);
}
private boolean isAvailable(int i) {
return table[i] == null || table[i] == DELETED;
}
private int findSlot(int h, K key) {
int i = h;
int available = -1;
do {
if (isAvailable(i)) {
if (available == -1) {
available = i;
}
if (table[i] == null)
break;
} else if (table[i].getKey().equals(key)) {
return i; // Found the key
}
i = (i + 1) % capacity; // Linear probing cyclically
} while (i != h);
return -(available + 1); // key not found: encode the first available slot as a negative value
}
@Override
protected void createTable() {
table = (MapEntry<K, V>[]) new MapEntry[capacity];
}
@Override
protected V bucketGet(int h, K key) {
int slot = findSlot(h, key);
if (slot < 0) {
return null;
}
return table[slot].getValue();
}
@Override
protected V bucketPut(int h, K key, V value) {
int slot = findSlot(h, key);
if (slot >= 0)
return table[slot].setValue(value);
table[-(slot + 1)] = new MapEntry<>(key, value);
n++;
return value;
}
@Override
protected V bucketRemove(int h, K key) {
int slot = findSlot(h, key);
if (slot < 0) {
return null; // Key not found
}
V oldValue = table[slot].getValue();
table[slot] = DELETED; // Mark as deleted
n--;
return oldValue;
}
@Override
public Iterable<Entry<K, V>> entrySet() {
ArrayList<Entry<K, V>> entries = new ArrayList<>();
for (int i = 0; i < capacity; i++) {
if (!isAvailable(i)) {
entries.add(table[i]);
}
}
return entries;
}
}
We use the findSlot(int h, K key) method to probe the hash table cyclically, starting at the given index h. It returns the (non-negative) index of the entry matching the key when it finds one; otherwise, it returns a negative value of the form -(available + 1), encoding the first available slot it saw along the way.
Alright, my magnificent map-mastering comrades! We've journeyed through the wild realms of abstract data types, wrestled with grumpy interfaces, tamed eccentric hash functions, and even introduced a few "ghost cars" along the way. Our grand adventure with Maps in Java now reaches its glorious end.
So, open your Main.java file. Below, you'll find the code to call our Linear Probing warrior (the ProbeHashMap). This is where we see if the code actually, you know, works.
Fair warning: We're only testing the Linear Probing part here. As for the Separate Chaining masterpiece? That, my dear, is your glorious homework assignment. M expects you to prove its chaining prowess yourself. After all, a true map explorer tests all their treasures!
public static void main(String[] args) throws Exception {
ProbeHashMap<String, Integer> myMap = new ProbeHashMap<>();
myMap.put("January", 1);
myMap.put("February", 2);
myMap.put("March", 3);
myMap.put("April", 4);
myMap.put("May", 5);
myMap.put("June", 6);
myMap.put("July", 7);
myMap.put("August", 8);
myMap.put("September", 9);
myMap.put("October", 10);
myMap.put("November", 11);
myMap.put("December", 13);
System.out.println("March: " + myMap.get("March")); // 3
System.out.println("Cherry: " + myMap.get("Cherry")); // null
myMap.put("December", 12);
System.out.println("December: " + myMap.get("December")); // 12
myMap.remove("February");
System.out.println("February: " + myMap.get("February")); // null
}