HashMap Internal Implementation in Java

 Java’s HashMap is one of the most commonly used data structures, providing an efficient way to store and retrieve data. In this post, we will explore the internals of HashMap, how it works, the significance of methods like hashCode() and equals(), and what happens when we override or don't override them.

What is a HashMap?

HashMap in Java is a part of the Java Collections Framework, used for storing key-value pairs. It is an implementation of the Map interface, which means it provides methods to add, remove, and retrieve elements based on keys.

Interfaces Implemented by HashMap

  • Map<K,V>: The core interface for key-value mapping.
  • Clonable: Allows cloning of HashMap objects.
  • Serializable: Enables HashMap objects to be serialized, meaning they can be converted into a byte stream and saved to a file.

So, in short, HashMap is a part of the Java Collections Framework and implements several interfaces to provide efficient key-value storage and retrieval.

Key Concepts in HashMap

1. Hashing:

Hashing is the process of converting an object (key) into an integer (hash code). The hash code is then used to determine the index at which the key-value pair will be stored in an internal array (called a bucket).

2. hashCode():

The hash code is an integer generated by the hashCode() method of an object. It is used to place objects in a particular bucket in the HashMap. For example, if the key is a string “Name”, the hash code could be 234, and based on this, the object will be stored in bucket 234 (or at an index derived from this hash code).

3. equals():

The equals() method compares two objects for equality. In HashMap, this is crucial because after finding the correct bucket using the hash code, it uses equals() to ensure that the key being searched for is exactly the same as the key stored in that bucket.

How HashMap Works Internally

Internally, HashMap stores data in an array of buckets. Each bucket is essentially a list of nodes (also called entries) that store key-value pairs.

Here’s a breakdown of the process:

  1. Hashing: When a key is added, Java computes the hash code of that key using its hashCode() method.
  2. Index Calculation: The hash code is used to calculate the index in the array where the key-value pair should be stored.
  3. Storing in a Bucket: If the index is free, the entry is stored there. If another entry is already stored (collision), HashMap uses a linked list or tree to store the additional key-value pairs in the same bucket.
  4. Search: When retrieving a value, the hash code of the key is used to find the correct bucket, and the equals() method checks for the correct key within that bucket.

Node and Bucket Structure

To understand how HashMap works internally, let’s look at the structure of a bucket and a node:

  1. Bucket: Each index of the internal array in HashMap is called a bucket. Each bucket can store one or more key-value pairs if there are collisions.
  2. Node Structure: The nodes in a bucket contain:
  • hash: The hash code of the key.
  • key: The key object.
  • value: The value associated with the key.
  • next: A reference to the next node in the bucket (for handling collisions).

Node Structure:

static class Node<K, V> {
final int hash;
final K key;
V value;
Node<K, V> next;
}

When two keys produce the same hash code (a collision), HashMap creates a linked list or tree in that bucket. Let’s visualize this.

Bucket and Node Structure:

Index 0:  [key1 -> value1] -> [key2 -> value2] -> [key3 -> value3] (Linked List due to collisions)
Index 1: [key4 -> value4]
Index 2: null (Empty bucket)
Index 3: [key5 -> value5]
...
...

In the case of a collision at a particular index, HashMap chains the entries using the next pointer, forming a linked list (or binary tree in certain cases to optimize performance).

Detailed Example: How Insertion Works in HashMap

Let’s see an example where we insert key-value pairs into a HashMap.

Example Code:

HashMap<String, Integer> map = new HashMap<>();
map.put("Amit", 1);
map.put("John", 2);
map.put("Alice", 3);
  1. HashCode Generation: For each key, the hashCode() method is called. Suppose the hash code for “Amit” is 2154.
  2. Index CalculationHashMap computes the index using the formula:
index = hashCode % (size of the array)

For example, if the size of the array is 16, the index for “Amit” will be:

index = 2154 % 16 = 10

3. Storing in a Bucket: The key-value pair is stored at index 10. Similarly, “John” and “Alice” are stored at their calculated indexes.

Key    HashCode   Index   Value   Bucket

Amit 2154 10 1 [Amit, 1]
John 1940 4 2 [John, 2]
Alice 3301 5 3 [Alice, 3]

Collision Handling

When two keys have the same hash code and are mapped to the same index (collision), they are stored in the same bucket as a linked list or tree.

For example, suppose “Tom” also has a hash code that maps to index 10. This is a collision:

Key    HashCode   Index   Value   Bucket

Amit 2154 10 1 [Amit, 1] -> [Tom, 5]

Here, both “Amit” and “Tom” are stored in the same bucket, and a linked list is formed to store them.

Load Factor in HashMap

The load factor is a measure of how full the HashMap is allowed to get before its capacity is automatically increased. The default load factor is 0.75, which offers a good trade-off between time and space cost.

Understanding Load Factor

Load Factor = (Number of Entries) / (Number of Buckets)

A load factor of 0.75 means that when 75% of the HashMap is filled, it will resize (rehash) itself to accommodate more entries.

Example of Load Factor

  1. Initial Capacity: Let’s say the initial capacity of the HashMap is 16, and the load factor is 0.75.
  2. Threshold Calculation:
  • The threshold for resizing is calculated as:
threshold = initial capacity * load factor
threshold = 16 * 0.75 = 12
  1. Insertion Behavior:
  • As you add entries to the HashMap, once you reach the threshold of 12 entries, the HashMap will automatically resize itself, usually doubling the size of the internal array to 32 and rehashing all existing entries.

Resizing and Rehashing

When resizing occurs:

  • A new array is created with double the size of the previous one.
  • All existing entries are rehashed and moved to the new array based on their hash codes.

Example Code to Illustrate Load Factor:

HashMap<String, Integer> map = new HashMap<>(16, 0.75f);//Initial capacity of 16 and load factor of 0.75
for (int i = 1; i <= 12; i++) {
map.put("Key" + i, i);
}
System.out.println(map.size());//Output: 12

//Adding one more entry will trigger resizing
map.put("Key13", 13);
System.out.println(map.size()); //Output:13 (size increased and entries rehashed)

Why hashCode() and equals() Matter

HashMap relies on two important methods: hashCode() and equals().

1. hashCode():

  • The hashCode() method generates a unique integer for each object. This hash code determines the index where the key-value pair will be stored in the HashMap.

2. equals():

  • Even after finding the correct bucket using the hashCode(), the equals() method ensures that the correct key is located in case multiple keys map to the same index.

Overriding hashCode() and equals()

If you override both hashCode() and equals() correctly, you can ensure that your custom objects behave as expected when used as keys in a HashMap.

class Employee {
String name;
int id;

Employee(String name, int id) {
this.name = name;
this.id = id;
}

@Override
public int hashCode() {
return id;
}

@Override
public boolean equals(Object obj) {
if (this == obj) return true;
if (obj == null || getClass() != obj.getClass()) return false;
Employee other = (Employee) obj;
return this.id == other.id && this.name.equals(other.name);
}
}

Now, let’s create a HashMap with Employee as the key:

HashMap<Employee, String> map = new HashMap<>();
Employee emp1 = new Employee("Amit", 101);
Employee emp2 = new Employee("Amit", 101);

//Inserting into the HashMap
map.put(emp1, "Manager");

//Retrieving the value using a new Employee object with the same id and name
System.out.println(map.get(emp2)); //Output: Manager

Explanation:

  • hashCode(): We override hashCode() to use the id field for generating the hash code. Both emp1 and emp2 will have the same hashCode() because they have the same id.
  • equals(): We override equals() to compare both name and id. Since emp1 and emp2 have the same id and nameequals() returns true.

As a result, when map.get(emp2) is called, HashMap is able to find the correct bucket (using hashCode()), and the correct key (using equals()), returning “Manager”.

Not Overriding hashCode() and equals()

When you do not override hashCode() and equals(), the default implementations from the Object class are used, which may not give the desired behavior when custom objects are used as keys in a HashMap.

Let’s see what happens when we don’t override these methods:

class Employee {
String name;
int id;

Employee(String name, int id) {
this.name = name;
this.id = id;
}
}

Now, let’s create a HashMap with Employee as the key:

    HashMap<Employee, String> map = new HashMap<>();
Employee emp1 = new Employee("Amit", 101);
Employee emp2 = new Employee("Amit", 101);

//Inserting into the HashMap
map.put(emp1, "Manager");

//Retrieving the value using a new Employee object with the same id and name
System.out.println(map.get(emp2)); //Output: null

Explanation:

  • hashCode(): Since hashCode() is not overridden, the default implementation in Object is used, which is based on the memory address of the object. This means emp1 and emp2 will have different hash codes because they are two different instances.
  • equals(): The default equals() in Object checks if the two objects are the same instance. Since emp1 and emp2 are different objects (even though they have the same id and name), equals() will return false.

As a result, map.get(emp2) will return null because the HashMap cannot find emp2 in the same bucket as emp1. This happens because the hashCode() differs for the two objects, so they are treated as different entries.

Side-by-Side Comparison of Overriding vs Not Overriding

Let’s compare both scenarios side-by-side to highlight the differences:

Scenario     hashCode()      equals()      Behavior                 Output for
Overridden? Overridden? map.get(emp2)

Overriding Yes Yes Correctly finds the entry "Manager"
Both using both methods

Not No No Treats the two objects null
Overriding as different keys

Conclusion

When working with HashMap, understanding the role of hashCode() and equals() is crucial, especially when using custom objects as keys. If you don't override these methods correctly:

  • Your objects might not behave as expected.
  • Keys that are logically the same may be treated as different, leading to unexpected results (like null when retrieving values).

However, when both methods are overridden properly, HashMap can accurately store and retrieve your custom objects, ensuring that the key-value mapping works as intended.

If you found this post helpful, follow me for more content on Java, data structures, and other programming concepts!

Comments

Popular posts from this blog

Understanding the API Gateway Design Pattern

Understanding Service Discovery in Microservices

Understanding Executors and ExecutorService in Java

Understanding “try-with-resources” in Java