Comparison of Maps in Java Collections Framework

March 26, 2015

Maps, Hash tables(in general sense), Dictionaries whatever you call them, in every well known programming language there are similar data structures. In java we have Maps as a part of the collections framework.
Although Maps do not implement Collection interface they are still one of the most important part of the framework. Java SE library provides several implementations of Map data structures. Of them, some of the commonly used are HashMap, LinkedHashMap, TreeMap, etc.

Note that the following diagram corresponds to the java 7 & 8 collections framework and it is an abstract view of the relevant classes and interfaces. AbstractMap class which is implemented by several concrete classes has been omitted.


Overview of Hash based maps 

Most of the map implementations in the collections framework are hash based. They are as follows,
  • Hashtable
  • HashMap
  • LinkedHashMap
  • ConcurrentHashMap
  • WeakHashMap
  • IdentityHashMap
The following diagram illustrates how these maps work internally in an abstract way.


The hash value act as a pointer to the bucket where key-value pairs are kept. Each bucket is unique. So it can be uniquely identified by the hash value. There can be pairs with the same hash value. This scenario is called the Hash Collision. Usually it's best to avoid the collision for reasons regarding efficiency, but this totally depends on the implementation of hashing function. When there are entries with similar hash value, they are kept in the form of a linked list in the bucket.

Typically all key-value pairs are stored as Entry objects, this can change depending on the implementation, for instance HashMap uses Node which is a sub class of Entry. Internally all of these Entry objects are maintained in an array. If you look at the source code of one of the map implementations you'll find and array named table of the type Entry, Node or Object.

To insert an element into a map, there should be a key and a value, that is of course an obvious fact and the objects used as keys must implement hashcode and equals methods properly.  When inserting, ie when put method is invoked,
  1. Hash code of the key object is compared with the existing hash codes.
  2. If there is NO hashcode similar present, an Entry objects with key-value pair is created that points to the hashcode.
  3. If a similar hashcode already exists, Entry objects of the relevant bucket are checked by invoking the equals method in key object.(Only the equality of the Key object is checked)
  4. If there is an equal key object, the corresponding value is replaced by the new value.
  5. Else a new Entry object is created with the key-value pair, and linked to the head of the list of Entries in the bucket.
When retrieving, ie when get method is invoked with a particular key, the procedure is
  1. Find the bucket that points to the hash code of the specified key.
  2. Compare the specified key object with the keys in the bucket by invoking equals method.
  3. If there is an equal key object, its corresponding value is returned, else null is returned.

1. Hashtable

This is the legacy map data structure which comes from the historic times of java and was reintroduced in java 1.2 as a member of collections framework. Now Hashtable is declared obsolete by the API mainly because of the performance issues caused by object level locks because of the synchronized method implementations.
The following quote is extracted from the standard API docs.
Unlike the new collection implementations, Hashtable is synchronized. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use ConcurrentHashMap in place of Hashtable.

2. HashMap

This is the commonly used hash based map data structure which is roughly equivalent Hashtable but much efficient performance-vice. Some notable features as follows,
  1. get and put methods have an average time complexity of O(1) but this depends on hashcode and equals method implementations. If one hashcode points to multiple entries ie. the size of the bucket is greater than one, it should traverse the linked list of entries and invoke equals method. This might be costly. 
  2. HashMap implementation is NOT thread-safe. It can be explicitly synchronized to achieve the thread-safe state by invoking Collections.synchronizedMap().
  3. HashMap permits one null key and any number of null values. The null key is always stored in table[0] position in the Node table.
  4. HashMap entries are not guaranteed to be in any order, in fact it can change over the time.
  5. Performance of the HashMap can be adjusted by changing the initial capacity and the load factor (default load factor = 0.75).  When the product of the load factor and the current size of the table (ie. load factor*capacity) exceeds the number of Entry elements in the table, the table is reformed with approximately twice the capacity. If we can basically avoid constant reformation we can improve the performance significantly.

3. LinkedHashMap

LinkedHashMap inherits all the features from HashMap except it maintains the insertion order by using a doubly linked list on Entry objects.
Output:
HashMap insertion and retrieval order test
third=333
fourth=444
first=111
second=222

LinkedHashMap insertion and retrieval order test
first=111
second=222
third=333
fourth=444

4. ConcurrentHashMap

ConcurrentHashMap is functionally equivalent to Hashtable except performance-vice it is much more efficient in a concurrent environment. This is achieved mainly through removing the object level locks in Hashtable implementation and restricting the lock acquisition to small synchronized blocks. Some notable features as follows,
  1. No lock is acquired when reading data from the map.
  2. Locks are acquired when writing data ie. insert, delete, update by calling put, remove & clear methods.
  3. Unlike HashMap, ConcurrentHashMap does NOT allow null keys or null values to be inserted. If inserted NullPointerException is thrown.
  4. Re-sizing of the table is quite similar to HashMap. A custom initial capacity and a load factor can be  specified at the creation of the map.
  5. The concurrent access (writing) to multiple threads is granted through specifying a concurrency level. This functionality is used to lock separate portions of the map and grant access to them parallely. The default is typically 16. ie basically 16 threads can access the map concurrently. Note that this is just an estimated value and according Java 8 API there's no guarantee that this value will be used.

5. WeakHashMap

WeakHashMap is bit different from other map implementations. In java there are 4 different levels of references. They are strong, soft, weak & phantom references respectively ordered by strength. So as the name implies the WeakHashMap is associated with weak references, more precisely the keys are stored as weak references in the WeakHashMap.

The typical java references we create are strong references and the objects associated with these references stay in the memory as long as the reference is alive. Unlike strong references, the objects associated with weak references gets eligible for the garbage collection when they are not in ordinary use even if the reference is still alive. We can also create a weak reference explicitly by using java.lang.ref.WeakReference. In fact the API advices to insert value objects to WeakHashMap using WeakReference as a wrapper since the value objects might contain strong reference to the keys.
So what WeakHashMap does is discard the Entry objects when their keys are no longer in ordinary use. Other than the above mentioned property WeakHashMap has the same behavior as the HashMap and typically used as temporary caches.

6. IdentityHashMap

IdentityHashMap implementation breaks the equals contract and instead  '==' operator is used to compare key objects. That is the main difference between HashMap and IdentityHashMap. 
Output: 
IdentityHashMap: {key=222, key=111} 
HashMap: {key=222}

As observed from the output IdentityHashMap treats "key" and new String("key") as separate keys since their references are not equal where as the HashMap checks equality of key objects by invoking equals() method and since the content is equal in the two key objects, it replaces the first value with the second value.

Performance-vice IdentityHashMap is faster than the HashMap since it does not use equals() method. Other than above mentioned properties IdentityHashMap has the same behaviors as HashMap. The following quote about the usage of IdentityHashMap is extracted from the API docs.
A typical use of this class is topology-preserving object graph transformations, such as serialization or deep-copying. To perform such a transformation, a program must maintain a "node table" that keeps track of all the object references that have already been processed. The node table must not equate distinct objects even if they happen to be equal. Another typical use of this class is to maintain proxy objects. For example, a debugging facility might wish to maintain a proxy object for each object in the program being debugged.

NEXT: TreeMap

 References     [Oracle docs - collections framework]
   [Hashtable vs ConcurrentHashMap benchmark]
   [IBM - Java theory and practice: Building a better HashMap pdf]
   [ConcurrentHashMaps: avoid a common misuse]
   [StackOverFlow-When would you use a WeakHashMap or a WeakReference?]
   [Understanding weak references]
   [Difference between IdentityHashMap and HashMap in Java]

No comments:

Post a Comment