The other day I asked an interviewee to implement a standard hash table. Pretty basic question, but you'd be surprised how many software engineers get stuck. I'm of the belief that if you're going to use the HashMap class, you should be somewhat familiar with its implementation. Wanting not just to "talk the talk" but to "walk the walk," I figured I'd better look at how ConcurrentHashMap is implemented.
ConcurrentHashMap in Java 5 is your run-of-the-mill hash table "supporting full concurrency of retrievals and adjustable expected concurrency for updates." Throughput scales well as the number of threads increases, much better than with a synchronized HashMap or Hashtable. Three things in ConcurrentHashMap caught my attention.
Lock Striping
Instead of using a single lock for the shared data, the shared data is partitioned, with each partition having its own lock. Uncontended lock acquisition is very cheap; it's the contended locks that cause scalability issues. By Little's Law, the two factors influencing lock contention are: (1) how often the lock is requested; and (2) how long the lock is held.
With a different lock for each partition, ConcurrentHashMap effectively divides how often any one lock is requested by the number of partitions. You can think of ConcurrentHashMap as being made up of n separate hash tables. The obvious challenge is that for some operations, such as size(), you now have to deal with n separate locks.
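Lock striping is a general technique, not specific to hash tables. Here is a minimal sketch of the idea applied to a partitioned counter — the names (StripedCounter, NUM_STRIPES) are invented for illustration and are not from the ConcurrentHashMap source:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of lock striping: state is partitioned into
// stripes, each guarded by its own lock, so threads touching
// different stripes never contend.
public class StripedCounter {
    private static final int NUM_STRIPES = 16; // same as ConcurrentHashMap's default concurrency level
    private final ReentrantLock[] locks = new ReentrantLock[NUM_STRIPES];
    private final long[] counts = new long[NUM_STRIPES];

    public StripedCounter() {
        for (int i = 0; i < NUM_STRIPES; i++) locks[i] = new ReentrantLock();
    }

    // Only the stripe owning this key is locked.
    public void increment(Object key) {
        int stripe = (key.hashCode() & 0x7fffffff) % NUM_STRIPES;
        locks[stripe].lock();
        try {
            counts[stripe]++;
        } finally {
            locks[stripe].unlock();
        }
    }

    // A "global" operation has to visit every stripe -- the
    // n-separate-locks challenge mentioned above.
    public long total() {
        long sum = 0;
        for (int i = 0; i < NUM_STRIPES; i++) {
            locks[i].lock();
            try {
                sum += counts[i];
            } finally {
                locks[i].unlock();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        StripedCounter c = new StripedCounter();
        for (int i = 0; i < 100; i++) c.increment("key" + i);
        System.out.println(c.total()); // prints 100
    }
}
```

ConcurrentHashMap's stripes are its Segment objects, each one a small hash table; the counter here just keeps the sketch short.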
Use of Volatile Instead of Locks
Read operations in ConcurrentHashMap do not require lock acquisition (for the most part). With the improved Java Memory Model, ConcurrentHashMap makes use of volatile fields instead of synchronization and locking to ensure that the hash table is always seen in a consistent state. If you recall, a write to a volatile field happens-before every subsequent read of that field (see Java Memory Model in 500 Words). Moreover, the visibility of other variables can piggyback on the volatile variable.
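The piggybacking idea can be sketched outside of ConcurrentHashMap. In this hypothetical example (the class and field names are invented), the write to the volatile field ready also publishes the earlier write to the plain field data, because everything written before the volatile write is visible to a thread that reads the volatile afterwards:

```java
// Hypothetical sketch of "piggybacking" visibility on a volatile write.
public class Piggyback {
    private int data;               // plain field, no synchronization
    private volatile boolean ready; // the volatile the reader checks

    public void publish(int value) {
        data = value;  // 1. ordinary write
        ready = true;  // 2. volatile write -- publishes data too
    }

    public Integer tryRead() {
        if (ready) {      // volatile read: happens-after the volatile write
            return data;  // guaranteed to see the value written before ready = true
        }
        return null;      // not published yet
    }

    public static void main(String[] args) {
        Piggyback p = new Piggyback();
        p.publish(42);
        System.out.println(p.tryRead()); // prints 42
    }
}
```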
So if you look at the ConcurrentHashMap source you will see many comments flagging volatile reads, because those volatile reads are what allow get() to observe the hash table in a consistent state without locks or synchronization.
V get(Object key, int hash) {
    if (count != 0) { // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}
Power-of-Two Hash Table Size
Any data structures 101 book will say to choose a prime for the number of buckets, so that a bucket index can easily be computed as h(k) = k mod m, where k is the key's hash value and m is the number of buckets. While this approach is straightforward, it has a number of issues, including the relatively slow modulo (division) operation. ConcurrentHashMap instead uses a power-of-two number of buckets, which allows the bucket to be determined with a quick bit mask.
final Segment<K,V> segmentFor(int hash) {
    return segments[(hash >>> segmentShift) & segmentMask];
}
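To see why the mask works: for a power-of-two size m, the low bits selected by m - 1 are exactly hash mod m for non-negative hashes, with no division instruction needed. A small stand-alone check (not from the ConcurrentHashMap source):

```java
// Demonstration that, for a power-of-two table size m, masking with
// (m - 1) computes the same bucket index as hash % m for
// non-negative hashes.
public class MaskDemo {
    public static void main(String[] args) {
        int m = 16; // power-of-two bucket count
        for (int hash = 0; hash < 1000; hash++) {
            if ((hash & (m - 1)) != (hash % m)) {
                throw new AssertionError("mismatch at " + hash);
            }
        }
        System.out.println(37 & (m - 1)); // prints 5, same as 37 % 16
    }
}
```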
Power-of-two sizing is not exclusive to ConcurrentHashMap; HashMap has used it since Java 1.4. Incidentally, before the bit masking is applied to the hash, an additional hash is applied to guard against bad hash functions.
/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions. This is critical
 * because ConcurrentHashMap uses power-of-two length hash tables,
 * that otherwise encounter collisions for hashCodes that do not
 * differ in lower or upper bits.
 */
private static int hash(int h) {
    // Spread bits to regularize both segment and index locations,
    // using variant of single-word Wang/Jenkins hash.
    h += (h << 15) ^ 0xffffcd7d;
    h ^= (h >>> 10);
    h += (h << 3);
    h ^= (h >>> 6);
    h += (h << 2) + (h << 14);
    return h ^ (h >>> 16);
}
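To see what the supplemental hash guards against, here is a small stand-alone demonstration (not from the JDK source, though the hash body is copied from the method above): raw hashCodes that differ only above the low bits all mask to the same bucket, while running the spreader first scatters them across buckets.

```java
// Illustration of the problem the supplemental hash defends against:
// with 16 buckets, hashCodes that differ only above the low 4 bits
// all land in the same bucket when masked directly.
public class CollisionDemo {
    // Copy of the supplemental hash above (variant of Wang/Jenkins).
    static int hash(int h) {
        h += (h << 15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h << 3);
        h ^= (h >>> 6);
        h += (h << 2) + (h << 14);
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int mask = 15; // 16 buckets
        // Raw hashCodes 16, 32, 48 share identical low bits:
        System.out.println((16 & mask) + " " + (32 & mask) + " " + (48 & mask)); // prints 0 0 0 -- all collide
        // After the supplemental hash, the bucket indices spread out:
        System.out.println((hash(16) & mask) + " " + (hash(32) & mask) + " " + (hash(48) & mask));
    }
}
```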
Not sure why you would expect an engineer to write a hash map in an interview... that's why it's part of the API. Understanding what it does and when to use it is far more valuable than blowing an interview because you don't know its internals. If I were interviewing for a JDK implementation job, I might expect that sort of question... but if I am interviewing for a web dev or J2EE position, I use HashMap; I don't necessarily expect to know how to implement it from scratch, on the spot, in a few minutes or so. I'd have told you what it does, and that would be about it.
Posted by: andjarnic | September 16, 2008 at 08:51 PM
i don't expect the interviewee to write a complete, perfect hash table class--just get the main points. while i agree that abstractions exist so you don't need to know the full implementation details, i think hash tables are so basic a data structure that anyone who calls themselves a "software engineer" should know the basics of it.
on a practical level, think of, for example, the List interface and the ArrayList and LinkedList implementations. i would want a software engineer to know the basic difference between the two because certain operations are very inefficient in one and not the other. in order to know the difference, you have to know the basic structure of these two implementations.
Posted by: tinou | September 17, 2008 at 10:37 PM