
May 17, 2009


Nathanael Jones

Yeah, I've run into hash collisions on just 8,000 items in a .NET dictionary. That opened my eyes :)

Still trying to figure out why .NET has 32-bit hashes instead of 64....


I'm sure there are plenty of reasons. One is probably that Java (and probably .NET) doesn't require loads and stores of 64-bit values to be atomic. This can create subtle problems if the programmer is not fully aware of the non-atomic treatment of double and long, and it would probably make hash-code computation more expensive. And let's be fair, this was all designed when 64-bit chips weren't widely available.
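As a hedged illustration of the atomicity point: the Java Language Specification (§17.7) permits a write to a non-volatile long or double to be split into two 32-bit writes, so a concurrent reader can observe a "torn" value; marking the field volatile restores atomicity. This class is my own sketch, not code from the discussion:

```java
// Illustrative only: per JLS §17.7, a non-volatile long write may be performed
// as two separate 32-bit stores on some (typically 32-bit) JVMs.
public class TornRead {
    long plain;         // a concurrent reader might see half-old, half-new bits
    volatile long safe; // volatile long reads/writes are guaranteed atomic

    void write(long v) {
        plain = v; // possibly two 32-bit stores
        safe = v;  // always a single atomic store
    }
}
```

Tearing is rare in practice on 64-bit JVMs, but the specification allows it, which is one reason a language designer might hesitate to make 64-bit hash codes the default.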

Ben Manes

An interesting aspect is to understand when a hash table becomes a performance bottleneck due to the collision rate. At 200,000 entries there is a 99% probability of a collision. If using a ConcurrentHashMap (CHM), which uses up to 2^16 segments, the average length of the link chain would be 3. This assumes uniform distribution, of course.
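The 99% figure follows from the birthday bound for a 32-bit hash: p(n) ≈ 1 − exp(−n² / (2·2³²)). A small sketch checking the arithmetic (the class and method names here are mine, not from the comment):

```java
// Birthday-bound estimate of the probability of at least one collision
// among n items drawn uniformly from a 32-bit hash space.
public class BirthdayBound {
    static double collisionProbability(long n, double hashSpace) {
        return 1.0 - Math.exp(-((double) n * n) / (2.0 * hashSpace));
    }

    public static void main(String[] args) {
        double space = Math.pow(2, 32);
        // n = 200,000 gives roughly 0.99, matching the figure above
        System.out.printf("n=200000: p=%.4f%n",
                collisionProbability(200_000, space));
    }
}
```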

In a forum post where I provided help, the author's map of 60M entries had a serious performance problem. My diagnosis was that this resulted in a chain length of 925 entries. I suggested that he use a skip-list-based map instead, which would have a height of 26. It seems that in the ideal case, the cross-over is at 1.3M entries.
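A rough sketch of that cross-over estimate, under the same uniform-distribution assumption: with S = 2^16 segments the expected chain length is n / S, while a skip-list map does about log2(n) comparisons, and the two curves meet near 1.3M entries. The variable names are mine:

```java
// Compare expected CHM chain length (n / 2^16 segments) against an
// idealized skip-list search cost of log2(n) comparisons.
public class CrossOver {
    static double chainLength(long n) {
        return n / 65536.0; // 2^16 segments, uniform distribution
    }

    static double skipListHeight(long n) {
        return Math.log(n) / Math.log(2);
    }

    public static void main(String[] args) {
        // 60M entries: chain length in the hundreds vs. a height of ~26
        System.out.printf("60M:  chain=%.0f height=%.0f%n",
                chainLength(60_000_000L), skipListHeight(60_000_000L));
        // near 1.3M entries the two costs are roughly equal
        System.out.printf("1.3M: chain=%.1f height=%.1f%n",
                chainLength(1_300_000L), skipListHeight(1_300_000L));
    }
}
```

The uniform-distribution estimate for 60M entries comes out slightly under the 925 quoted above, but the cross-over near 1.3M holds either way.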

So I agree that it is important to understand how a hash table works, because that will also tell you when not to use one.
