Nerdy thought for the day : cache hash collisions

December 19, 2011 at 06:39 PM | categories: Code | View Comments

Saw this interesting StackOverflow question linked on reddit this morning.

I thought I had a bit of a handle on cache use after reading Using OpenMP but I must admit I had trouble understanding the answer. This great comment really bought it home for me:

Caches are hash tables, using very simple hashing. Each memory address is mapped to specific bucket. Something like bucket[address % 16];

Due to uniform hashing there exists a pathological worst case access pattern which ensures that cache collision occurs on every access, forcing CPU to either try to find a different bucket or to force a cache miss.

OS allocates memory in 4k chunks. When requesting a large allocation, arrays begin at a fresh page, so they are effectively aligned at same offset and same index always maps to same bucket in cache.

By allocating only a single array, same index maps to different buckets, since individual arrays are no longer aligned in same way.

Something to keep in mind - I often allocate several arrays of the same type and loop over them doing simple operations (entropy calculations etc.).

Update 22/12/11

For completeness (and my own reference) here are a couple more links with some nice explanations and details. This covers the basics, and this goes into much more detail about cache behaviour.

blog comments powered by Disqus