The other day I was chatting with a colleague about Memcached. Eviction policy came up, and I casually mentioned that Memcached isn't strictly LRU. But a quick Bing search said Memcached is LRU, like this Wikipedia entry. Hmm, I was 99.9% sure Memcached is not LRU, something to do with how it manages memory, but maybe I'd been wrong all these years. After reading through some Danga mailing lists and documentation, the answer is: Memcached is LRU per slab class, but not globally LRU.
So what exactly does this mean? Imagine starting Memcached with these command-line arguments, overriding the defaults just to see things more explicitly:
memcached -vv -m 1 -I 1k -f 3 -M
-vv    verbose output
-m 1   maximum 1 megabyte cache
-I 1k  1 kilobyte (1024 byte) page size
-f 3   chunk size growth factor of 3
-M     return an error when memory is exhausted (rather than removing items)
The output will look something like this:
slab class 1: chunk size 96 perslab 10
slab class 2: chunk size 288 perslab 3
slab class 3: chunk size 1024 perslab 1
<26 server listening (auto-negotiate)
<27 server listening (auto-negotiate)
<28 send buffer was 9216, now 5592405
<29 send buffer was 9216, now 5592405
<28 server listening (udp)
<28 server listening (udp)
<28 server listening (udp)
<28 server listening (udp)
<29 server listening (udp)
<29 server listening (udp)
<29 server listening (udp)
<29 server listening (udp)
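By the way, you don't have to parse the startup output to get these numbers. A running instance reports them over the plain-text protocol via the stats slabs command; here's a sketch using nc (any telnet-style client works, and note a slab class only shows up here once it has at least one page assigned, so store a few items of assorted sizes first):

$ printf 'stats slabs\r\nquit\r\n' | nc localhost 11211
STAT 1:chunk_size 96
STAT 1:chunks_per_page 10
STAT 2:chunk_size 288
STAT 2:chunks_per_page 3
STAT 3:chunk_size 1024
STAT 3:chunks_per_page 1
...
END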
The visual representation below may be helpful:

[Figure: the 1 megabyte cache divided into 1024 byte pages, each page assigned to a single slab class and split into that class's fixed-size chunks]
We have a cache with a max memory limit of 1 megabyte. This 1 megabyte of memory is divided into 1 kilobyte (1024 byte) pages. These 1024 byte pages are assigned, as needed, on a first-come, first-served basis to the three slab classes. Page assignment to a slab class is permanent. The three slab classes themselves are determined by the page size and the chunk size growth factor; change either and you'll get a different slab class configuration.
With our settings, slab 1 has a chunk size of 96 bytes (the smallest, initial chunk size) and 10 chunks per page; an 11th chunk would overflow the 1024 byte page. Slab 2 has a chunk size of 288 bytes (96 x 3 growth factor), which fits only 3 chunks per page. Finally, slab 3 has a chunk size equal to the page size, and obviously only 1 chunk per page.
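The chunks-per-page numbers are just integer division of the 1024 byte page size by each chunk size, which you can sanity-check in the shell:

$ for size in 96 288 1024; do echo "$size bytes: $((1024 / size)) chunk(s) per page"; done
96 bytes: 10 chunk(s) per page
288 bytes: 3 chunk(s) per page
1024 bytes: 1 chunk(s) per page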
Each chunk can hold one item, up to the chunk's size. By item, I mean the key, the value, and some overhead. The item's total size determines which slab class it is stored in. Small items, say 40 or 60 bytes, go in slab 1. Leftover space in a chunk is wasted: putting a 60 byte item in a 96 byte chunk leaves 36 bytes of wasted space, and those 36 bytes are not available for any other use. Slab 2 holds items larger than 96 bytes and up to 288 bytes. Slab 3 holds items larger than 288 bytes and up to 1024 bytes. With our arguments, we could not store items larger than 1024 bytes at all.
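You can watch an item land in a slab class. Here I store a tiny item (4 byte key, 5 byte value, so with typical per-item overhead it should fit a 96 byte chunk, though the exact overhead varies by version) and then check used_chunks; the key name and count are just for illustration:

$ printf 'set tiny 0 0 5\r\nhello\r\nquit\r\n' | nc localhost 11211
STORED
$ printf 'stats slabs\r\nquit\r\n' | nc localhost 11211 | grep used_chunks
STAT 1:used_chunks 1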
So that's an overview of how Memcached is laid out: pages, chunks, slabs. Now imagine we store a bunch of small items and a bunch of large items, but no medium items, until all pages from the page pool are used.
Slab 1 might end up with 800 pages and slab 3 with 200 pages (illustrative numbers; they won't add up exactly). Everything is good, until we try to store a medium size item.
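That lopsided page split is visible directly in stats slabs; with a workload like the one above you'd see something like this (numbers illustrative):

$ printf 'stats slabs\r\nquit\r\n' | nc localhost 11211 | grep total_pages
STAT 1:total_pages 800
STAT 3:total_pages 200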
Memcached will give us an out of memory error (or, without the -M option, will simply fail to store the item) because slab class 2 has no available pages to use. Pages from slab class 1 and slab class 3 can't be re-assigned to slab 2. You can think of Memcached as a set of separate internal caches, one per slab class.
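Here's roughly what that failure looks like on the wire, assuming the page pool is exhausted as above. The 200 byte payload (plus key and overhead) targets slab class 2's 288 byte chunks, and the exact error wording can vary by version:

$ { printf 'set medium 0 0 200\r\n'; head -c 200 /dev/zero | tr '\0' 'x'; printf '\r\nquit\r\n'; } | nc localhost 11211
SERVER_ERROR out of memory storing object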
Finally, this brings us back to the LRU question. Since each slab class is essentially a separate cache, there is no global LRU, only a per-slab-class LRU. The globally least recently used item won't get evicted to make room for a new item unless the two are in the same slab class. Depending on your storage and access patterns, you could end up with one slab class constantly evicting recently used items while another slab class holds a bunch of old items that just sit around.
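This skew shows up in the per-class eviction counters from stats items. With the workload above you might see something like this (numbers illustrative):

$ printf 'stats items\r\nquit\r\n' | nc localhost 11211 | grep ':evicted '
STAT items:1:evicted 1482
STAT items:3:evicted 0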
Which brings me to another point: items/chunks are not actively reclaimed/expired. Memcached does not have a background thread that explicitly expires items, reclaiming used chunks for new items. When a slab class needs a chunk and there are no more pages, Memcached looks at the tail of that class's LRU queue for items to evict, making a best effort to evict expired items (items you've explicitly set to expire after some time) first. If the item at the tail has already expired, it is the one evicted. But if the tail item has not yet expired while an expired item sits further up the queue, the unexpired tail item is evicted anyway, even though the expired one would seem like the better candidate: since it is not near the tail, Memcached stops looking and just evicts the tail item.
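Lazy expiration is easy to see from the client side: an expired item simply misses on the next get (and only then, or at eviction time, is its chunk freed up). A quick sketch with a 1 second expiry:

$ printf 'set doomed 0 1 5\r\nhello\r\nquit\r\n' | nc localhost 11211
STORED
$ sleep 2
$ printf 'get doomed\r\nquit\r\n' | nc localhost 11211
END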
Another example of a leaky abstraction? The Memcached API is very simple and straightforward, but at a certain point you need implementation knowledge to make sure things are behaving as expected.
If I recall correctly, it also does not perform an LRU reordering of an entry if it has been reordered recently. This is a fair decision given the size of the cache, as it maintains relative recency, but it is not a strict LRU. An efficient approach that avoids a background sweeper thread is lock amortization, which is used by ConcurrentLinkedHashMap to provide a non-blocking LRU data structure.
Posted by: Benjamin Manes | April 07, 2011 at 02:06 AM