bitfaster / BitFaster.Caching

High performance, thread-safe in-memory caching primitives for .NET

Clearing ConcurrentLru leaves cache in broken state

bitfaster opened this issue

Originally posted by @khellang in #399 (comment)

I think we're bumping up against the same bug. Can you confirm this also affects ConcurrentLru<K, V>?

Is there a workaround? Looping through entries and calling TryRemove?

We're using v2.2.1.

UPDATE: Looping with TryRemove seems to leave the cache in a broken state as well 😅
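
For reference, a removal loop like the one described would look roughly like this (a sketch, assuming ConcurrentLru enumerates its entries as KeyValuePair<K, V> and exposes TryRemove, with using System.Linq for the snapshot):

// Snapshot the keys first so removal does not interfere with enumeration.
var keys = cache.Select(kvp => kvp.Key).ToList();

foreach (var key in keys)
{
    cache.TryRemove(key);
}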

Originally posted by @khellang in #399 (comment)

I have the following code

using BitFaster.Caching.Lru;
using System;
using System.Threading;

public class Program
{
    private static int _counter;

    public static void Main(string[] args)
    {
        var cache = new ConcurrentLru<Guid, CacheItem>(10);

        for (int i = 0; i < 20; i++)
        {
            cache.GetOrAdd(Guid.NewGuid(), key => new CacheItem(Interlocked.Increment(ref _counter)));
        }

        Console.WriteLine();
        Console.WriteLine("Clear");
        Console.WriteLine();

        cache.Clear();

        for (int i = 0; i < 10; i++)
        {
            cache.GetOrAdd(Guid.NewGuid(), key => new CacheItem(Interlocked.Increment(ref _counter)));
        }
    }

    public class CacheItem : IDisposable
    {
        public CacheItem(int instanceId)
        {
            InstanceId = instanceId;
            Console.WriteLine($"{InstanceId} created.");
        }

        public int InstanceId { get; }

        public void Dispose()
        {
            Console.WriteLine($"{InstanceId} disposed.");
        }
    }
}

And we're seeing some pretty jarring results:

1 created.
2 created.
3 created.
4 created.
5 created.
6 created.
7 created.
8 created.
9 created.
10 created.
11 created.
9 disposed.
12 created.
10 disposed.
13 created.
11 disposed.
14 created.
12 disposed.
15 created.
13 disposed.
16 created.
14 disposed.
17 created.
15 disposed.
18 created.
16 disposed.
19 created.
17 disposed.
20 created.
18 disposed.

Clear

19 disposed.
20 disposed.
1 disposed.
2 disposed.
3 disposed.
4 disposed.
5 disposed.
6 disposed.
7 disposed.
8 disposed.
21 created.
22 created.
23 created.
21 disposed.
24 created.
22 disposed.
25 created.
23 disposed.
26 created.
24 disposed.
27 created.
25 disposed.
28 created.
26 disposed.
29 created.
27 disposed.
30 created.
28 disposed.

The eviction policy doesn't look like LRU, and after calling Clear, it appears as if the capacity has shrunk to 2.

Are these results expected?

Originally posted by @khellang in #399 (comment)

I tried switching to ClassicLru<K, V> and these are the results I'd expect:

using BitFaster.Caching.Lru;
using System;
using System.Threading;

public class Program
{
    private static int _counter;

    public static void Main(string[] args)
    {
        var cache = new ClassicLru<Guid, CacheItem>(10);

        for (int i = 0; i < 20; i++)
        {
            cache.GetOrAdd(Guid.NewGuid(), key => new CacheItem(Interlocked.Increment(ref _counter)));
        }

        Console.WriteLine();
        Console.WriteLine("Clear");
        Console.WriteLine();

        cache.Clear();

        for (int i = 0; i < 10; i++)
        {
            cache.GetOrAdd(Guid.NewGuid(), key => new CacheItem(Interlocked.Increment(ref _counter)));
        }
    }

    public class CacheItem : IDisposable
    {
        public CacheItem(int instanceId)
        {
            InstanceId = instanceId;
            Console.WriteLine($"{InstanceId} created.");
        }

        public int InstanceId { get; }

        public void Dispose()
        {
            Console.WriteLine($"{InstanceId} disposed.");
        }
    }
}
1 created.
2 created.
3 created.
4 created.
5 created.
6 created.
7 created.
8 created.
9 created.
10 created.
11 created.
1 disposed.
12 created.
2 disposed.
13 created.
3 disposed.
14 created.
4 disposed.
15 created.
5 disposed.
16 created.
6 disposed.
17 created.
7 disposed.
18 created.
8 disposed.
19 created.
9 disposed.
20 created.
10 disposed.

Clear

20 disposed.
18 disposed.
16 disposed.
14 disposed.
13 disposed.
11 disposed.
17 disposed.
19 disposed.
12 disposed.
15 disposed.
21 created.
22 created.
23 created.
24 created.
25 created.
26 created.
27 created.
28 created.
29 created.
30 created.

Originally posted by @khellang in #399 (comment)

Tried changing to EqualCapacityPartition:

var cache = new ConcurrentLru<int, CacheItem>(10, new EqualCapacityPartition(10), EqualityComparer<int>.Default);

That gave the following result:

1 created.
2 created.
3 created.
4 created.
5 created.
6 created.
7 created.
8 created.
9 created.
10 created.
11 created.
5 disposed.
12 created.
6 disposed.
13 created.
7 disposed.
14 created.
8 disposed.
15 created.
9 disposed.
16 created.
10 disposed.
17 created.
11 disposed.
18 created.
12 disposed.
19 created.
13 disposed.
20 created.
14 disposed.

Clear

15 disposed.
16 disposed.
17 disposed.
18 disposed.
1 disposed.
19 disposed.
2 disposed.
20 disposed.
3 disposed.
4 disposed.
21 created.
22 created.
23 created.
24 created.
25 created.
26 created.
27 created.
21 disposed.
28 created.
22 disposed.
29 created.
23 disposed.
30 created.
24 disposed.

Looks like it's related to the partitioning somehow 🤷‍♂️

Thanks for reporting this, I'll take a look at ConcurrentLru as well. There isn't any shared code, so this is a different bug. From your repro, it looks like the internal queue counters got stuck at full; I'll debug it and see what's wrong.

Regarding eviction order, ConcurrentLru is an approximate LRU that trades exact LRU order for better hit rate and concurrent throughput. ClassicLru provides exact LRU order, but has lower hit rate and throughput. ConcurrentLru will look more like LRU order if you replay something like a looped Zipfian distribution, where some keys repeat more often than others.

I investigated and found that calling Clear() doesn't reset ConcurrentLru to the cold state. Consequently, items cannot enter the warm queue until they are requested a second time (this is how LRU is loosely approximated with 3 FIFO queues, as described here). Since your test only fetches each key once, no items can enter warm and you observe reduced capacity.

Changing the partitioning strategy changes the size of the warm queue relative to cold and hot; this is why you observe a different apparent size after calling Clear() with different partitions.
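
For illustration, the split can be inspected directly; this is a sketch assuming the partition types expose their Hot, Warm and Cold sizes via ICapacityPartition:

using BitFaster.Caching.Lru;
using System;

public class PartitionDemo
{
    public static void Main()
    {
        // Sketch: compare how two strategies split a total capacity of 10.
        // Assumes ICapacityPartition exposes Hot, Warm and Cold properties.
        ICapacityPartition favorWarm = new FavorWarmPartition(10);
        Console.WriteLine($"FavorWarm: hot={favorWarm.Hot}, warm={favorWarm.Warm}, cold={favorWarm.Cold}");

        ICapacityPartition equal = new EqualCapacityPartition(10);
        Console.WriteLine($"Equal: hot={equal.Hot}, warm={equal.Warm}, cold={equal.Cold}");
    }
}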

With a workload that has repeated access to some keys, all capacity is available, but it may take longer to fully populate warm, since items must now be fetched a second time to enter it. I will fix it to reset the cold flag on Clear().
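
The shape of that fix is roughly the following (a conceptual sketch only, not the library's actual internals; the field and helper names here are hypothetical):

// Conceptual sketch only; field and helper names are hypothetical.
public void Clear()
{
    RemoveAndDisposeAllItems(); // drop every entry, as Clear already does

    // Also reset the warmup flag, so the next fill can enter warm
    // directly again, exactly as from a freshly constructed cache.
    this.isWarm = false;
}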

Thanks for the quick investigation @bitfaster 🙏 Yeah, we certainly expected the cache to be fully reset to its initial state when calling Clear.

Just for clarification, this will not fix the pseudo-LRU eviction order, right? It seems to be evicting the next-to-hottest item very consistently, while there are 7-8 older items in the cache. Is that to be expected?

This is expected, given your test scenario, the 2Q algorithm and the warmup strategy.

The 2Q algorithm uses FIFO queues to estimate approximate LRU order. At its core, 2Q partitions items into three sets: the hot, warm and cold FIFO queues. Thus, it cannot provide exact LRU ordering. To enter warm, an item must have been fetched at least once since it was first added to the cache; otherwise, it flows directly from hot to cold. If it is not accessed while in cold, it is evicted. Items flow through the queues in FIFO order.

During warmup (the transition from empty to full), items can immediately enter the warm queue (the second access rule is relaxed). This allows all cache slots to fill up so you can use the full capacity. Allowing items directly into warm at the beginning gives a better chance of cache hits simply because there are more items in the cache (and this does indeed increase hit rate in all test scenarios).

In your original scenario, the cache is size 10. With the default FavorWarmPartition, this gives the following FIFO queue sizes: hot = 1, warm = 8, cold = 1. Once the cache is warm (the warm queue has been filled), your scenario will look like this in the debugger when observing the internal [H]ot, [W]arm and [C]old queues, assuming you fetch keys a to j in alphabetical order:

H[j], W[i, h, g, f, e, d, c, b], C[a] - initial state, cache warm
H[k], W[i, h, g, f, e, d, c, b], C[j] - fetch k, push j to cold, evict a
H[l], W[i, h, g, f, e, d, c, b], C[k] - fetch l, push k to cold, evict j
H[m], W[i, h, g, f, e, d, c, b], C[l] - fetch m, push l to cold, evict k

Because no item is ever fetched twice, all items flow directly from hot to cold in FIFO order, and the coldest item is evicted. No items can enter warm, because the entry criterion is never met: no item is fetched a second time. This is how 2Q is intended to work. You observe only 2 items cycling through because the total size of hot + cold is 2.
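
The whole flow can be condensed into a runnable sketch that reproduces the walkthrough above (illustrative only, not the library's implementation; the queue sizes match FavorWarmPartition for capacity 10):

using System;
using System.Collections.Generic;

// Illustrative sketch of the 2Q flow described above - not the library's code.
// A fetched key always enters hot. Items leaving hot move to warm if they were
// accessed again (a rule relaxed during warmup), otherwise to cold; items
// leaving cold are evicted.
class TwoQueueSketch
{
    class Entry { public string Key = ""; public bool WasAccessed; }

    readonly Queue<Entry> hot = new(), warm = new(), cold = new();
    readonly int hotCap = 1, warmCap = 8, coldCap = 1; // FavorWarmPartition(10)
    bool isWarm; // set once warm first fills; failing to reset this on Clear is the bug

    public void Fetch(string key)
    {
        // The scenario in this issue never fetches a key twice, so only the
        // miss path is modeled here; a hit would set WasAccessed instead.
        hot.Enqueue(new Entry { Key = key });

        while (hot.Count > hotCap)
        {
            var e = hot.Dequeue();
            if (e.WasAccessed || !isWarm)
                warm.Enqueue(e); // earned (or granted, during warmup) a warm slot
            else
                cold.Enqueue(e); // flows straight from hot to cold
        }

        while (warm.Count > warmCap)
        {
            isWarm = true;                // warm has filled: warmup is over
            cold.Enqueue(warm.Dequeue()); // warm overflow flows into cold
        }

        while (cold.Count > coldCap)
            Console.WriteLine($"{cold.Dequeue().Key} evicted");
    }
}

Fetching keys a through m through this sketch prints "a evicted", "j evicted", "k evicted", matching the queue states shown above.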

There are no optimal values to keep in the cache for a sequence of unique keys like this. I agree the result in your scenario is not intuitive and does not resemble LRU order. But keep in mind that discarding items in FIFO order (or any order, in fact) is exactly as effective as LRU order in this case, since there is no optimal solution. If the access pattern is more like a Zipfian distribution (or any of the other access patterns analyzed here), 2Q outperforms pure LRU across the board, and in some cases very significantly.
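
To see this, replay a skewed workload instead of unique keys; a rough sketch, using an 80/20 split as a crude stand-in for a true Zipfian distribution:

using BitFaster.Caching.Lru;
using System;

public class SkewedDemo
{
    public static void Main()
    {
        var cache = new ConcurrentLru<int, string>(10);
        var random = new Random(42);

        for (int i = 0; i < 1_000; i++)
        {
            // ~80% of fetches hit 5 hot keys; the rest are a long tail.
            int key = random.NextDouble() < 0.8
                ? random.Next(5)
                : 100 + random.Next(1_000);

            cache.GetOrAdd(key, k => $"value {k}");
        }

        // The repeatedly fetched keys earn their way into warm and survive,
        // while the one-off tail keys wash through hot and cold.
        for (int k = 0; k < 5; k++)
        {
            Console.WriteLine($"hot key {k} cached: {cache.TryGet(k, out _)}");
        }
    }
}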

Hit rate is better because ConcurrentLru is not a pure LRU.