bitfaster / BitFaster.Caching

High performance, thread-safe in-memory caching primitives for .NET


[Feature request] Atomic TryRemove

hach-que opened this issue · comments

I have an LRU cache that is used like this:

  • When a directory is requested, it does GetOrAdd on an atomic LRU to enumerate the current directory entries and store them in the cache.
  • When a file or subdirectory is modified, it calls TryRemove on the LRU to evict the entry from the cache.

However, looking at the source code of AtomicFactoryCache, it doesn't look like TryRemove is guarded at all, which means I can run into this scenario:

  • Thread A requests a directory which isn't in the cache, so GetOrAdd starts running. It gets the directory enumerations at this point in time.
  • Thread B modifies a file or subdirectory and calls TryRemove to remove the item from the cache. GetOrAdd hasn't finished running yet, but the value produced by the value factory will be stale.
  • Thread A saves the stale value into the LRU cache.

I thought initially to set some sort of "stale" flag before and after TryRemove executes, and have the value factory check it before returning (if the stale flag is set at the end of the value factory, it would clear the stale flag and re-evaluate itself). However, I don't believe this will be perfect because it's still possible for this sequence of events to occur:

  • Thread A's value factory checks stale flag, it's not set.
  • Thread B calls TryRemove and sets the stale flag.
  • Thread A's value factory returns its now-stale value to the LRU, which gets stored.

Admittedly the window here is much, much smaller than in the normal case, but I would still like to prevent it.

What I would like is a setting that makes TryRemove discard not only the current value in the cache, but also prevents any current in-flight GetOrAdd requests from storing their results in the cache (it's totally fine for them to return stale data for their individual requests, I just don't want it persisting in the cache).
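For illustration, these semantics can be sketched with a per-key generation counter: TryRemove bumps the generation, and an in-flight GetOrAdd publishes its result only if the generation it observed before running the factory is still current. This is a hypothetical standalone sketch (the `GenerationCache` name and its lock-based design are made up here, not part of BitFaster.Caching):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of "TryRemove invalidates in-flight GetOrAdd".
// A single lock makes check-and-publish and bump-and-remove atomic;
// the factory itself runs outside the lock.
public sealed class GenerationCache<K, V> where K : notnull
{
    private readonly object _sync = new object();
    private readonly Dictionary<K, long> _generations = new();
    private readonly Dictionary<K, V> _values = new();

    public V GetOrAdd(K key, Func<K, V> factory)
    {
        long observed;
        lock (_sync)
        {
            if (_values.TryGetValue(key, out var existing))
                return existing;
            _generations.TryGetValue(key, out observed);
        }

        // Factory runs without holding the lock.
        V value = factory(key);

        lock (_sync)
        {
            _generations.TryGetValue(key, out var current);

            // Publish only if no Remove ran while the factory executed.
            // A stale result is still returned to this caller, but it
            // never persists in the cache.
            if (current == observed)
                _values[key] = value;
        }
        return value;
    }

    public void Remove(K key)
    {
        lock (_sync)
        {
            // Bump the generation first so any in-flight factory's
            // publish check fails, then drop the current value.
            _generations[key] = _generations.GetValueOrDefault(key) + 1;
            _values.Remove(key);
        }
    }
}
```

Because the generation check and the publish happen under the same lock as the bump and remove, a Remove either runs before the check (so the stale value is never published) or after the publish (so it deletes the value), closing the window the stale-flag approach leaves open.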

This was my first attempt at a wrapper around this, but based on testing it doesn't appear to be correct:

private class BreakableEntry<K, V>
{
    private readonly ReaderWriterLockSlim _rwLock = new ReaderWriterLockSlim();
    private V? _cached;
    private volatile bool _didBreakMidflight;
    private readonly K _key;
    private readonly Func<K, V> _factory;

    public BreakableEntry(K key, Func<K, V> factory)
    {
        _key = key;
        _factory = factory;
    }

    // Note: Break does not take _rwLock, so it can set the flag at any
    // point, including between the check in Entries and the write to _cached.
    public void Break()
    {
        _didBreakMidflight = true;
    }

    public V Entries
    {
        get
        {
            _rwLock.EnterUpgradeableReadLock();
            try
            {
                if (_cached != null && !_didBreakMidflight)
                {
                    return _cached;
                }

                _rwLock.EnterWriteLock();
                try
                {
                    // Re-check under the write lock.
                    if (_cached != null && !_didBreakMidflight)
                    {
                        return _cached;
                    }

                    var entries = _factory(_key);

                    // Race: Break() can run after this check but before
                    // _cached is assigned, so a stale value can still be stored.
                    if (!_didBreakMidflight)
                    {
                        _cached = entries;
                    }
                    else
                    {
                        _didBreakMidflight = false;
                    }
                    return entries;
                }
                finally
                {
                    _rwLock.ExitWriteLock();
                }
            }
            finally
            {
                _rwLock.ExitUpgradeableReadLock();
            }
        }
    }
}

And then to break the cache instead of calling TryRemove, I would do:

if (_projectionCache.TryGet(key, out var cache))
{
  cache.Break();
}

Within GetOrAdd the atomic cache immediately stores a wrapper for each cache entry that is like a future - it represents the object that will be created by the value factory.

The process is roughly this:

  1. Call GetOrAdd(X, factory)
  2. Insert a wrapper for X, so the cache now knows that X exists, but the value for key X is not yet created
  3. Invoke factory for X, and store the result in the wrapper of X

If TryRemove(X) is called before step 3, the wrapper is immediately removed from the cache. Step 3 will continue to run and the result of the factory delegate will be stored in the wrapper that is no longer referenced by the cache.
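The mechanism described above can be sketched roughly as follows. This is a simplified standalone illustration of the wrapper-as-future idea, not the library's actual implementation (the `ValueWrapper` and `AtomicFactoryCacheSketch` names are invented here):

```csharp
using System;
using System.Collections.Concurrent;

// A wrapper that is stored in the cache before its value exists;
// the factory fills it in later (double-checked creation).
public sealed class ValueWrapper<K, V>
{
    private readonly object _sync = new object();
    private V? _value;
    private bool _created;

    public V GetValue(K key, Func<K, V> factory)
    {
        lock (_sync)
        {
            if (!_created)
            {
                _value = factory(key);   // step 3: fill in the wrapper
                _created = true;
            }
            return _value!;
        }
    }
}

public sealed class AtomicFactoryCacheSketch<K, V> where K : notnull
{
    private readonly ConcurrentDictionary<K, ValueWrapper<K, V>> _map = new();

    public V GetOrAdd(K key, Func<K, V> factory)
    {
        // Step 2: insert the wrapper, so the cache knows X exists
        // before the value is created.
        var wrapper = _map.GetOrAdd(key, _ => new ValueWrapper<K, V>());

        // Step 3: create (or read) the value inside the wrapper.
        return wrapper.GetValue(key, factory);
    }

    // Removes the wrapper. An in-flight factory keeps running and
    // writes its result into the now-orphaned wrapper, which is no
    // longer reachable through the cache.
    public bool TryRemove(K key) => _map.TryRemove(key, out _);
}
```

If TryRemove(X) interleaves with a running factory, the first caller still receives its (possibly stale) result, but the next GetOrAdd(X) creates a fresh wrapper and re-runs the factory.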

So, in your scenario below, thread A saves a stale value into a wrapper object that the cache no longer references - the cached wrapper was already deleted by the TryRemove call.

  • Thread A requests a directory which isn't in the cache, so GetOrAdd starts running. It gets the directory enumerations at this point in time.
  • Thread B modifies a file or subdirectory and calls TryRemove to remove the item from the cache. GetOrAdd hasn't finished running yet, but the value produced by the value factory will be stale.
  • Thread A saves the stale value into the LRU cache.

This is a quick test method I wrote that verifies your scenario does not store a value for X by forcing the interleaving of threads A and B:

    public class AtomicTest
    {
        private readonly ICache<string, string> cache = new ConcurrentLruBuilder<string, string>().WithAtomicGetOrAdd().Build();

        [Fact]
        public async Task Test()
        {
            var threadAAdded = new TaskCompletionSource();
            var threadASignal = new TaskCompletionSource();
            var threadBSignal = new TaskCompletionSource();

            var a = Task.Run(() => ThreadA(threadAAdded, threadASignal));
            var b = Task.Run(() => ThreadB(threadAAdded, threadBSignal));

            // wait for B to delete 
            await threadBSignal.Task;

            // signal thread A to store a stale item after B has deleted X
            threadASignal.SetResult();

            await a;
            await b;

            // stale value is not available in the cache
            cache.TryGet("X", out var value).Should().BeFalse();
        }

        public void ThreadA(TaskCompletionSource addedSignal, TaskCompletionSource wait)
        {
            cache.GetOrAdd("X", _ => 
            {
                addedSignal.SetResult();

                // wait for thread B before completing the factory call
                wait.Task.Wait();
                return "Stale Data!"; 
            });
        }

        public void ThreadB(TaskCompletionSource signalAdded, TaskCompletionSource signalRemoved)
        {
            // wait until the value has been added to the cache
            signalAdded.Task.Wait();

            cache.TryRemove("X");

            // tell A to continue with factory delegate only after B has removed X
            signalRemoved.SetResult();
        }
    }

There is a further possible race condition if TryRemove can be called at almost the same time as GetOrAdd. In this case it is possible for TryRemove to execute before GetOrAdd has stored the wrapper. There is no attempt to synchronize or defend against this case. It is considered equivalent to calling TryRemove(X), then GetOrAdd(X).

I can confirm that TryRemove works fine with the atomic cache in practice.