dgraph-io / ristretto

A high performance memory-bound Go cache

Home Page:https://dgraph.io/blog/post/introducing-ristretto-high-perf-go-cache/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[BUG]: Expiration map grows without limit

nemorales opened this issue · comments

What version of Ristretto are you using?

v0.1.1

What version of Go are you using?

1.20

Have you tried reproducing the issue with the latest release?

Yes

What is the hardware spec (RAM, CPU, OS)?

16GB RAM, Apple M1 Pro , macOS Monterey 12.5

What steps will reproduce the bug?

We encountered problems with memory usage on production, so I coded a local application and stressed it. Configuration of ristretto:

ristretto.Config{NumCounters: 1000000,MaxCost: int64(100 * bytesInMb),BufferItems: 64,Metrics:true}

We used TTL of an hour (3600* time.Second) and stressed the application to generate 1k items per 10ms, an item uses 200 bytes aprox.

After reaching eviction point, we leave the application stresser running for 10 more min. Using pprof to analyze in use space we see this:

imagen

Expected behavior and actual result.

I would expect the keys added for expiration in the expiration map to be removed once the same key gets evicted.

What it seems to happen is that the key is not removed from the expiration map after eviction, so when using a big enough TTL the memory usage of ristretto knows no limit.

Additional information

No response

Hey @nemorales it seems we are encountering this as well. Did you run a garbage collection prior to running pprof? (I'm not 100% sure that it happens automatically).

Looking at the code there a chance for the expiration bucket cleanup to be skipped under load. There is a single goroutine responsible for a number of things, among which cleaning up the "just expired bucket". A timer ticker has a buffer of one, so if the cleanup is not happening fast enough (e.g. due to some type of lock contention, or a heavy OnEvict function), the ticker will skip. This can cause a bucket to not be cleaned up, and it will remain in place until the parent cache is GC'ed.

Unrelated (in terms of cause), but roughly in the same code: clearing the cache also doesn't currently clear the expiryMap, which seems to be an oversight. as it makes no sense to keep it around if the items themselves have been cleared.

Replication is fairly easy by just adding a time.Sleep of some duration (I used 1 millisecond) as the OnEvict function, which will cause the cleaning up of buckets to be too slow to keep up under any kind of load, triggering the behavior.

@nemorales @Woutifier - Have you continued to experience this behavior? I'm seeing something similar and was curious if either of you found a solution.

I'm thinking of these two changes: https://github.com/Woutifier/ristretto/commits/wouter/clear-expiry

Since then, I've found another issue, which is that if you don't set a TTL at all (Just Set, not SetWithTTL), and the item you're setting is already in the cache, it will use the "Update" path. Update doesn't properly filter out items without expiration, and just creates a new "zero time" bucket and adds the item into that.

Edit: added a fix for the new issue to the branch above too.

@Woutifier that's an interesting finding. Now I understand why ristretto has different memory usage behavior depending on the load: in my use case we decided not to use TTL but there are some cases in which memory usage grows without limit.

I'm taking a look at your commits, thanks for sharing what you learned!

Is there any new information on this issue?

I've opened a PR related to this issue: #358

I have a similar issue with TTL. Memory usage is okay when I set the same key in the cache with a lower TTL(like 1 hour). But if I increased the TTL to 1 day, memory usage exceeded 1 GB.