dgraph-io / ristretto

A high performance memory-bound Go cache

Home Page:https://dgraph.io/blog/post/introducing-ristretto-high-perf-go-cache/


Problems with the expiration

linger1216 opened this issue · comments

Question 1:

We know that expired entries are stored in a map of maps, and the bucket key is derived from the time,

like

time:0s storageBucket:1 cleanupBucket:0
time:1s storageBucket:1 cleanupBucket:0
time:2s storageBucket:1 cleanupBucket:0
time:3s storageBucket:1 cleanupBucket:0
time:4s storageBucket:1 cleanupBucket:0
time:5s storageBucket:2 cleanupBucket:1
time:6s storageBucket:2 cleanupBucket:1
time:7s storageBucket:2 cleanupBucket:1
time:8s storageBucket:2 cleanupBucket:1
time:9s storageBucket:2 cleanupBucket:1
time:10s storageBucket:3 cleanupBucket:2
time:11s storageBucket:3 cleanupBucket:2
time:12s storageBucket:3 cleanupBucket:2
time:13s storageBucket:3 cleanupBucket:2
time:14s storageBucket:3 cleanupBucket:2
time:15s storageBucket:4 cleanupBucket:3

If my cleanup interval is badly chosen, will some buckets be skipped?

e.g. if cleanup runs only every 15s, it will only ever find bucket 3; bucket 1 and bucket 2 are never cleaned up again and stay in the expiration map forever. Is that right?

Question 2:

The cleanup code is as follows:

func (m *expirationMap) cleanup(store store, policy policy, onEvict onEvictFunc) {
	now := time.Now()
	bucketNum := cleanupBucket(now)
	keys := m.buckets[bucketNum]
	delete(m.buckets, bucketNum)

	for key, conflict := range keys {
		// if the key is expired, delete it from the store
		// ...
	}
}

Assume cleanup now wants to clean bucket 1, and bucket 1 has 100 keys: 60 expired and 40 not. According to the code, the 60 expired keys will be deleted, and then the whole bucket is deleted. The other 40 keys remain in the store but will never be cleaned up. Doesn't that matter? Why is it designed like this?
At the same time, I also noticed that Get only checks whether an item is expired; it does not delete it at that point. Wouldn't it be better to delete it there?

Question 3:

cleanupTicker fires every two seconds and checks the bucket from about 5 seconds ago. Not all keys expire within 5s (e.g. a 1-day TTL), so I guess most of the keys will not be cleaned, yet the bucket is deleted. I wonder if this design can be improved?

Thank you very much, and please forgive my English.

Question 1:

Yes, it is possible to skip some buckets if the bucket size and the cleanup frequency are configured incorrectly. But the cleanup ticker is set to run twice per bucket (if the bucket size is 10s, cleanup runs every 5s), which gives us enough confidence that cleanup will have a chance to clean every bucket. It's still possible that a bucket could be skipped, but Get checks the actual expiration time, so skipping a bucket won't lead to invalid values being returned by gets.

Question 2:

cleanup always tries to clean up a bucket that contains only expired items. The check inside the for loop is there just as a safeguard.
Gets don't delete expired entries to make them as fast as possible. In general, most of the design decisions in Ristretto are geared towards doing work outside of the critical path. In the case of cleanup, it means expired entries will remain in the map until the cleanup routine runs in the background.

Question 3:

This is not how the cleanup works. The bucket is based on the expiration time, not the insertion time. So if there's an entry that expires in a day, it won't be processed by cleanup until roughly a day from now. An item that expires in 5 sec and another one that expires in one day will not share a bucket.

Feel free to reopen if you still have questions.