rosedblabs / rosedb

Lightweight, fast and reliable key/value storage engine based on Bitcask.

Home page: https://rosedblabs.github.io

NEW: Periodically remove expired keys from memory

Jeremy-Run opened this issue

Since PutWithTTL() was added, the index may contain expired keys. In addition to deleting these keys when they are encountered in Get() / Exist() / Ascend() ..., a background goroutine could also delete them periodically.
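
A minimal sketch of the two mechanisms described above, assuming a toy in-memory index (a Go map guarded by a mutex) rather than rosedb's real types: `get` deletes an expired key lazily when it is read, and `runSweeper` is a background goroutine that periodically deletes all expired keys. The names `ttlStore`, `entry`, `putWithTTL`, `get`, `sweep` and `runSweeper`, and storing `expireAt` as Unix nanoseconds in the index, are all hypothetical. Timestamps are passed in explicitly so the logic can be tested without sleeping.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// entry is a hypothetical index value. Keeping expireAt next to the value in
// memory means expiry can be checked without reading the record from disk.
type entry struct {
	value    []byte
	expireAt int64 // Unix nanoseconds; 0 means the key has no TTL
}

func (e entry) expired(now int64) bool {
	return e.expireAt != 0 && e.expireAt <= now
}

// ttlStore is a toy stand-in for the DB and its index, not rosedb's API.
type ttlStore struct {
	mu    sync.Mutex
	index map[string]entry
}

func newTTLStore() *ttlStore {
	return &ttlStore{index: make(map[string]entry)}
}

func (s *ttlStore) putWithTTL(key string, value []byte, ttl time.Duration, now int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.index[key] = entry{value: value, expireAt: now + int64(ttl)}
}

// get performs lazy deletion: an expired key is removed when it is read.
// It takes the write lock because it may delete.
func (s *ttlStore) get(key string, now int64) ([]byte, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.index[key]
	if !ok {
		return nil, false
	}
	if e.expired(now) {
		delete(s.index, key)
		return nil, false
	}
	return e.value, true
}

// sweep deletes every expired key and returns how many were removed.
func (s *ttlStore) sweep(now int64) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	n := 0
	for k, e := range s.index {
		if e.expired(now) {
			delete(s.index, k) // deleting during range is safe in Go
			n++
		}
	}
	return n
}

// runSweeper is the periodic background goroutine; it stops when ctx is done.
func (s *ttlStore) runSweeper(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case t := <-ticker.C:
			s.sweep(t.UnixNano())
		}
	}
}

func main() {
	s := newTTLStore()
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go s.runSweeper(ctx, 100*time.Millisecond)

	s.putWithTTL("session", []byte("abc"), 50*time.Millisecond, time.Now().UnixNano())
	time.Sleep(300 * time.Millisecond) // give the sweeper time to run
	s.mu.Lock()
	fmt.Println("keys left:", len(s.index))
	s.mu.Unlock()
}
```

Lazy deletion alone never reclaims keys that are not read again, which is why the periodic sweep is useful on top of it; the sweep, however, holds the write lock for a full pass over the index.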

@roseduan Do you think this feature is necessary?

Yes, it is necessary, but we need to rethink the specific strategy. Maybe we can look at how Redis does it.
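
For reference, Redis's `EXPIRE` documentation has described its active expiry roughly as: sample 20 random keys from the set of keys with a TTL, delete the expired ones, and repeat if more than 25% of the sample had expired. A minimal Go sketch of that sampling loop, assuming a hypothetical `volatileSet` that tracks only keys with a TTL (the type, its methods and the constants are illustrative, not rosedb or Redis code):

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	sampleSize     = 20   // keys sampled per round
	repeatFraction = 0.25 // start another round if more than this share expired
)

// volatileSet is hypothetical: it tracks only keys that have a TTL. Keys live
// in a slice so a random one can be picked in O(1); pos maps key -> slice index.
type volatileSet struct {
	keys     []string
	pos      map[string]int
	expireAt map[string]int64 // Unix nanoseconds
	rng      *rand.Rand
}

func newVolatileSet(seed int64) *volatileSet {
	return &volatileSet{
		pos:      map[string]int{},
		expireAt: map[string]int64{},
		rng:      rand.New(rand.NewSource(seed)),
	}
}

func (v *volatileSet) add(key string, expireAt int64) {
	if _, ok := v.pos[key]; !ok {
		v.pos[key] = len(v.keys)
		v.keys = append(v.keys, key)
	}
	v.expireAt[key] = expireAt
}

// remove deletes key in O(1) by moving the last slice element into its slot.
func (v *volatileSet) remove(key string) {
	i, ok := v.pos[key]
	if !ok {
		return
	}
	last := len(v.keys) - 1
	v.keys[i] = v.keys[last]
	v.pos[v.keys[i]] = i
	v.keys = v.keys[:last]
	delete(v.pos, key)
	delete(v.expireAt, key)
}

// activeExpireCycle samples random keys (with replacement) and deletes the
// expired ones, repeating while more than repeatFraction of a sample had
// expired. Each repeated round removes at least one key, so it terminates.
func (v *volatileSet) activeExpireCycle(now int64) int {
	deleted := 0
	for len(v.keys) > 0 {
		n := sampleSize
		if n > len(v.keys) {
			n = len(v.keys)
		}
		expired := 0
		for i := 0; i < n; i++ {
			k := v.keys[v.rng.Intn(len(v.keys))]
			if v.expireAt[k] <= now {
				v.remove(k)
				expired++
			}
		}
		deleted += expired
		if float64(expired) <= repeatFraction*float64(n) {
			break
		}
	}
	return deleted
}

func main() {
	v := newVolatileSet(42)
	for i := 0; i < 100; i++ {
		v.add(fmt.Sprintf("k%d", i), int64(i)) // key ki expires at t = i
	}
	n := v.activeExpireCycle(50)
	fmt.Println("deleted:", n, "remaining:", len(v.keys))
}
```

Because each round only inspects a small random sample, the write lock can be held briefly; Redis also bounds the total time spent per expiry cycle, which a real implementation would likely need as well.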

@roseduan Is this plan OK?

Points to consider:

  • Deleting expired keys requires a global write lock. If we read each record from disk while holding that lock just to check whether it has expired, performance will suffer, so this operation should work purely in memory.
  • In Bitcask, memory is precious (the whole index is kept in memory), so users should decide, based on their actual workload, whether the DB may use extra memory to reclaim expired keys.

Implementation plan:

  • Add a global switch to enable automatic deletion of expired keys.
  • If it is enabled, build an orderedMap in memory that stores each key and its expiration time, sorted by expiration time in ascending order. Then start a background goroutine that waits until the earliest expiration time in the orderedMap arrives before deleting keys (with a random wait added here, so that a batch of keys expiring at the same time does not hold the write lock continuously). Merge(true) / Open() need to rebuild this orderedMap in memory.
  • If it is disabled, users can call Merge(true) explicitly, because Merge(true) rebuilds the index and filters out expired keys.
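
A rough sketch of the enabled mode, under stated assumptions: instead of a literal ordered map, it uses a min-heap (`container/heap`) ordered by expiration time plus a Go map that holds the current deadline per key. The map keeps keys unique; heap entries made obsolete by a TTL update are skipped lazily when they reach the top. `popExpired` deletes at most `limit` keys per batch, and `sleepFor` adds random jitter (`maxJitter`) to the wait; both bounds, like every name here, are hypothetical. An ordered structure keyed by (expiration time, key), such as a BTree or skip list, would serve the same purpose.

```go
package main

import (
	"container/heap"
	"fmt"
	"math/rand"
	"time"
)

// item is one (key, deadline) pair; expireAt is in Unix nanoseconds.
type item struct {
	key      string
	expireAt int64
}

// minHeap implements heap.Interface, earliest deadline first.
type minHeap []item

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].expireAt < h[j].expireAt }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(item)) }
func (h *minHeap) Pop() any {
	old := *h
	it := old[len(old)-1]
	*h = old[:len(old)-1]
	return it
}

// expiryQueue is a hypothetical stand-in for the proposed orderedMap.
// latest holds the current deadline per key (so keys stay unique); the heap
// may hold stale items left behind by TTL updates.
type expiryQueue struct {
	h      minHeap
	latest map[string]int64
}

func newExpiryQueue() *expiryQueue {
	return &expiryQueue{latest: map[string]int64{}}
}

// set records or updates a key's deadline (called on writes with a TTL).
func (q *expiryQueue) set(key string, expireAt int64) {
	q.latest[key] = expireAt
	heap.Push(&q.h, item{key, expireAt})
}

// nextExpiry discards stale heap tops and returns the earliest live deadline.
func (q *expiryQueue) nextExpiry() (int64, bool) {
	for q.h.Len() > 0 {
		top := q.h[0]
		if at, ok := q.latest[top.key]; ok && at == top.expireAt {
			return top.expireAt, true
		}
		heap.Pop(&q.h)
	}
	return 0, false
}

// popExpired removes and returns up to limit keys whose deadline <= now,
// bounding how long the caller holds the DB write lock per batch.
func (q *expiryQueue) popExpired(now int64, limit int) []string {
	var keys []string
	for len(keys) < limit {
		at, ok := q.nextExpiry()
		if !ok || at > now {
			break
		}
		it := heap.Pop(&q.h).(item)
		delete(q.latest, it.key)
		keys = append(keys, it.key)
	}
	return keys
}

// sleepFor says how long the goroutine should wait before the next batch:
// until the earliest deadline, plus random jitter in [0, maxJitter).
// It returns -1 when nothing is scheduled.
func (q *expiryQueue) sleepFor(now int64, maxJitter time.Duration, rng *rand.Rand) time.Duration {
	at, ok := q.nextExpiry()
	if !ok {
		return -1
	}
	d := time.Duration(at - now)
	if d < 0 {
		d = 0
	}
	if maxJitter > 0 {
		d += time.Duration(rng.Int63n(int64(maxJitter)))
	}
	return d
}

func main() {
	q := newExpiryQueue()
	q.set("a", 30)
	q.set("b", 10)
	q.set("c", 20)
	q.set("b", 40)                    // TTL updated: the old (b, 10) item is now stale
	fmt.Println(q.popExpired(35, 10)) // [c a]
}
```

The heap gives O(log n) inserts and O(1) access to the earliest deadline, which is what lets the goroutine sleep until the next expiration instead of polling; the cost is the extra memory and slower TTL writes mentioned later in the thread.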

Why do we need an ordered map to store the key and expiration?

I think the easiest and most straightforward way is to iterate over all keys in the BTree and delete the expired ones.

We can add a public function for callers (like DeleteExpiredKeys), and they can call it as needed.

We should also make clear that this method may take a long time to finish; we can add other optimizations when someone asks for them.

My plan is to delete expired keys without stopping the DB service as much as possible (because this is an optimization, not a core feature). If it is acceptable to block the service for a long time, iterating over all keys in the BTree to delete the expired ones is a good idea. Of course, running Merge(true) or reopening the db also filters out expired keys.

I chose an orderedMap for storage mainly for two reasons:
①Keys in the map are unique, which is consistent with our db.
②It is ordered, so the background goroutine knows when its next run is due and avoids taking the write lock when nothing has expired.

Of course, it takes more memory, and writes of keys with an expiration time will be slower. As you said, we can build a simple mechanism (scanning the BTree) first, and optimize it later if users need it.

OK, let's just build a simple mechanism that scans the BTree to delete expired keys.
We can also add a parameter that limits the function's execution time, to avoid holding the lock for too long.
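
A minimal sketch of the agreed approach, assuming a map-based index that is sorted on each call to mimic ordered BTree iteration (the `db` type, the `cursor`/`inScan` fields and this `DeleteExpiredKeys` signature are hypothetical). The scan stops once `timeout` has elapsed, remembers the last key it examined, and resumes after it on the next call, so the lock is released between calls. It always examines at least one key per call so that repeated calls make progress.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// db is a hypothetical stand-in for the DB: index maps key -> expireAt in
// Unix nanoseconds (0 = no TTL). Sorting the keys mimics BTree order.
type db struct {
	mu     sync.Mutex
	index  map[string]int64
	cursor string // last key examined by an interrupted scan
	inScan bool   // true while a scan is in progress (cursor is valid)
}

// DeleteExpiredKeys deletes keys whose expireAt <= now, scanning in key order
// from just after the cursor. It returns once timeout has elapsed (checked
// after each key, so at least one key is examined per call) and reports
// whether the scan reached the end of the index.
func (d *db) DeleteExpiredKeys(now int64, timeout time.Duration) (deleted int, done bool) {
	d.mu.Lock()
	defer d.mu.Unlock()
	start := time.Now()

	keys := make([]string, 0, len(d.index))
	for k := range d.index {
		if !d.inScan || k > d.cursor {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)

	for _, k := range keys {
		if at := d.index[k]; at != 0 && at <= now {
			delete(d.index, k)
			deleted++
		}
		d.cursor, d.inScan = k, true
		if time.Since(start) >= timeout {
			return deleted, false
		}
	}
	d.cursor, d.inScan = "", false // full pass done; the next call starts over
	return deleted, true
}

func main() {
	d := &db{index: map[string]int64{"a": 5, "b": 0, "c": 5, "d": 50}}
	now := int64(10)
	for {
		n, done := d.DeleteExpiredKeys(now, time.Millisecond)
		fmt.Println("deleted:", n, "finished:", done)
		if done {
			break
		}
		// The lock is free here, so other reads and writes can proceed.
	}
	fmt.Println("remaining keys:", len(d.index))
}
```

With an index backed by `github.com/google/btree`, the collect-and-sort step could be replaced by ascending from the cursor (for example with `AscendGreaterOrEqual`), so a call would not have to touch the whole index before its time limit applies.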

Thanks for your recent work.