rosedblabs / rosedb

Lightweight, fast and reliable key/value storage engine based on Bitcask.

Home Page: https://rosedblabs.github.io


enhancement: high memory usage of rosedb

roseduan opened this issue

When I tested some kv stores, I found that rosedb has very high memory usage; I think we should investigate why this happens.

Steps to reproduce the problem:

git clone https://github.com/rosedblabs/kvstore-bench.git
cd kvstore-bench/cmd/kv-bench
go build

Then test some kv stores:

  • rosedb
➜  kv-bench git:(main) ✗ ./kv-bench -c 5 -e rosedb -n 1000000  -p /tmp/rosedb
engine: rosedb
keys: 1000000
key size: 16-64
value size 128-512
concurrency: 5

put: 13.835s	72278 ops/s
get: 5.090s	196472 ops/s

put + get: 18.925s
file size: 391.09MB
peak sys mem: 2.06GB
  • goleveldb
➜  kv-bench git:(main) ✗ ./kv-bench -c 5 -e goleveldb -n 1000000  -p /tmp/goleveldb
engine: goleveldb
keys: 1000000
key size: 16-64
value size 128-512
concurrency: 5

put: 44.633s	22404 ops/s
get: 4.988s	200496 ops/s

put + get: 49.621s
file size: 357.25MB
peak sys mem: 286.33MB
  • bbolt
➜  kv-bench git:(main) ✗ ./kv-bench -c 5 -e bbolt -n 1000000  -p /tmp/bbolt
engine: bbolt
keys: 1000000
key size: 16-64
value size 128-512
concurrency: 5

put: 31.922s	31326 ops/s
get: 0.959s	1042852 ops/s

put + get: 32.881s
file size: 528.50MB
peak sys mem: 244.89MB
  • pebble
➜  kv-bench git:(main) ✗ ./kv-bench -c 5 -e pebble -n 1000000  -p /tmp/pebble
engine: pebble
keys: 1000000
key size: 16-64
value size 128-512
concurrency: 5

put: 21.072s	47456 ops/s
2023/07/28 22:58:08 [JOB 1] WAL file /tmp/pebble/000391.log with log number 000391 stopped reading at offset: 1537265; replayed 4015 keys in 4015 batches
get: 2.752s	363376 ops/s

put + get: 23.824s
file size: 107.98MB
peak sys mem: 235.64MB

In conclusion, we can see that rosedb has good read/write performance, but its sys mem is also the highest: the file size is 391MB, yet the sys mem is 2GB, which seems unreasonable.
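For context, Bitcask-style engines keep an index entry for every key in memory, so memory grows with the number of keys even though values stay on disk. A minimal sketch of such a keydir (illustrative only; the type and field names are assumptions, not rosedb's actual structures):

package main

import (
	"fmt"
	"sync"
)

// recordPos locates the latest record for a key in the on-disk data files.
type recordPos struct {
	fileID uint32 // which data file holds the record
	offset int64  // byte offset of the record in that file
	size   uint32 // length of the encoded record
}

// keydir maps every key to its record position.
// In a Bitcask design this whole map lives in memory.
type keydir struct {
	mu    sync.RWMutex
	index map[string]recordPos
}

func (k *keydir) put(key []byte, pos recordPos) {
	k.mu.Lock()
	defer k.mu.Unlock()
	k.index[string(key)] = pos
}

func (k *keydir) get(key []byte) (recordPos, bool) {
	k.mu.RLock()
	defer k.mu.RUnlock()
	pos, ok := k.index[string(key)]
	return pos, ok
}

func main() {
	k := &keydir{index: make(map[string]recordPos)}
	k.put([]byte("foo"), recordPos{fileID: 1, offset: 0, size: 180})
	if pos, ok := k.get([]byte("foo")); ok {
		fmt.Println("foo is in file", pos.fileID, "at offset", pos.offset)
	}
}

Even at on the order of 100 bytes of index overhead per entry, a million keys would account for roughly 100MB, so the in-memory index alone should not explain a 2GB peak.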

commented

There seems to be a problem with the test tool.

  • First, I tried 1 key, mem: 19.11MB.
➜  kv-bench git:(main) ✗ ./kv-bench -c 5 -e rosedb -n 1  -p /tmp/rosedb
engine: rosedb
keys: 1
key size: 16-64
value size 128-512
concurrency: 5

put: 0.000s     4167 ops/s
get: 0.000s     18306 ops/s

put + get: 0.000s
file size: 0.00B
peak sys mem: 19.11MB
  • Second, I tried 100k keys, mem: 195.38MB.
➜  kv-bench git:(main) ✗ ./kv-bench -c 5 -e rosedb -n 100000  -p /tmp/rosedb
engine: rosedb
keys: 100000
key size: 16-64
value size 128-512
concurrency: 5

put: 2.152s     46463 ops/s
get: 0.100s     996589 ops/s

put + get: 2.253s
file size: 39.12MB
peak sys mem: 195.38MB
  • Third, I tried 1 key, mem: 179.63MB, which seems unreasonable.
➜  kv-bench git:(main) ✗ ./kv-bench -c 5 -e rosedb -n 1  -p /tmp/rosedb
engine: rosedb
keys: 1
key size: 16-64
value size 128-512
concurrency: 5

put: 0.000s     2820 ops/s
get: 0.000s     8541 ops/s

put + get: 0.000s
file size: 39.12MB
peak sys mem: 179.63MB
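I am not sure how kv-bench measures the peak sys mem shown above, but if it periodically samples runtime.MemStats.Sys, the measurement would look roughly like this sketch (an assumption about the tool, not its actual code). Note that Sys counts memory obtained from the OS, which the Go runtime returns only lazily, so a peak can stay high well after the allocations that caused it.

package main

import (
	"fmt"
	"runtime"
	"time"
)

// samplePeakSys polls the Go runtime and records the highest Sys value seen
// until the stop channel is closed, then reports it on the returned channel.
func samplePeakSys(stop <-chan struct{}) <-chan uint64 {
	out := make(chan uint64, 1)
	go func() {
		var peak uint64
		ticker := time.NewTicker(10 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				var ms runtime.MemStats
				runtime.ReadMemStats(&ms)
				if ms.Sys > peak {
					peak = ms.Sys
				}
			case <-stop:
				out <- peak
				return
			}
		}
	}()
	return out
}

func main() {
	stop := make(chan struct{})
	peakCh := samplePeakSys(stop)

	buf := make([]byte, 64<<20)       // stand-in for the benchmark workload
	time.Sleep(50 * time.Millisecond) // let the sampler observe the allocation
	runtime.KeepAlive(buf)

	close(stop)
	fmt.Printf("peak sys mem: %.2fMB\n", float64(<-peakCh)/1024/1024)
}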
commented

On the other hand, I guess the memory usage is high because a Batch is created on every operation, and I see that the previous issue is addressing this problem.
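A minimal sketch of that idea, reusing batches through a sync.Pool instead of allocating a fresh one per write (the Batch type here is illustrative, not rosedb's actual API):

package main

import "sync"

// Batch is a stand-in for a write batch that buffers pending records.
type Batch struct {
	pending [][]byte
}

func (b *Batch) reset() { b.pending = b.pending[:0] }

// batchPool lets writes reuse Batch objects (and their backing slices)
// instead of allocating a new one for every Put, which reduces GC pressure.
var batchPool = sync.Pool{
	New: func() any { return new(Batch) },
}

func put(record []byte) {
	b := batchPool.Get().(*Batch)
	defer func() {
		b.reset()
		batchPool.Put(b)
	}()
	b.pending = append(b.pending, record)
	// ... commit the batch to the data file here ...
}

func main() {
	put([]byte("hello"))
}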

> On the other hand, I guess the memory usage is high because a Batch is created on every operation, and I see that the previous issue is addressing this problem.

Thanks, but I wonder if there is something else causing this problem.

BTW, you can add the option -profile mem when running the bench, and we can use the pprof tool to analyze it.
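For reference, a heap profile like the one -profile mem presumably produces can be written with runtime/pprof (a sketch; the output file name is an assumption) and then inspected with go tool pprof:

package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile dumps the current heap allocations to a file that
// can be analyzed afterwards with: go tool pprof mem.pprof
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	runtime.GC() // get up-to-date allocation statistics
	return pprof.WriteHeapProfile(f)
}

func main() {
	if err := writeHeapProfile("mem.pprof"); err != nil {
		log.Fatal(err)
	}
}

Running go tool pprof -top mem.pprof then shows which call sites retain the most memory.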

In conclusion, I have checked and optimized all of this memory-costly code except the index part.

Because the IRadix index takes too much memory in my tests, we could use a more memory-efficient data structure such as a BTree or an Adaptive Radix Tree; we can do that if users give us feedback.
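As one possibility, a sketch of what a BTree-backed in-memory index could look like, using github.com/google/btree (illustrative only; rosedb's actual index interface may differ):

package main

import (
	"bytes"
	"fmt"

	"github.com/google/btree"
)

// item associates a key with the position of its record in the data files.
type item struct {
	key []byte
	pos int64 // stand-in for a (fileID, offset, size) position
}

func (a *item) Less(b btree.Item) bool {
	return bytes.Compare(a.key, b.(*item).key) < 0
}

func main() {
	// Degree controls node fan-out; larger degrees mean fewer, bigger nodes
	// and typically less per-entry pointer overhead than a radix tree.
	idx := btree.New(32)
	idx.ReplaceOrInsert(&item{key: []byte("foo"), pos: 42})
	if got := idx.Get(&item{key: []byte("foo")}); got != nil {
		fmt.Println("foo at offset", got.(*item).pos)
	}
}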

And I am also exploring some on-disk indexes, such as a hashtable; lotusdb will also have more index choices if we achieve this.