enhancement: high memory usage of rosedb
roseduan opened this issue · comments
When I benchmarked several KV stores, I found that rosedb has very high memory usage; I think we should investigate why this happens.

Steps to reproduce the problem:
```shell
git clone https://github.com/rosedblabs/kvstore-bench.git
cd kvstore-bench/cmd/kv-bench
go build
```
Then test some KV stores:
- rosedb
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e rosedb -n 1000000 -p /tmp/rosedb
engine: rosedb
keys: 1000000
key size: 16-64
value size 128-512
concurrency: 5
put: 13.835s 72278 ops/s
get: 5.090s 196472 ops/s
put + get: 18.925s
file size: 391.09MB
peak sys mem: 2.06GB
```
- goleveldb
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e goleveldb -n 1000000 -p /tmp/goleveldb
engine: goleveldb
keys: 1000000
key size: 16-64
value size 128-512
concurrency: 5
put: 44.633s 22404 ops/s
get: 4.988s 200496 ops/s
put + get: 49.621s
file size: 357.25MB
peak sys mem: 286.33MB
```
- bbolt
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e bbolt -n 1000000 -p /tmp/bbolt
engine: bbolt
keys: 1000000
key size: 16-64
value size 128-512
concurrency: 5
put: 31.922s 31326 ops/s
get: 0.959s 1042852 ops/s
put + get: 32.881s
file size: 528.50MB
peak sys mem: 244.89MB
```
- pebble
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e pebble -n 1000000 -p /tmp/pebble
engine: pebble
keys: 1000000
key size: 16-64
value size 128-512
concurrency: 5
put: 21.072s 47456 ops/s
2023/07/28 22:58:08 [JOB 1] WAL file /tmp/pebble/000391.log with log number 000391 stopped reading at offset: 1537265; replayed 4015 keys in 4015 batches
get: 2.752s 363376 ops/s
put + get: 23.824s
file size: 107.98MB
peak sys mem: 235.64MB
```
In conclusion, rosedb has good read/write performance, but its system memory usage is also by far the highest: the file size is 391MB while peak sys mem is 2.06GB, which seems unreasonable.
There seems to be a problem with the test tool.
- First, I tried 1 key, mem: 19.11MB.
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e rosedb -n 1 -p /tmp/rosedb
engine: rosedb
keys: 1
key size: 16-64
value size 128-512
concurrency: 5
put: 0.000s 4167 ops/s
get: 0.000s 18306 ops/s
put + get: 0.000s
file size: 0.00B
peak sys mem: 19.11MB
```
- Second, I tried 100k keys, mem: 195.38MB.
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e rosedb -n 100000 -p /tmp/rosedb
engine: rosedb
keys: 100000
key size: 16-64
value size 128-512
concurrency: 5
put: 2.152s 46463 ops/s
get: 0.100s 996589 ops/s
put + get: 2.253s
file size: 39.12MB
peak sys mem: 195.38MB
```
- Third, I tried 1 key again (with the 100k keys from the previous run still on disk), mem: 179.63MB, which seems unreasonable.
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e rosedb -n 1 -p /tmp/rosedb
engine: rosedb
keys: 1
key size: 16-64
value size 128-512
concurrency: 5
put: 0.000s 2820 ops/s
get: 0.000s 8541 ops/s
put + get: 0.000s
file size: 39.12MB
peak sys mem: 179.63MB
```
On the other hand, I guess the memory usage is high because a Batch is created on every operation, and I see that a previous issue is addressing this problem.
Thanks, but I wonder if there is something else causing this problem.
BTW, you can add the option `-profile mem` when running the bench, and then we can use the pprof tool to analyze it.
In conclusion, I have checked and optimized all of this memory-costly code except the index part.
My tests show that the IRadix index takes too much memory, so we could switch to more memory-efficient data structures such as a BTree or an Adaptive Radix Tree; we can do that if users give us feedback.
I am also exploring some on-disk indexes, such as a hashtable; lotusdb will then also have more index choices if we achieve this.