enhancement: high memory usage of rosedb
roseduan opened this issue · comments
When I benchmarked several KV stores, I found that rosedb has very high memory usage; I think we should investigate why this happens.

Steps to reproduce the problem:
```shell
git clone https://github.com/rosedblabs/kvstore-bench.git
cd kvstore-bench/cmd/kv-bench
go build
```
Then test some KV stores:
- rosedb
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e rosedb -n 1000000 -p /tmp/rosedb
engine: rosedb
keys: 1000000
key size: 16-64
value size 128-512
concurrency: 5
put: 13.835s 72278 ops/s
get: 5.090s 196472 ops/s
put + get: 18.925s
file size: 391.09MB
peak sys mem: 2.06GB
```
- goleveldb
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e goleveldb -n 1000000 -p /tmp/goleveldb
engine: goleveldb
keys: 1000000
key size: 16-64
value size 128-512
concurrency: 5
put: 44.633s 22404 ops/s
get: 4.988s 200496 ops/s
put + get: 49.621s
file size: 357.25MB
peak sys mem: 286.33MB
```
- bbolt
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e bbolt -n 1000000 -p /tmp/bbolt
engine: bbolt
keys: 1000000
key size: 16-64
value size 128-512
concurrency: 5
put: 31.922s 31326 ops/s
get: 0.959s 1042852 ops/s
put + get: 32.881s
file size: 528.50MB
peak sys mem: 244.89MB
```
- pebble
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e pebble -n 1000000 -p /tmp/pebble
engine: pebble
keys: 1000000
key size: 16-64
value size 128-512
concurrency: 5
put: 21.072s 47456 ops/s
2023/07/28 22:58:08 [JOB 1] WAL file /tmp/pebble/000391.log with log number 000391 stopped reading at offset: 1537265; replayed 4015 keys in 4015 batches
get: 2.752s 363376 ops/s
put + get: 23.824s
file size: 107.98MB
peak sys mem: 235.64MB
```
In conclusion, rosedb has good read/write performance, but its system memory usage is also by far the highest: the file size is 391MB while peak sys mem is 2.06GB, which seems unreasonable.
There seems to be a problem with the test tool.
- First, I tried 1 key, mem: 19.11MB.
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e rosedb -n 1 -p /tmp/rosedb
engine: rosedb
keys: 1
key size: 16-64
value size 128-512
concurrency: 5
put: 0.000s 4167 ops/s
get: 0.000s 18306 ops/s
put + get: 0.000s
file size: 0.00B
peak sys mem: 19.11MB
```
- Second, I tried 100k keys, mem: 195.38MB.
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e rosedb -n 100000 -p /tmp/rosedb
engine: rosedb
keys: 100000
key size: 16-64
value size 128-512
concurrency: 5
put: 2.152s 46463 ops/s
get: 0.100s 996589 ops/s
put + get: 2.253s
file size: 39.12MB
peak sys mem: 195.38MB
```
- Third, I tried 1 key again (with the 100k keys from the previous run still on disk), mem: 179.63MB, which seems unreasonable.
```
➜ kv-bench git:(main) ✗ ./kv-bench -c 5 -e rosedb -n 1 -p /tmp/rosedb
engine: rosedb
keys: 1
key size: 16-64
value size 128-512
concurrency: 5
put: 0.000s 2820 ops/s
get: 0.000s 8541 ops/s
put + get: 0.000s
file size: 39.12MB
peak sys mem: 179.63MB
```
On the other hand, I guess the memory usage is high because a Batch is created on every operation, and I see that a previous issue is addressing this problem.
Thanks, but I wonder if there is something else causing this problem.
BTW, you can add the option `-profile mem` when running the bench, and then we can use the pprof tool to analyze it.
In conclusion, I have checked and optimized all of this memory-costly code except the index part.
My tests show that the IRadix index takes too much memory, so we could switch to more memory-efficient data structures such as a BTree or an Adaptive Radix Tree; we can do that if users give us feedback.
I am also exploring some on-disk indexes, such as a hashtable; lotusdb will then also have more index choices if we achieve this.