jaeyeom / sstable

SSTable implementation compatible with https://github.com/mariusaeriksen/sstable

Store size as difference between next record?

xeoncross opened this issue

Instead of storing the offset and length of each index entry, couldn't you just read the next entry (or EOF) and diff the two offsets to save those extra 4 bytes per record?
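The idea in this question can be sketched in Go as follows. This is only an illustration, not code from this repo: the function name `lengthsFromOffsets` and the `dataEnd` parameter are hypothetical, and it assumes records are stored back to back in index order with no padding between them.

```go
package main

import "fmt"

// lengthsFromOffsets (hypothetical helper) derives each record's length
// from the start offsets alone. offsets must be in ascending order, and
// dataEnd is the byte position where the last record ends (for example
// EOF or the start of the index block).
func lengthsFromOffsets(offsets []uint64, dataEnd uint64) []uint32 {
	lengths := make([]uint32, len(offsets))
	for i, off := range offsets {
		next := dataEnd
		if i+1 < len(offsets) {
			next = offsets[i+1]
		}
		// Assumes every record fits in 32 bits, like the stored length field.
		lengths[i] = uint32(next - off)
	}
	return lengths
}

func main() {
	// Three records starting at bytes 0, 10 and 25; data ends at byte 40.
	fmt.Println(lengthsFromOffsets([]uint64{0, 10, 25}, 40)) // prints [10 15 15]
}
```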

Yes, you're right. We could go even further and store only a 32-bit length instead of the 64-bit offset, then accumulate the lengths to recover each offset. That would save 8 bytes per record rather than 4.
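A minimal Go sketch of recovering offsets by accumulating lengths might look like this. The name `offsetsFromLengths` and the `base` parameter are made up for illustration, and the sketch assumes records are laid out contiguously in index order. One trade-off to keep in mind: finding the offset of record i requires summing all earlier lengths, so this fits best when the whole index is read into memory up front.

```go
package main

import "fmt"

// offsetsFromLengths (hypothetical helper) rebuilds the start offset of
// every record from the stored lengths. base is the byte position of the
// first record; each later record starts where the previous one ends.
func offsetsFromLengths(base uint64, lengths []uint32) []uint64 {
	offsets := make([]uint64, len(lengths))
	pos := base
	for i, n := range lengths {
		offsets[i] = pos
		pos += uint64(n)
	}
	return offsets
}

func main() {
	// Records of 10, 15 and 15 bytes stored back to back from byte 0.
	fmt.Println(offsetsFromLengths(0, []uint32{10, 15, 15})) // prints [0 10 25]
}
```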

I didn't optimize the size that way; I just followed the structure of the https://github.com/mariusae/sstable implementation to stay compatible. It's an arbitrary decision, stated in the README of this repo. At least it lets us share SSTable data between software written in two different languages (Go and Haskell). While writing the code I also felt that some disk space was being wasted, but I didn't care much because I was planning to use this for fairly large entries. If we want to optimize the size, it's probably a good idea to compress the data, which I think Google's SSTable implementation also offered as an option. (On second thought, that could be done by the value marshaler without touching the SSTable implementation.)
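As a rough illustration of compressing in the value marshaler, the sketch below gzips a value before it would be handed to the table and restores it after reading. `compressValue` and `decompressValue` are hypothetical helpers built on the standard `compress/gzip` package; they do not reflect this repo's actual marshaling API, and gzip is just one possible codec.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compressValue (hypothetical) gzips a marshaled value before storing it.
func compressValue(v []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(v); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed data and the gzip footer.
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompressValue (hypothetical) reverses compressValue after a read.
func decompressValue(b []byte) ([]byte, error) {
	r, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	original := bytes.Repeat([]byte("sstable value "), 100)
	packed, err := compressValue(original)
	if err != nil {
		panic(err)
	}
	restored, err := decompressValue(packed)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(original, restored)) // prints true
}
```

Because the table only sees opaque bytes, this approach leaves the on-disk SSTable format unchanged.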

If we want to optimize the size this way, we would probably want to record that in the header, because it changes the format.

Sorry for not responding earlier. It was buried in my email inbox for a long time.