simerplaha / SwayDB

Persistent and in-memory key-value storage engine for JVM that scales on a single machine.

Home Page: https://swaydb.simer.au


Controlled concurrency for compaction related read operations

simerplaha opened this issue · comments

commented

Overview

Compaction can be configured to run with high concurrency for all in-memory operations. It uses all of the allocated threads (configurable) to perform:

  • Merge
  • Creating indexes: sorted-index, binary-search, hash-index, compression, etc.
  • Multiple concurrent compactions on each Level, Segment, Segment-Block and every Block nested within another Block. So all in-memory compaction tasks run with high concurrency without blocking.

What is not concurrent? Disk writes

Persisting files to disk is not concurrent. Segment files get buffered into an IO Actor which sequentially & asynchronously writes them to disk so there are NO concurrent writes to disk.
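The buffered write path described above can be sketched as follows. This is not SwayDB's actual Actor implementation; it is a minimal Java sketch where a single-threaded executor stands in for the IO Actor, and an in-memory list stands in for the disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of a sequential write buffer: a single-threaded executor stands in
// for the IO Actor. Callers submit asynchronously, but the executor
// guarantees segments hit "disk" one at a time, in submission order.
public class SequentialWriteActor {
  private final ExecutorService io = Executors.newSingleThreadExecutor();
  private final List<String> persisted = new ArrayList<>(); // stand-in for disk

  // Non-blocking for the caller; the actual write is queued and serialised.
  public void persist(String segment) {
    io.submit(() -> persisted.add(segment)); // only the IO thread touches this list
  }

  public List<String> shutdownAndGet() throws InterruptedException {
    io.shutdown(); // queued writes still complete before termination
    io.awaitTermination(5, TimeUnit.SECONDS);
    return persisted;
  }

  public static void main(String[] args) throws InterruptedException {
    SequentialWriteActor actor = new SequentialWriteActor();
    for (int i = 0; i < 100; i++) actor.persist("segment-" + i);
    System.out.println(actor.shutdownAndGet().size());
  }
}
```

Because a single thread drains the queue, no synchronisation is needed on the write path itself and there are never two concurrent disk writes.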

Task

The same buffered IO should also be used for all compaction-related read operations. If the data can be fetched from the cache, reads should remain concurrent, but disk seeks should be sequential, similar to writes.
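The proposed read path can be sketched like this (hypothetical names, not SwayDB's API): cache hits are served concurrently on the caller's thread, while cache misses are funnelled through the same single-threaded IO executor that serialises writes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: cache hits stay concurrent; disk seeks are serialised
// through the single IO thread shared with writes.
public class CompactionReader {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final ExecutorService io = Executors.newSingleThreadExecutor();

  public String read(String key) throws Exception {
    String cached = cache.get(key);
    if (cached != null) return cached;           // concurrent, no blocking
    return io.submit(() -> {                     // sequential disk seek
      String fromDisk = "disk-value-for-" + key; // stand-in for a real disk read
      cache.put(key, fromDisk);
      return fromDisk;
    }).get();                                    // blocks only this caller
  }

  public static void main(String[] args) throws Exception {
    CompactionReader reader = new CompactionReader();
    reader.cache.put("hot", "cached-value");
    System.out.println(reader.read("hot"));  // served from cache, no IO thread
    System.out.println(reader.read("cold")); // routed through the IO thread
    reader.io.shutdown();
  }
}
```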

Scope

This only applies to persistent data-structures. All in-memory data-structures are concurrent for all operations (still controlled via concurrency configurations).

commented

Some blocking code is needed to avoid concurrent disk IO during compaction, which means no more 100% non-blocking guarantees (only for a small part of compaction; everything else remains non-blocking).

Changing existing code to return Future from disk IO functions seems like overkill, given the following cases and costs

  • Persistent databases - When the required data is completely in cache and no disk IO is needed
  • Persistent databases - With memory-mapped files
  • In-memory databases.
  • We would lose stack-safety and performance in functions that use tail-recursion.
  • Future object allocation overhead. This would bring back old performance issues resolved by removing IO & Try from core.
  • Databases that use synchronised bags (Try, Glass etc) would unnecessarily pay for the cost of Future.

This could be solved by using higher-kinded types in core, but that increases memory allocations and is not the most performant approach, as testing in Streams showed.
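The allocation-overhead argument above can be illustrated with a small example (assumed for illustration, not from the codebase): wrapping an already-cached value in a Future allocates a fresh object on every call, whereas returning the value directly allocates nothing.

```java
import java.util.concurrent.CompletableFuture;

// Illustration of the allocation cost argued above: a Future wrapper is a
// new heap object per call, even when no disk IO happens.
public class FutureOverhead {
  static String cachedValue = "value"; // pretend this came from the cache

  static CompletableFuture<String> readWrapped() {
    return CompletableFuture.completedFuture(cachedValue); // new allocation every call
  }

  static String readDirect() {
    return cachedValue; // no allocation
  }

  public static void main(String[] args) {
    // Two wrapped reads of the same cached value are distinct Future objects:
    System.out.println(readWrapped() == readWrapped()); // false
    // Direct reads return the same reference:
    System.out.println(readDirect() == readDirect());   // true
  }
}
```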

Solution

Creating Segments is already sequential. We just need to use synchronised read access when initialising Segment iterators during compaction.
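A minimal sketch of that fix, with hypothetical names: guard only the disk-backed iterator initialisation with a lock so compaction threads never seek the disk concurrently, while the merge work that consumes the iterator stays concurrent.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: a shared lock serialises the disk seek performed when a
// Segment iterator is initialised; everything after init runs unlocked.
public class Segment {
  private static final ReentrantLock diskLock = new ReentrantLock();
  private final List<String> keyValues;

  Segment(List<String> keyValues) {
    this.keyValues = keyValues;
  }

  public Iterator<String> iterator() {
    diskLock.lock(); // serialise the disk seek that backs the iterator
    try {
      // stand-in for opening and seeking the on-disk segment file
      return keyValues.iterator();
    } finally {
      diskLock.unlock(); // iteration itself proceeds concurrently
    }
  }

  public static void main(String[] args) {
    Segment segment = new Segment(List.of("a", "b", "c"));
    Iterator<String> it = segment.iterator();
    int count = 0;
    while (it.hasNext()) { it.next(); count++; }
    System.out.println(count);
  }
}
```

A single shared lock would serialise seeks across all disks, which is why the TODO below about multi-disk concurrency remains open.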

TODO - How to allow concurrent disk IO when accessing data on multiple disks.