yozaam / TrieIndexLogDB

Write Heavy storage engine written using 'Trie' as in memory index

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

TrieIndexLogDB is a key/value datastore using log-structured files with trie structured indexes, allowing very fast write access and lowering RAM requirements compared to other log-structured datastores which use Hash Table Indices

How to run ?

  1. Clone repo
    git clone https://github.com/yozaam/TrieIndexLogDB.git
  2. Go inside directory
    cd TrieIndexLogDB
  3. Compile the program
    g++ main.cpp -o main.o
  4. Run the program
    ./main.o

Design of the engine

"Example of Hash Index: Bitcask is an Erlang application that provides an API for storing and retrieving key/value data using log-structured hash tables that provide very fast access."

  • General Architecture for Log Structured DB

    • Every create/update appends new key value pair into the file
    • maintain the pointer to the latest occurance of a key in the hash table / index
  • Bitcask architecture / Hash Index Hash Index Log Structured DB

  • TrieIndexLogDB architecture / Trie Index Trie Index Log Structured DB

  • Bitcask stores every key in memory, along with the byte offset for that key in disk

  • This limits the no. of keys that may be stored because they all have to fit in RAM

  • My lightwieght storage engine will reduce the space requirements by using a 'Trie' index

(A string Hash (Bitcask) & Trie traversal both have the same asymptotic time complexity but Trie saves a lot of space by avoiding duplication!)

Future Scope

  • Divide into separate classfiles
  • Implement RestAPI class as well as CLI
  • Implement segmentation, by splitting the files (will need to convert offset into (file_segment_no, offset) pair)
  • Range Updates (by simultaneous Trie Traversal and sorted Trie Children)

About

Write Heavy storage engine written using 'Trie' as in memory index


Languages

Language:C++ 87.0%Language:Python 13.0%