In-Memory Index Benchmark (IMIB)

IMIB is a benchmark framework for evaluating single-attribute in-memory secondary indexes with respect to lookup speed, maintenance cost, and memory consumption. It provides benchmark cases that measure the latency of equality and range lookups, inserts, deletes, and bulk operations (bulk inserts and bulk loads) for the included index implementations. The benchmark cases for inserts, bulk inserts, and bulk loads additionally measure the memory allocated by the corresponding index implementation.
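
To illustrate the kind of measurement such a lookup benchmark case performs, here is a minimal sketch that times equality lookups against std::unordered_map, one of the included structures. It is not IMIB's actual benchmark code; the key count and timing setup are chosen purely for this example.

#include <chrono>
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

int main() {
  // Build phase: insert one million dense keys into the index under test.
  std::unordered_map<uint64_t, uint64_t> index;
  std::vector<uint64_t> keys(1'000'000);
  for (uint64_t i = 0; i < keys.size(); ++i) keys[i] = i;
  for (uint64_t key : keys) index.emplace(key, key);

  // Measurement phase: time the equality lookups and report the average latency.
  const auto start = std::chrono::steady_clock::now();
  uint64_t checksum = 0;
  for (uint64_t key : keys) checksum += index.find(key)->second;
  const auto end = std::chrono::steady_clock::now();

  const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
  std::cout << "average equality lookup latency: "
            << static_cast<double>(ns) / keys.size() << " ns (checksum " << checksum << ")\n";
}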

Included Implementations

Implementation    Data structure      C++ class
Unsync ART        radix tree          ART_unsynchronized::Tree
MP Judy           radix tree          judyLArray
TLX B+ Tree       B+ tree             tlx::btree_map
Abseil B-Tree     B-tree              absl::btree_map
BB-Tree           k-ary search tree   BBTree
PG Skip List      skip list           goodliffe::multi_skip_list
RH Flat Map       hash map            robin_hood::unordered_flat_map
RH Node Map       hash map            robin_hood::unordered_node_map
TSL Robin Map     hash map            tsl::robin_map
TSL Sparse Map    hash map            tsl::sparse_map
STD Hash Map      hash map            std::unordered_map
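
The tree-based structures and the skip list keep their keys ordered, which is what the range lookup benchmark cases exercise. As a rough illustration (not taken from IMIB), the sketch below performs a range lookup with std::map standing in for the ordered indexes above; tlx::btree_map, for instance, offers the same lower_bound interface.

#include <cstdint>
#include <iostream>
#include <map>

int main() {
  // std::map is used here only as a stand-in for an ordered secondary index.
  std::map<uint64_t, uint64_t> index;
  for (uint64_t key = 0; key < 1000; ++key) index.emplace(key, key * 10);

  // Range lookup [200, 300): iterate from the first key >= 200 up to the first key >= 300.
  uint64_t sum = 0;
  const auto range_end = index.lower_bound(300);
  for (auto it = index.lower_bound(200); it != range_end; ++it) sum += it->second;
  std::cout << "sum of values with keys in [200, 300): " << sum << "\n";
}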

Requirements

The Unsync ART implementation additionally requires the TBB (Threading Building Blocks) library.

Build Instructions

git submodule update --init --recursive
mkdir build
cd build/
cmake .. -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -GNinja
ninja

Execution

Run the test suite:

./imiTest

Run the benchmark:

./imiBench <key type> <iterations> <data binary file> <equality lookup file> <range lookup file> <result file prefix>

Generate unsigned integer datasets:

./scripts/generate_uint_data.py <data size> <subset count> <equality lookup count> <range lookup selectivities> <range lookup count>

Benchmark Sparse/Dense Unsigned Integer Entry Keys

Generate the datasets from the project root directory:

./scripts/generate_uint_data.py 10000000 5 1000000 0.0001 1000000

The above command generates five index entry datasets with 2, 4, 6, 8, and 10 million entries, the largest thus having 10 million entries. In addition, it generates a dataset of 1 million equality lookup keys and a dataset of 1 million search key ranges for range lookups. Each range lookup has a selectivity of 0.0001 (0.01%); for the 10-million-entry dataset, this corresponds to roughly 1,000 matching entries per range.

Run the benchmark script to execute the imiBench binary with the datasets of various sizes:

cd build/
../scripts/bench_sparse_dense.py

The benchmark script runs the imiBench binary for the datasets that are present in resources/data. For example, if only index entries with dense 64-bit unsigned integer keys are to be used, only files with the prefix dense_uint64 should be present in this directory.

Visualize the results: if JSON result files have been generated for all four combinations of sparse/dense and ascending/shuffled datasets, graphs covering all dataset sizes and the four data characteristic combinations can be created with the following script. It has to be executed in the directory containing the JSON result files (the build directory in this example):

../scripts/plot_multi_json_results.py subplots

Cite Our Work

If you use this benchmark framework in your work, please cite the following publication.

@inproceedings{Weisgut21ExperimentalIndexEvaluation,
  author    = {Marcel Weisgut},
  title     = {Experimental Index Evaluation for Partial Indexes in Horizontally
               Partitioned In-Memory Databases},
  booktitle = {Proceedings of the 32nd GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken)},
  series    = {{CEUR} Workshop Proceedings},
  year      = {2021},
}

License

GNU General Public License v3.0
