A Benchmark for Learned Indexes

   _____ ____  _____ ____ 
  / ___// __ \/ ___// __ \
  \__ \/ / / /\__ \/ / / /
 ___/ / /_/ /___/ / /_/ / 
/____/\____//____/_____/  
                          

Search on Sorted Data Benchmark

SOSD is a benchmark for comparing (learned) index structures on equality-lookup performance. It comes with state-of-the-art baseline implementations and many datasets to compare against. Each dataset consists of 200 million unsigned 32-bit or 64-bit integers.
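
In this setting, a lookup simply locates a key's position in a sorted array of integers. As a point of reference, here is a minimal C++ sketch of loading one dataset and answering a lookup with plain binary search. It assumes the binary layout used by the SOSD loaders (a uint64 element count followed by the keys); the file path is only an example.

#include <algorithm>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Load a dataset, assuming the SOSD binary layout: a uint64 element count
// followed by that many keys (uint64 here; the 32-bit datasets use uint32).
std::vector<uint64_t> load_keys(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  uint64_t n = 0;
  in.read(reinterpret_cast<char*>(&n), sizeof(n));
  std::vector<uint64_t> keys(n);
  in.read(reinterpret_cast<char*>(keys.data()), n * sizeof(uint64_t));
  return keys;  // keys are assumed to be sorted ("search on sorted data")
}

int main() {
  // Example path; substitute any downloaded 64-bit dataset.
  std::vector<uint64_t> keys = load_keys("data/wiki_ts_200M_uint64");
  uint64_t needle = keys[keys.size() / 2];
  // The simplest baseline: binary search for the key's position.
  auto it = std::lower_bound(keys.begin(), keys.end(), needle);
  std::cout << "key found at position " << (it - keys.begin()) << "\n";
  return 0;
}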

Usage instructions

We provide a number of scripts to automate the setup and benchmark workflow. Each is located in the scripts directory, but should be executed from the repository root.

AWS setup

If you'd like to run the whole benchmark on AWS, provision a new c5d.4xlarge node using the Ubuntu 18 AMI, and then copy and paste the following commands after logging in.

NOTE: scripts/setup_aws.sh sets up an AWS c5d.4xlarge or c5d.2xlarge node for the rest of the benchmark scripts (it is destructive!). It can be run from your home directory and will automatically clone this repository.

wget https://raw.githubusercontent.com/learnedsystems/SOSD/master/scripts/setup_aws.sh
chmod +x ./setup_aws.sh
. ./setup_aws.sh
cd /sosdata/SOSD

Local setup

If you'd like to run the benchmark locally, you need a machine with at least 16 GiB of RAM and 50 GiB of free disk space. We've included a setup_anywhere.sh script, which installs the prerequisites on apt-based systems (Debian, Ubuntu). To set it up, follow these steps:

git clone git@github.com:learnedsystems/SOSD.git
cd SOSD
. ./scripts/setup_anywhere.sh

Running the benchmark

  • scripts/download.sh downloads and stores the required datasets from the Internet
  • scripts/build_rmis.sh compiles and builds the RMIs for each dataset (a minimal sketch of an RMI lookup is shown after this list)
  • scripts/prepare.sh constructs the query workloads and compiles the benchmark
  • scripts/execute.sh executes the benchmark on each workload, storing the results in the results directory
  • scripts/create_leaderboard.sh gathers everything in the results directory into an easy-to-read table (see below)

To run them all, execute reproduce.sh.
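
For orientation, the RMIs built by scripts/build_rmis.sh are learned indexes that predict a key's position and then correct the prediction with a bounded search. The following is a minimal, illustrative two-layer sketch under that assumption; it is not the generated code, and the names, linear models, and toy data in main are made up for the example.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// A model in this sketch is a simple linear function of the key.
struct LinearModel {
  double slope;
  double intercept;
  double predict(uint64_t key) const {
    return slope * static_cast<double>(key) + intercept;
  }
};

// Illustrative two-layer RMI: the root model picks a leaf model, the leaf
// predicts a position, and a binary search within the leaf's recorded
// maximum error finds the exact position.
struct TwoLayerRMI {
  LinearModel root;
  std::vector<LinearModel> leaves;
  std::vector<std::size_t> max_error;  // per-leaf prediction error bound

  std::size_t lookup(const std::vector<uint64_t>& keys, uint64_t key) const {
    auto leaf = static_cast<std::size_t>(std::clamp(
        root.predict(key), 0.0, static_cast<double>(leaves.size() - 1)));
    auto pos = static_cast<std::size_t>(std::clamp(
        leaves[leaf].predict(key), 0.0, static_cast<double>(keys.size() - 1)));
    std::size_t err = max_error[leaf];
    std::size_t lo = pos > err ? pos - err : 0;
    std::size_t hi = std::min(keys.size(), pos + err + 1);
    return std::lower_bound(keys.begin() + lo, keys.begin() + hi, key) -
           keys.begin();
  }
};

int main() {
  // Toy data with a perfectly linear key distribution: keys[i] = 2 * i.
  std::vector<uint64_t> keys(1000);
  for (std::size_t i = 0; i < keys.size(); ++i) keys[i] = 2 * i;

  TwoLayerRMI rmi;
  rmi.root = {0.0, 0.0};      // always selects leaf 0
  rmi.leaves = {{0.5, 0.0}};  // position = key / 2 (exact for this data)
  rmi.max_error = {0};        // no correction needed on perfect data
  return rmi.lookup(keys, 500) == 250 ? 0 : 1;
}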

Results

Here is the current "ranking" of index structures (numbers are in nanoseconds per lookup). If you have any improvements to existing indexes or want to add new ones, feel free to contact us. Provided that they fulfill the constraints outlined in the paper, we will update the "leaderboard" accordingly. All measurements are performed on the target platform: c5d.4xlarge. For obvious reasons, we retain the freedom to exclude any submissions at our discretion.

Dataset   ART   B-tree   BS    FAST   IS       RBS   RMI    RS     TIP
amzn32    n/a   529      773   244    4604     325   264    275    731
face32    187   524      771   229    1285     312   274    386    964
logn32    n/a   522      765   294    n/a      471   97.0   105    744
norm32    191   522      771   229    10257    355   71.7   70.9   884
uden32    102   521      771   228    39.8     333   54.2   64.2   176
uspr32    n/a   524      771   230    469      301   153    200    400
amzn64    n/a   601      804   n/a    4736     387   266    288    759
face64    391   592      784   n/a    1893     337   334    461    1232
logn64    309   597      784   n/a    n/a      753   179    120    454
norm64    266   592      785   n/a    10510    405   71.5   70.5   862
osmc64    n/a   599      785   n/a    95076    492   402    437    7186
uden64    112   592      784   n/a    43.4     344   54.3   53.9   193
uspr64    287   591      785   n/a    449      313   169    214    428
wiki64    n/a   608      802   n/a    7846     364   222    218    1019
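
The "nanoseconds per lookup" figures above are per-lookup averages over a batch of lookups. A minimal timing loop in that spirit is sketched below; it is not the SOSD harness, and the data sizes and random keys are made up purely for illustration.

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

int main() {
  // Illustrative data; in SOSD the keys come from the downloaded datasets.
  std::vector<uint64_t> keys(1'000'000);
  std::mt19937_64 gen(42);
  for (uint64_t& k : keys) k = gen();
  std::sort(keys.begin(), keys.end());

  // Probe with existing keys, as in an equality-lookup workload.
  std::vector<uint64_t> probes(100'000);
  std::uniform_int_distribution<std::size_t> pick(0, keys.size() - 1);
  for (uint64_t& p : probes) p = keys[pick(gen)];

  uint64_t checksum = 0;
  auto start = std::chrono::steady_clock::now();
  for (uint64_t p : probes) {
    auto it = std::lower_bound(keys.begin(), keys.end(), p);
    checksum += static_cast<uint64_t>(it - keys.begin());  // defeat dead-code elimination
  }
  auto stop = std::chrono::steady_clock::now();

  double ns = std::chrono::duration<double, std::nano>(stop - start).count();
  std::cout << ns / probes.size() << " ns/lookup (checksum " << checksum << ")\n";
  return 0;
}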

Cite

If you use this benchmark in your own work, please cite our paper:

@article{sosd,
  title={SOSD: A Benchmark for Learned Indexes},
  author={Kipf, Andreas and Marcus, Ryan and van Renen, Alexander and Stoian, Mihail and Kemper, Alfons and Kraska, Tim and Neumann, Thomas},
  journal={NeurIPS Workshop on Machine Learning for Systems},
  year={2019}
}


License: MIT

