cereal time-series cpp11 succint-data-structure cmake big-data stl wikimedia range top-k python3 matplotlib seaborn shell-script elias-fano

Time Series Indexing - LABD

This project was developed for educational purposes as part of the Laboratory on Algorithms for Big Data course (University of Pisa), academic year 2016/17.

Libraries

Cereal - Serialization library
SDSL - Succinct Data Structures Library
Google Test (G-Test) - Google's C++ unit testing framework

Tech

CMake - Family of tools designed to build, test and package software

Usage

Generate a Makefile and build the project:

$ mkdir _build
$ cd _build
$ cmake ..
$ make

The executables will be placed in _build/bin/.

Indices

There are two implementations, each identified by an index id:

0 : Baseline implementation
1 : Implementation with succinct data structures
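
This README does not describe the query semantics of either index, so the following is only a minimal sketch of what a baseline range top-k query over time series could look like. The data model (`PageSeries`, one vector of per-slot counts per page) and the function `range_top_k` are hypothetical names introduced for illustration; they are not taken from the project's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical data model (an assumption, not the project's actual layout):
// one series of per-time-slot counts for each page.
struct PageSeries {
    std::string page;
    std::vector<std::uint32_t> counts;  // counts[t] = visits in time slot t
};

using PageTotal = std::pair<std::string, std::uint64_t>;

// Returns the k pages with the largest total count over slots [begin, end).
// Plain scan-and-sort approach: sum every series, then partially sort.
std::vector<PageTotal> range_top_k(const std::vector<PageSeries>& data,
                                   std::size_t begin, std::size_t end,
                                   std::size_t k) {
    assert(begin <= end);
    std::vector<PageTotal> totals;
    totals.reserve(data.size());
    for (const PageSeries& s : data) {
        // Clamp the range to the length of this series.
        const std::size_t hi = std::min(end, s.counts.size());
        std::uint64_t sum = 0;
        for (std::size_t t = begin; t < hi; ++t) sum += s.counts[t];
        totals.emplace_back(s.page, sum);
    }
    k = std::min(k, totals.size());
    // Only the first k positions need to be in descending order.
    std::partial_sort(totals.begin(), totals.begin() + k, totals.end(),
                      [](const PageTotal& a, const PageTotal& b) {
                          return a.second > b.second;
                      });
    totals.resize(k);
    return totals;
}
```

A scan like this touches every count in the range for every page, which is the kind of cost a succinct index would aim to reduce; how index 1 actually does so is not covered here.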

Data serialization

$ cd _build/bin/
$ ./build_index id path/to/dataset

The serialized object is saved in the same folder as the input dataset and is named like dataset+index.

Build query sets

$ ./build_query_sets path/to/dataset num_of_query min_date_interval max_k

This builds (min_date_interval\100)*max_k different query sets, each with a different range size and K. If a serialized data structure with id=0 already exists, creation is faster because the structure does not need to be populated from the dataset file.
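
The README states only how many query sets are produced, not how the range sizes and K values are chosen. One reading consistent with that count is sketched below, under two assumptions: the backslash denotes integer division by 100, and the range sizes are the multiples of 100 up to min_date_interval, each paired with every K from 1 to max_k. The function name `query_set_shapes` is hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Enumerates (range_size, k) pairs, one per query set.
// Assumed interpretation: range sizes 100, 200, ... (integer division of
// min_date_interval by 100 gives the number of steps), and k = 1..max_k.
std::vector<std::pair<int, int>> query_set_shapes(int min_date_interval,
                                                  int max_k) {
    assert(min_date_interval >= 0 && max_k >= 0);
    std::vector<std::pair<int, int>> shapes;
    const int range_steps = min_date_interval / 100;  // integer division
    for (int r = 1; r <= range_steps; ++r) {
        for (int k = 1; k <= max_k; ++k) {
            shapes.emplace_back(r * 100, k);
        }
    }
    return shapes;
}
```

Under this reading, min_date_interval = 300 and max_k = 2 give (300/100)*2 = 6 query sets, with range sizes 100, 200 and 300 and K values 1 and 2.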

Run queries

$ ./run_queries id path/to/dataset path/to/query_set

Test

Test index=0 implementation:

$ ./test_baseline

Test index=1 implementation:

$ ./test_index1

About

Laboratory on Algorithms for Big Data a.a. 2016/17 - University of Pisa

https://goo.gl/OaC6HB

MIT License
