RyanMarcus / PGM-index

πŸ… A state-of-the-art data structure that enables fast point and range searches in arrays of billions of items using orders of magnitude less space than traditional indexes

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

The PGM-index

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast point and range searches in arrays of billions of items using orders of magnitude less space than traditional indexes.

Website | Documentation | Paper | AΒ³ Lab

Travis (.org) License GitHub stars

Building the code

To download and build the library use the following commands:

git clone https://github.com/gvinciguerra/PGM-index.git
cd PGM-index
git submodule update --init --recursive
cmake . -DCMAKE_BUILD_TYPE=Release
make -j8

Now you can run the unit tests via:

./test/tests

Minimal example

#include <vector>
#include <cstdlib>
#include <iostream>
#include <algorithm>
#include "pgm_index.hpp"

int main(int argc, char **argv) {
    // Generate some random data
    std::vector<int> dataset(1000000);
    std::generate(dataset.begin(), dataset.end(), std::rand);
    dataset.push_back(42);
    std::sort(dataset.begin(), dataset.end());

    // Construct the PGM-index
    const int error = 128;
    PGMIndex<int, error> index(dataset);

    // Query the PGM-index
    auto q = 42;
    auto approx_range = index.find_approximate_position(q);
    auto lo = dataset.cbegin() + approx_range.lo;
    auto hi = dataset.cbegin() + approx_range.hi;
    std::cout << *std::lower_bound(lo, hi, q);

    return 0;
}

License

This project is licensed under the terms of the GNU General Public License v3.0.

If you use the library please put a link to the website and cite the following paper:

@misc{FerraginaVinciguerra:2019,
  Archiveprefix = {arXiv},
  Author = {Paolo Ferragina and Giorgio Vinciguerra},
  Day = {14},
  Eprint = {1910.06169},
  Month = {10},
  Primaryclass = {cs.DS},
  Title = {The PGM-index: a multicriteria, compressed and learned approach to data indexing},
  Url = {https://arxiv.org/abs/1910.06169},
  Year = {2019}}

About

πŸ… A state-of-the-art data structure that enables fast point and range searches in arrays of billions of items using orders of magnitude less space than traditional indexes

License:GNU General Public License v3.0


Languages

Language:C++ 83.8%Language:HTML 16.2%Language:CMake 0.0%Language:C 0.0%