ByteHamster / LeMonHash

Learned Monotone Minimal Perfect Hashing

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Learned Monotone Minimal Perfect Hashing

Logo

LeMonHash is a Monotone Minimal Perfect Hash function that combines the PGM-Index for space-efficient ranking and BuRR for low-overhead retrieval.

A variant for variable-length strings provides significantly faster queries than competitors.

Screenshot of measurements in paper

Requirements

Usage

Clone the repository (as a submodule) and add the following to your CMakeLists.txt.

add_subdirectory(path/to/LeMonHash)
target_link_libraries(YourTarget PRIVATE LeMonHash)

Then you can use the straight-forward interface of LeMonHash:

std::vector<uint64_t> inputData {0, 1, 7, 15, 23, 42, 250};
lemonhash::LeMonHash<> hashFunc(inputData);
for (uint64_t x : inputData) {
    std::cout << x << ": \t" << hashFunc(x) << std::endl;
}

License

This code is licensed under the GPLv3. If you use the project in an academic context or publication, please cite our paper:

@inproceedings{DBLP:conf/esa/FerraginaL0V23,
  author       = {Paolo Ferragina and
                  Hans{-}Peter Lehmann and
                  Peter Sanders and
                  Giorgio Vinciguerra},
  title        = {Learned Monotone Minimal Perfect Hashing},
  booktitle    = {{ESA}},
  series       = {LIPIcs},
  volume       = {274},
  pages        = {46:1--46:17},
  publisher    = {Schloss Dagstuhl - Leibniz-Zentrum f{\"{u}}r Informatik},
  year         = {2023},
  doi          = {10.4230/LIPICS.ESA.2023.46}
}

The code of the experiments comparing LeMonHash to competitors from the literature is available here.

About

Learned Monotone Minimal Perfect Hashing

License:GNU General Public License v3.0


Languages

Language:C++ 98.2%Language:CMake 0.9%Language:Shell 0.9%