gaoj0017 / ADSampling

[SIGMOD 2023] High-Dimensional Approximate Nearest Neighbor Search: with Reliable and Efficient Distance Comparison Operations

Detailed comments on our core algorithms are included in the following files:

  • ./src/adsampling.h
  • ./src/hnswlib/hnswalg.h
  • ./src/ivf/ivf.h

Prerequisites


GIST Reproduction

The tested datasets are available at https://www.cse.cuhk.edu.hk/systems/hash/gqr/datasets.html.

  1. Download and preprocess the datasets. Detailed instructions can be found in ./data/README.md.

  2. Index the datasets. This can take several hours.

    # Index IVF/IVF+/IVF++
    ./script/index_ivf.sh
    
    # Index HNSW/HNSW+/HNSW++
    ./script/index_hnsw.sh

  3. Run the queries against the datasets. The results are written to ./results/. Detailed configurations can be found in ./script/README.md.

    # Search IVF/IVF+/IVF++
    ./script/search_ivf.sh
    
    # Search HNSW/HNSW+/HNSW++
    ./script/search_hnsw.sh
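
The indexing and search steps above can be chained in a small wrapper that stops at the first failure. The following is a minimal sketch and is not part of the repository: the function names `run_step` and `run_pipeline` are hypothetical, and it assumes the four scripts are executable, are run from the repository root, and that step 1 (downloading and preprocessing the datasets) has already been completed.

```shell
# Hypothetical helper (not part of the repository):
# run one script, or fail if it is missing or not executable.
run_step() {
    script="$1"
    if [ ! -x "$script" ]; then
        echo "missing or not executable: $script" >&2
        return 1
    fi
    echo "running: $script"
    "$script"
}

# Hypothetical helper: run indexing first, then search, in the order
# given in the steps above. Stops as soon as one script fails.
run_pipeline() {
    for step in ./script/index_ivf.sh ./script/index_hnsw.sh \
                ./script/search_ivf.sh ./script/search_hnsw.sh; do
        run_step "$step" || return 1
    done
    echo "done; results should be in ./results/"
}
```

After sourcing these definitions from the repository root, calling `run_pipeline` runs all four scripts in sequence. Since indexing can take several hours, stopping at the first failure avoids running the search scripts against a missing or incomplete index.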

Languages

C++ 95.7%, Python 2.4%, Shell 1.9%