nearest-neighbor-search gpu parallel-computing high-performance-computing large-scale

Efficient Large-scale Approximate Nearest Neighbor Search on the GPU

This repository contains the implementation of Product Quantization Trees (PQT) for large scale nearest neighbor search.

@inproceedings{PQT,
    author = "Patrick Wieschollek and Oliver Wang and Alexander Sorkine-Hornung and Hendrik P.A. Lensch",
    title = "Efficient Large-scale Approximate Nearest Neighbor Search on the GPU",
    booktitle = "IEEE Conference on Computer Vision and Pattern Recognition (CVPR)",
    pages = "",
    month = "June",
    year = "2016",
    url = "goo.gl/4Zl5xB"
}

This experimental implementation is the first running on the GPU outperforming previous CPU approaches.

Usage

Download

Just by

git clone --recursive https://github.com/cgtuebingen/Product-Quantization-Tree.git

Requirements

To build this project you need

cmake
A C++11 capable compiler
The CUDA toolkit and the CUDA samples.
GFlags and GoogleTesting (as git submodules)

We use Toolkit v7.5, gcc 4.8.4 on Ubuntu 14.04 for Nvidia Titan X.

Preprocessing

To handle datasets and efficiently read them from (ram-)disk we converted the official datasets from SIFT1M, SIFT1B in our own format:

Each *.umem, *.imem, *.fmem has the following layout (header and content)

uint number_of_vectors
uint dimen_of_vector
... header are 20 bytes, next data start at byte 20:
T consecutive array of data, each entry is a T

Examples

* .umem: `uint  uint 0 0 ... 0 0 uint8_t uint8_t uint8_t uint8_t uint8_t uint8_t ...  uint8_t`
* .imem: `uint  uint 0 0 ... 0 0 int int int int int int ...  int`
* .fmem: `uint  uint 0 0 ... 0 0 float float float float float float ...  float`

We provide script to convert these datasets

./convert_fvecs --fvecs src/path/to/db.fvecs --umem dst/path/to/db.umem
./convert_bvecs --bvecs src/path/to/db.bvecs --umem dst/path/to/db.umem
./convert_ivecs --ivecs src/path/to/db.ivecs --imem dst/path/to/db.imem

Query (offline phase)

Building the index structure (Product-Quantization-Tree) is done by tool_createdb. This creates the index structure, prepares the database and dump all intermediate values into binary files.

Possible Flags (see ./tool_createdb -h):

- device    "selected cuda device"
- c1        "number of clusters in first level"
- c2        "number of refinements in second level"
- p         "parts per vector"
- dim       "expected dimension for each vector"
- lineparts "vectorparts for reranking informations"
- chunksize "number of vectors per chunk"
- hashsize  "maximal number of bins"
- basename  "prefix for generated data"
- dataset   "patch to vector dataset"

Query (online phase)

The query process is done batchwise using tool_query. The accompanying example return the best and second best found vector. For possible Flags see ./tool_query -h.

About

GPU-based large scale Approx. Nearest Neighbor Search, accepted at CVPR 2016

http://goo.gl/YDwCl4

nearest-neighbor-search gpu parallel-computing high-performance-computing large-scale

MIT License

Languages

Language:Cuda 63.1%Language:C++ 35.4%Language:MATLAB 0.7%Language:CMake 0.6%Language:Shell 0.1%