elanmart / mips

Library for fast classification in problems with a large number of classes

Introduction

This repository accompanies an engineering dissertation prepared by Marcin Elantkowski, Adam Krasuski, Franciszek Walkowiak and Agnieszka Lipska.

Its main focus is the maximum inner product search (MIPS) problem: we've implemented several algorithms and utilities for efficient nearest-neighbour search with the inner product as the similarity measure.
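To make the problem concrete, here is a minimal brute-force baseline in plain Python. It is only an illustration: the function names and the toy data are ours and are not part of this repository's API. It also shows why MIPS is not the same as ordinary Euclidean nearest-neighbour search: a vector with a large norm can win on inner product while being far away from the query.

```python
import math


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))


def brute_force_mips(database, query, k=1):
    """Indices of the k database vectors with the largest inner product with `query`."""
    scores = [inner_product(x, query) for x in database]
    order = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def brute_force_l2(database, query, k=1):
    """Indices of the k database vectors closest to `query` in Euclidean distance."""
    dists = [math.dist(x, query) for x in database]
    order = sorted(range(len(database)), key=lambda i: dists[i])
    return order[:k]


# Toy data (hypothetical): vector 0 is close to the query, vector 1 has a large norm.
database = [
    [1.0, 0.0],
    [3.0, 3.0],
    [0.0, 1.0],
]
query = [1.0, 0.2]

mips_top = brute_force_mips(database, query)  # picks the large-norm vector
l2_top = brute_force_l2(database, query)      # picks the nearby vector
```

The linear scan costs one inner product per database vector for every query, which is the cost the indexes in this repository try to avoid.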

The three algorithms we've implemented are described in:

  • Clustering is efficient for approximate maximum inner product search (Auvolat, Larochelle, Chandar, Vincent, Bengio)
  • Quantization based fast inner product search (Guo, Kumar, Choromanski, Simcha)
  • Asymmetric LSH for sublinear maximum inner product search (Shrivastava, Li)
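As an illustration of the idea behind the last paper, the sketch below shows the asymmetric transformation that turns MIPS into a Euclidean nearest-neighbour problem, to which a standard L2 LSH scheme can then be applied. It is written in plain Python for self-containment and is not the code from this repository; the function names and the defaults `U=0.83` and `m=3` are chosen here for illustration. Database vectors are scaled to norm at most `U < 1` and padded with powers of their norm, queries are normalized and padded with `1/2`, and the squared distance then equals `1 + m/4 - 2 q.x + ||x||^(2^(m+1))`, where the last term shrinks quickly as `m` grows.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scale_database(database, U=0.83):
    """Rescale all vectors by one common factor so that the largest norm equals U < 1."""
    max_norm = max(math.sqrt(dot(x, x)) for x in database)
    return [[U * a / max_norm for a in x] for x in database]


def transform_item(x, m=3):
    """P(x) = [x; ||x||^2; ||x||^4; ...; ||x||^(2^m)]."""
    sq_norm = dot(x, x)
    # ||x||^(2^i) is the same as (||x||^2)^(2^(i-1)).
    return list(x) + [sq_norm ** (2 ** (i - 1)) for i in range(1, m + 1)]


def transform_query(q, m=3):
    """Q(q) = [q / ||q||; 1/2; ...; 1/2]."""
    q_norm = math.sqrt(dot(q, q))
    return [a / q_norm for a in q] + [0.5] * m


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


random.seed(0)
dim, m = 5, 3
database = scale_database([[random.gauss(0, 1) for _ in range(dim)] for _ in range(50)])
query = [random.gauss(0, 1) for _ in range(dim)]
unit_query = transform_query(query, m)[:dim]

# Largest deviation from the identity
#   ||Q(q) - P(x)||^2 = 1 + m/4 - 2 q.x + ||x||^(2^(m+1))
max_error = max(
    abs(
        squared_distance(transform_query(query, m), transform_item(x, m))
        - (1 + m / 4 - 2 * dot(unit_query, x) + dot(x, x) ** (2 ** m))
    )
    for x in database
)
```

Because the first three terms do not depend on the norm of `x`, ranking by small transformed distance approximately matches ranking by large inner product. The sketch stops at the transformation and does not include the hashing step.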

The hierarchical k-means implementation of Auvolat et al. is quite fast (faster than FAISS); the other two are not as fast.
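Clustering-based approaches usually work with cosine similarity rather than the raw inner product, so MIPS first has to be reduced to a cosine similarity search. The sketch below shows one standard reduction, which we assume here for illustration and which is not copied from the C++ code in this repository: every database vector gets one extra coordinate that brings its norm up to the largest norm in the database, and the query gets a zero. After that, all database vectors have the same norm, so ranking by cosine similarity is the same as ranking by inner product. All names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def augment_items(database):
    """Append sqrt(M^2 - ||x||^2) to every x, where M is the largest norm in the database."""
    max_sq_norm = max(dot(x, x) for x in database)
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    return [list(x) + [math.sqrt(max(max_sq_norm - dot(x, x), 0.0))] for x in database]


def augment_query(q):
    """Append a zero, so the extra coordinate never contributes to the inner product."""
    return list(q) + [0.0]


def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


random.seed(1)
database = [[random.gauss(0, 1) for _ in range(8)] for _ in range(200)]
query = [random.gauss(0, 1) for _ in range(8)]

# Exact MIPS answer on the original vectors.
best_by_inner_product = max(range(len(database)), key=lambda i: dot(database[i], query))

# Exact cosine-similarity answer on the augmented vectors.
items = augment_items(database)
aug_query = augment_query(query)
best_by_cosine = max(range(len(items)), key=lambda i: cosine(items[i], aug_query))
```

Once the data is in this form, the augmented vectors can be clustered (for example with spherical k-means), and a query only needs to be compared against the vectors in the clusters whose centroids it matches best.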

This repository also provides some examples of how you can incorporate an index into your MIPS-bound code.

Compilation

python-only

To use only the Python utilities, run:

conda install -c pytorch faiss-cpu
python setup.py install

python + our C++ indexes

To build the C++ code in this repo, you'll also need to build FAISS. To build on Ubuntu 16.04 with OpenBLAS installed (sudo apt install libopenblas-dev), all you have to do is run:

conda install -c conda-forge pybind11
git clone --recursive https://github.com/elanmart/mips
make
python setup.py install

If you're on a different platform, you'll need to adjust makefile.inc according to the instructions in faiss/INSTALLATION.

You can also use a different BLAS implementation, but compiling against MKL is a real pain.

Examples

See python/examples for usage examples.

Misc

The FastText fork used in our experiments can be found at https://github.com/elanmart/fastText

License: MIT License