shubhampachori12110095 / DiskANN

Scalable graph based indices for approximate nearest neighbor search

DiskANN

The goal of the project is to build scalable, performant, streaming and cost-effective approximate nearest neighbor search algorithms for trillion-scale vector search. This release contains the code from the DiskANN paper published at NeurIPS 2019 and from the streaming DiskANN paper, along with further improvements. The code reuses and builds upon some of the code for the NSG algorithm.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

See guidelines for contributing to this project.

Linux build:

Install the following packages with apt:

sudo apt install make cmake g++ libaio-dev libgoogle-perftools-dev clang-format libboost-all-dev
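Before moving on, it can help to confirm that the command-line tools from the package list above are actually on the PATH. The following is a minimal sketch, not part of the DiskANN build scripts: `missing_tools` is a hypothetical helper name, and it only checks executables, so it says nothing about the library packages (libaio-dev, libgoogle-perftools-dev, libboost-all-dev).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DiskANN): print each named tool
# that cannot be found on the PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the executables installed by the apt line above.
# No output means all four were found.
missing_tools make cmake g++ clang-format
```

Any name printed by the last line is a tool that still needs to be installed.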

Install Intel MKL

Ubuntu 20.04

sudo apt install libmkl-full-dev

Earlier versions of Ubuntu

Install Intel MKL either by downloading the oneAPI MKL installer or using apt (we tested with build 2019.4-070 and 2022.1.2.146).

# OneAPI MKL Installer
wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18487/l_BaseKit_p_2022.1.2.146.sh
sudo sh l_BaseKit_p_2022.1.2.146.sh -a --components intel.oneapi.lin.mkl.devel --action install --eula accept -s

Build

mkdir build && cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make -j 
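Note that GNU make places no limit on the number of parallel jobs when `-j` is given without a number, which can exhaust memory on smaller machines. The sketch below shows one way to bound the job count to the number of available CPUs. The function names `job_count` and `build_release` are hypothetical and are not provided by the repository; the cmake and make invocation is the same as the build line above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of available CPUs,
# falling back to 1 if nproc is unavailable.
job_count() {
    nproc 2>/dev/null || echo 1
}

# Hypothetical wrapper around the build line above. Run it from the
# DiskANN checkout. The subshell keeps the caller's working directory
# unchanged.
build_release() (
    mkdir -p build && cd build &&
        cmake -DCMAKE_BUILD_TYPE=Release .. &&
        make -j"$(job_count)"
)
```

Calling `build_release` from the top of the checkout then performs the same Release build with a fixed upper bound on parallel jobs.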

Windows build:

The Windows version has been tested with Enterprise editions of Visual Studio 2022, 2019 and 2017. It should work with the Community and Professional editions as well without any changes.

Prerequisites:

  • CMake 3.15+ (available in Visual Studio 2019+ or from https://cmake.org)
  • NuGet.exe (install from https://www.nuget.org/downloads)
    • The build script will use NuGet to get MKL, OpenMP and Boost packages.
  • DiskANN git repository checked out together with submodules. To check out submodules after git clone:
git submodule init
git submodule update
  • Environment variables:
    • [optional] If you would like to override the Boost library listed in windows/packages.config.in, set BOOST_ROOT to your Boost folder.
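The two submodule commands listed in the prerequisites can also be combined into one. The sketch below wraps that single command in a hypothetical helper, `init_submodules`, which is not part of the repository; the git options it uses (`-C`, `--init`, `--recursive`) are standard.

```shell
#!/usr/bin/env bash
# Hypothetical helper: initialise and fetch all submodules of an
# existing checkout. Usage: init_submodules path/to/DiskANN
init_submodules() {
    # --init registers submodules that are not yet initialised;
    # --recursive also descends into any nested submodules.
    git -C "${1:-.}" submodule update --init --recursive
}

# For a fresh clone, the submodules can be fetched in the same step:
#   git clone --recurse-submodules <repository-url>
```

With no argument, `init_submodules` operates on the current directory.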

Build steps:

  • Open the "x64 Native Tools Command Prompt for VS 2019" (or the one matching your Visual Studio version) and change to the DiskANN folder
  • Create a "build" directory inside it
  • Change to the "build" directory and run
cmake ..

OR for Visual Studio 2017 and earlier:

<full-path-to-installed-cmake>\cmake ..
  • This will create a diskann.sln solution. Open it in Visual Studio and build either the Release or the Debug configuration.
    • Alternatively, use MSBuild:
msbuild.exe diskann.sln /m /nologo /t:Build /p:Configuration="Release" /property:Platform="x64"
    • This will also build the gperftools submodule for the libtcmalloc_minimal dependency.
  • Generated binaries are stored in the x64/Release or x64/Debug directories.

Usage:

Please see the following pages on using the compiled code:
