unum-cloud / usearch-benchmarks

Comparing USearch to FAISS and other Vector Search engines on Billion-scale datasets

Home Page: https://www.unum.cloud/blog/2023-11-07-scaling-vector-search-with-intel

USearch Benchmarks

This set of benchmarks is meant to test USearch's capabilities for billion-scale vector search. It provides an alternative to ann-benchmarks and big-ann-benchmarks, which generally operate on much smaller collections.

The main objective is to understand the scaling laws of USearch compared to FAISS. Supplementary adapters for other popular systems are also available under the index/ directory:

  • Alternative HNSW implementations, like HNSWlib,
  • Alternative CPU-based libraries, like ScaNN,
  • Vector Databases, like Qdrant and Weaviate.

The primary dataset used for benchmarks is the Deep1B dataset of 1 Billion 96-dimensional vectors, totalling 384 GB. Ground-truth nearest neighbors are provided to calculate the recall metrics.
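The 384 GB figure and the recall metric can both be checked with a few lines of Python. This is a minimal sketch, not the repository's own code: it assumes each scalar is a 4-byte float (which is consistent with the stated total), and it uses the common definition of recall@k as the fraction of true top-k neighbors that the index returned. The function names `dataset_size_bytes` and `recall_at_k` are hypothetical.

```python
def dataset_size_bytes(num_vectors, dimensions, bytes_per_scalar=4):
    """Raw size of a dense vector collection (assumes 4-byte floats by default)."""
    return num_vectors * dimensions * bytes_per_scalar


def recall_at_k(found, expected, k):
    """Fraction of ground-truth top-k neighbors present in the returned top-k.

    found, expected: one list of neighbor ids per query.
    """
    hits = 0
    for found_ids, expected_ids in zip(found, expected):
        hits += len(set(found_ids[:k]) & set(expected_ids[:k]))
    return hits / (len(expected) * k)


# 1 Billion vectors x 96 dimensions x 4 bytes, expressed in GB (10^9 bytes)
print(dataset_size_bytes(1_000_000_000, 96) / 1e9)  # 384.0

# Two toy queries: 2 of 3 and 1 of 3 true neighbors were found
print(recall_at_k([[1, 2, 3], [4, 5, 6]], [[1, 2, 9], [4, 7, 8]], k=3))  # 0.5
```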

Setup

First of all, we recommend creating a conda environment to isolate the dependencies:

conda create -n usearch-benchmarks python=3.10
conda activate usearch-benchmarks

Then install the dependencies, including an MKL-accelerated build of the FAISS library:

pip install usearch hnswlib scann lancedb qdrant-client weaviate-client psutil plotly kaleido
conda install -c pytorch faiss-cpu=1.7.4 mkl=2021 blas=1.0=mkl
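After installing, it can help to confirm that every package is importable from the active environment before starting a long run. The sketch below is an optional check, not part of the repository; the import names are the ones these packages are normally imported under (for example `qdrant_client` for qdrant-client, `weaviate` for weaviate-client, and `faiss` for faiss-cpu), and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names corresponding to the packages installed above
MODULES = [
    "usearch", "faiss", "hnswlib", "scann", "lancedb",
    "qdrant_client", "weaviate", "psutil", "plotly", "kaleido",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# An empty list means every dependency is visible to this interpreter
print(missing_modules(MODULES))
```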

To benchmark Qdrant, you need to run their Docker container:

docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant
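Before pointing a benchmark at the container, you may want to confirm that the published port accepts connections. The following is a minimal sketch that only tests TCP reachability of port 6333 from the command above; it does not verify that Qdrant itself is healthy, and `port_is_open` is a hypothetical helper rather than anything provided by the repository or by Qdrant.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


# 6333 is the first port published by the docker command above
print(port_is_open("localhost", 6333))
```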

Finally, download the Deep1B dataset:

wget https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.1B.fbin -P data
wget https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.10M.fbin -P data # For smaller subset
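Since the full file is hundreds of gigabytes, a quick header check can catch a truncated download early. The sketch below assumes the usual .fbin layout used by the big-ann-benchmarks datasets: two little-endian 32-bit unsigned integers (vector count, then dimensions) followed by row-major 32-bit floats. That layout is an assumption here, not something this README states, and both function names are hypothetical.

```python
import os
import struct


def read_fbin_header(path):
    """Read (vector_count, dimensions) from the first 8 bytes of an .fbin file."""
    with open(path, "rb") as f:
        count, dims = struct.unpack("<II", f.read(8))
    return count, dims


def expected_fbin_size(count, dims):
    """Expected file size in bytes: 8-byte header plus float32 payload."""
    return 8 + count * dims * 4


def looks_complete(path):
    """Compare the size implied by the header with the size on disk."""
    count, dims = read_fbin_header(path)
    return os.path.getsize(path) == expected_fbin_size(count, dims)
```

For example, `looks_complete("data/base.10M.fbin")` should return True once the smaller subset has finished downloading, provided the layout assumption holds.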

To run the ANN benchmarks, pass a configuration file:

python run.py configs/usearch_1B.json 1B # Outputs a stats/*.npz file
python utils/draw_plots.py # Exports to plots/*.png
