facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.

Home Page:https://faiss.ai

Training details for IVFPQ on GPU

Hardcandies opened this issue · comments

Summary

According to https://github.com/facebookresearch/faiss/wiki/Faiss-on-the-GPU#using-multiple-gpus, I understand that Faiss replicates the dataset to all GPUs by default. But I want to know more about IVFPQ training time:

  1. When the data is copied to all GPUs, which GPU does Faiss use for training? If training runs on every GPU, which index is used in the end?
  2. When the data is divided equally among the GPUs, does clustering only happen on the part of the data held by each GPU? And again, which index is used in the end?

Platform

OS: linux

Faiss version: faiss-1.6.2

Running on:

  • GPU

Interface:

  • Python

On multiple GPUs, the indexes are independent, so the training will be duplicated.
There are ways to avoid that, but this is the default behavior.

So when the data is divided among different GPUs, is each index trained only on its part of the data?

Nope, all indexes get the same training data. Since training is reproducible, they will get the same training results.
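The reproducibility argument can be illustrated without Faiss at all: deterministic k-means (the core of IVF coarse-quantizer training) produces bit-identical centroids when run twice on the same input, so replicated indexes trained on the same data end up identical. A plain-NumPy sketch of Lloyd's algorithm with a deterministic initialization (this is an illustration, not Faiss's actual training code):

```python
# Illustration: deterministic k-means gives identical results on identical input.
import numpy as np

def kmeans(x, k, iters=10):
    # deterministic init: take the first k points as initial centroids
    centroids = x[:k].copy()
    for _ in range(iters):
        # assign each point to its nearest centroid (squared L2)
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        # recompute each centroid as the mean of its assigned points
        for j in range(k):
            pts = x[assign == j]
            if len(pts):
                centroids[j] = pts.mean(axis=0)
    return centroids

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 8)).astype("float32")

# "Two GPUs" training on the same replicated data -> identical centroids.
c1 = kmeans(x, 16)
c2 = kmeans(x, 16)
assert np.array_equal(c1, c2)
```

The same reasoning applies per index: each replica runs the same training procedure on the same vectors, so the trained quantizers match.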

So when I use "IndexShards" mode, do all indexes still get the same training data?