facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.

Home Page: https://faiss.ai

Faiss training on GPU crashes because the number of IVF centroids changes in the middle of training

JeanBaptiste-dlb opened this issue

Summary

Hello and thanks in advance for the help.

I encountered a bug in faiss-gpu 1.7.2 when training an index on the GPU.
At some point during training, the number of centroids of the inverted file (IVF) index changes, which leads to a matrix multiplication error that terminates the program.

Platform

OS: Ubuntu 20.04.5 LTS x86_64
GPU: 01:00.0 NVIDIA Corporation Device 2203 (RTX 4090)
Driver Version: 515.65.01, CUDA Version: 11.7

Faiss version: 1.7.2

Installed from: PyPI (via Poetry)

Faiss compilation options: unknown

Running on:

  • [ ] CPU
  • [x] GPU

Interface:

  • [ ] C++
  • [x] Python

Reproduction instructions

https://gist.github.com/JeanBaptiste-dlb/a3aa1f93e2b247f61a9a83e5dfc0fb55

Logs:

WARNING clustering 600 points to 256 centroids: please provide at least 9984 training points
WARNING clustering 600 points to 256 centroids: please provide at least 9984 training points
WARNING clustering 600 points to 512 centroids: please provide at least 19968 training points
Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /project/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (512, 64) x (512, 64)' = (512, 512) gemm params m 512 n 512 k 64 trA T trB N lda 64 ldb 64 ldc 512
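The numbers in these log lines can be checked by hand. The sketch below assumes Faiss's default `ClusteringParameters.min_points_per_centroid` of 39 (which reproduces the 9984 and 19968 in the warnings) and uses the `cublasStatus_t` numbering from the cuBLAS documentation. The helper names `min_training_points` and `decode_cublas_failure` are hypothetical illustrations, not part of the Faiss API; the shape check only verifies that the printed gemm operands are mutually consistent.

```python
import re

# Assumed Faiss default (ClusteringParameters.min_points_per_centroid):
# k-means warns when it receives fewer than k * 39 training points.
MIN_POINTS_PER_CENTROID = 39

# Subset of cublasStatus_t values, as listed in the cuBLAS documentation.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
}


def min_training_points(n_centroids, min_points_per_centroid=MIN_POINTS_PER_CENTROID):
    """Hypothetical helper: smallest training-set size that avoids the k-means warning."""
    return n_centroids * min_points_per_centroid


def decode_cublas_failure(message):
    """Hypothetical helper: return (status name, shapes consistent?) for a Faiss cuBLAS error line."""
    status = int(re.search(r"cublas failed \((\d+)\)", message).group(1))
    a_rows, a_cols, b_rows, b_cols, c_rows, c_cols = map(
        int,
        re.search(
            r"\((\d+), (\d+)\) x \((\d+), (\d+)\)' = \((\d+), (\d+)\)", message
        ).groups(),
    )
    # A (a_rows x a_cols) times B' (b_cols x b_rows) yields a_rows x b_rows,
    # which requires the inner dimensions a_cols and b_cols to match.
    consistent = a_cols == b_cols and (c_rows, c_cols) == (a_rows, b_rows)
    return CUBLAS_STATUS.get(status, f"unknown status {status}"), consistent


if __name__ == "__main__":
    for k in (256, 512):
        print(k, min_training_points(k))  # 256 9984, then 512 19968
    line = "details: cublas failed (13): (512, 64) x (512, 64)' = (512, 512) gemm"
    print(decode_cublas_failure(line))  # ('CUBLAS_STATUS_EXECUTION_FAILED', True)
```

With the 600 training points in this report, both warnings fire because 600 is well below 9984 and 19968.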

Please install via Anaconda.