una-dinosauria / Rayuela.jl

Code for my PhD thesis. Library of quantization-based methods for fast similarity search in high dimensions. Presented at ECCV 18.


LSQ++ in 16x4 (nbits=4) - Does NOT scale up to large training sets

k-amara opened this issue · comments

Hello,

I have run LSQ++ with M=16 codebooks (number of subspaces) and codes encoded with nbits=4 (16 entries per codebook) on BigANN1M and Deep1M. When I increase the size of the training set, I observe a drop in recall (@1, @10, @100) on both datasets. Please find attached plots that illustrate the problem.
[Attached screenshots (2021-08-12): recall vs. training-set size for BigANN1M and Deep1M]

For LSQ++ I used the FAISS implementation (faiss.LocalSearchQuantizer(d, M, nbits)). @mdouze
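
Roughly, the setup looks like the sketch below (synthetic data stands in for BigANN1M/Deep1M and the sizes are scaled down; it assumes a recent FAISS build where faiss.LocalSearchQuantizer exposes train / compute_codes / decode through the Python wrappers):

```python
import numpy as np
import faiss

d, M, nbits = 128, 16, 4  # the 16x4 setting: M=16 codebooks, 4 bits (16 entries) each

rng = np.random.default_rng(0)
xt = rng.standard_normal((100_000, d), dtype=np.float32)  # training set (vary its size)
xb = rng.standard_normal((50_000, d), dtype=np.float32)   # database
xq = rng.standard_normal((1_000, d), dtype=np.float32)    # queries

# Exact ground truth on the raw database.
flat = faiss.IndexFlatL2(d)
flat.add(xb)
_, gt = flat.search(xq, 1)  # true nearest neighbor of each query

lsq = faiss.LocalSearchQuantizer(d, M, nbits)
lsq.train(xt)

# Encode the database, reconstruct it, and search the reconstructions exactly
# (a brute-force stand-in for ADC search).
codes = lsq.compute_codes(xb)
xb_rec = lsq.decode(codes)

index = faiss.IndexFlatL2(d)
index.add(xb_rec)
_, I = index.search(xq, 100)

def recall_at(k):
    # fraction of queries whose true nearest neighbor appears in the top-k results
    return float((I[:, :k] == gt).any(axis=1).mean())

for k in (1, 10, 100):
    print(f"recall@{k}: {recall_at(k):.4f}")
```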

Have you experienced this issue when testing LSQ++16x4?
I did a grid search on niter_train and niter_ils_train but observed no difference in the drop...
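
The grid search was essentially of the form sketched below, reusing d, M, nbits, the data and the recall evaluation from the snippet above; train_iters and train_ils_iters are my guess at the FAISS attribute names corresponding to niter_train / niter_ils_train, and the exact names may differ across FAISS versions:

```python
# Hypothetical grid over the LSQ++ training-iteration parameters.
for n_train in (10, 25, 50):
    for n_ils in (4, 8, 16):
        lsq = faiss.LocalSearchQuantizer(d, M, nbits)
        lsq.train_iters = n_train      # assumed attribute name (niter_train above)
        lsq.train_ils_iters = n_ils    # assumed attribute name (niter_ils_train above)
        lsq.train(xt)
        codes = lsq.compute_codes(xb)
        # ... re-run the reconstruction + exact search + recall@k evaluation here ...
```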

Cheers
@k-amara

Hello @k-amara,

This seems very strange. I remember trying out larger training sets during my PhD, and I did not observe drops in recall -- and definitely not dramatic ones like the ones you've shared.

Does this happen with the implementation in Rayuela too? If it doesn't, then it's probably a bug in the FAISS implementation.

Cheers,