facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.

Home Page: https://faiss.ai


Is there a way to use Faiss for efficient Incremental PCA on GPU?

ofirshifman opened this issue · comments

Summary

I want to perform linear dimensionality reduction (PCA) on the entire ImageNet training set. That means taking 1.2 million photos (say of size 250×250×3) and reducing them to dimension 500 or so. The problem is that I can't fit the input data into the memory of either the CPU or the GPU. scikit-learn offers IncrementalPCA for this, and I was wondering whether Faiss supports such a procedure on GPU or CPU that would save running time.

The only thing I found is faiss.PCAMatrix, but I couldn't find how to train it incrementally, using a data loader that feeds batches into memory.

Eventually I want to learn a dimension-reduction matrix of size (250 * 250 * 3) × 500, and then I'll apply it in the same batch-based way to reduce the data's dimension.

Is there something like that in Faiss?
Thanks!

Running on:

  • [V] CPU
  • [V] GPU

Interface:

  • C++
  • [V] Python

It is not implemented in Faiss because using all 1.2M vectors to compute a PCA down to dimension 500 is overkill: 5000 training vectors are more than enough for that.
This being said, to compute the PCA you just need the mean vector and the covariance matrix of the vectors, so it is easy to do that in an incremental way.
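The incremental route can be sketched with plain NumPy: accumulate the sum and the sum of outer products over batches, form the covariance, and take its top eigenvectors. (Toy dimensions again; note that at the full image dimension the d×d covariance matrix itself is very large, which is another reason to prefer the subsampling approach.)

```python
import numpy as np

d_in, d_out = 64, 8
n_total = 0
sum_x = np.zeros(d_in)
sum_xxT = np.zeros((d_in, d_in))

rng = np.random.default_rng(0)
# Stream the data in batches (stand-in for a real data loader).
for _ in range(10):
    batch = rng.standard_normal((500, d_in))
    n_total += batch.shape[0]
    sum_x += batch.sum(axis=0)
    sum_xxT += batch.T @ batch

mean = sum_x / n_total
cov = sum_xxT / n_total - np.outer(mean, mean)

# The top d_out eigenvectors of the covariance give the PCA projection.
eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
W = eigvecs[:, ::-1][:, :d_out]              # (d_in, d_out), largest first

# Reduce any batch: center, then project.
reduced = (batch - mean) @ W
```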