[FEA] IVF index building with pinned H2D transfer



tfeher opened this issue 4 months ago

Is your feature request related to a problem? Please describe.
For IVF-Flat and IVF-PQ index building, large datasets are provided in host memory or as mmap-ed files. After the cluster centers are trained, both methods stream through the whole dataset twice. Currently there is no overlap between the host-to-device (H2D) copies and the GPU-side processing of the data.

Describe the solution you'd like
Use pinned buffers to copy the data to the GPU and overlap the copies with GPU-side computation.
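As a rough illustration of what this overlap could gain, here is a simple cost model. It assumes every batch costs a fixed `copy` time for the H2D transfer and a fixed `compute` time on the GPU, and that two pinned staging buffers let the copy of the next batch run while the current batch is processed. The function names and the constant-cost assumption are hypothetical; real copy and kernel times vary per batch.

```cpp
#include <algorithm>

// Hypothetical cost model: each batch takes `copy` time units for its H2D
// transfer and `compute` time units of GPU work (both assumed constant).

// Today's behaviour: copy a batch, process it, then move on to the next one.
double sequential_time(int n_batches, double copy, double compute) {
  return n_batches * (copy + compute);
}

// Double buffering: the copy of batch i + 1 overlaps the processing of batch
// i. The first copy and the last compute cannot be hidden; in between, the
// slower of the two stages sets the pace.
double overlapped_time(int n_batches, double copy, double compute) {
  if (n_batches <= 0) return 0.0;
  return copy + (n_batches - 1) * std::max(copy, compute) + compute;
}
```

With 4 batches, 2 units per copy and 3 units of compute, `sequential_time` gives 20 and `overlapped_time` gives 14: once the pipeline is full, only the slower of the two stages determines the throughput.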

Additional context

Since the dataset can be larger than the physical (host) memory of the system, it is not possible to load the whole dataset into pinned memory.
Index subsampling already uses pinned buffers to overlap vector gathering and H2D copies: 5485557
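The gather half of that pattern can be sketched as follows. This is a hypothetical helper, not the code from 5485557: rows selected by `indices` are collected from the host dataset (which may be mmap-ed) into one contiguous staging buffer, so a single H2D copy can move the whole chunk instead of one small copy per row. In a CUDA build the staging buffer would be pinned (for example allocated with `cudaMallocHost`) so that the copy can run asynchronously while the next chunk is being gathered; that part is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy the rows listed in `indices` from a row-major host
// dataset with `dim` columns into a contiguous staging buffer. `staging` must
// hold at least indices.size() * dim floats.
void gather_rows(const float* data, std::size_t dim,
                 const std::vector<std::size_t>& indices, float* staging) {
  for (std::size_t i = 0; i < indices.size(); ++i) {
    const float* src = data + indices[i] * dim;     // row indices[i] of the dataset
    std::copy(src, src + dim, staging + i * dim);   // becomes row i of the chunk
  }
}
```

For a 3 x 2 dataset `{0, 1, 2, 3, 4, 5}` and `indices = {2, 0}`, the staging buffer ends up as `{4, 5, 0, 1}`.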

IVF-Flat and IVF-PQ stream through the whole dataset in these places:

assign vectors to cluster centers (k-means predict): IVF-Flat, IVF-PQ
copy vectors to their respective clusters (additionally, encode the vectors and map them to a specific layout): IVF-Flat, IVF-PQ
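Both passes follow the same streaming pattern, which the host-only sketch below tries to make concrete. It is an illustration under stated assumptions, not the RAFT code: `assign_labels` plays the role of k-means predict (nearest center by squared L2 distance), and `fill_clusters` copies every row into the contiguous range of its cluster, using offsets from a prefix sum over cluster sizes. Encoding (IVF-PQ) and the specific list layout are left out. The comments mark where each batch would be transferred to the GPU, which is where a pinned-buffer overlap would apply.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical host-only sketch of the two passes over a row-major dataset
// (n_rows x dim) that follow center training.

// Pass 1 (k-means predict): label each row with its nearest center.
std::vector<int> assign_labels(const std::vector<float>& data,
                               const std::vector<float>& centers,
                               std::size_t dim, std::size_t batch_rows) {
  const std::size_t n_rows = data.size() / dim;
  const std::size_t n_centers = centers.size() / dim;
  std::vector<int> labels(n_rows, 0);
  for (std::size_t start = 0; start < n_rows; start += batch_rows) {
    const std::size_t end = std::min(n_rows, start + batch_rows);
    // A real build would copy rows [start, end) to the GPU at this point.
    for (std::size_t r = start; r < end; ++r) {
      float best = std::numeric_limits<float>::max();
      for (std::size_t c = 0; c < n_centers; ++c) {
        float dist = 0.f;  // squared L2 distance
        for (std::size_t k = 0; k < dim; ++k) {
          const float diff = data[r * dim + k] - centers[c * dim + k];
          dist += diff * diff;
        }
        if (dist < best) {
          best = dist;
          labels[r] = static_cast<int>(c);
        }
      }
    }
  }
  return labels;
}

// Pass 2: copy every row into the contiguous range owned by its cluster.
// Rows offsets[c] .. offsets[c + 1] of the result belong to cluster c.
std::vector<float> fill_clusters(const std::vector<float>& data,
                                 const std::vector<int>& labels,
                                 std::size_t dim, std::size_t n_centers,
                                 std::size_t batch_rows,
                                 std::vector<std::size_t>& offsets) {
  const std::size_t n_rows = labels.size();
  offsets.assign(n_centers + 1, 0);
  for (int l : labels) ++offsets[static_cast<std::size_t>(l) + 1];  // cluster sizes
  for (std::size_t c = 0; c < n_centers; ++c) offsets[c + 1] += offsets[c];  // prefix sum
  std::vector<std::size_t> cursor(offsets.begin(), offsets.end() - 1);
  std::vector<float> out(n_rows * dim);
  for (std::size_t start = 0; start < n_rows; start += batch_rows) {
    const std::size_t end = std::min(n_rows, start + batch_rows);
    // Second stream over the dataset: rows [start, end) would be copied again.
    for (std::size_t r = start; r < end; ++r) {
      const std::size_t dst = cursor[labels[r]]++;
      std::copy(data.data() + r * dim, data.data() + (r + 1) * dim,
                out.data() + dst * dim);
    }
  }
  return out;
}
```

For example, with one-dimensional rows `{0, 10, 1, 9}` and centers `{0, 10}`, pass 1 yields labels `{0, 1, 0, 1}` and pass 2 reorders the rows to `{0, 1, 10, 9}` with offsets `{0, 2, 4}`.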

We use batch_load_iterator to copy the data to the device. Ideally, we could improve batch_load_iterator to prefetch the data into a pinned buffer.
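A prefetching loader along those lines could look like the following host-only model. The class name and interface are hypothetical and do not mirror the real batch_load_iterator API. Two fixed-size staging buffers stand in for pinned buffers: while the caller works on one batch, a background task fills the other, so memory use stays at two batches however large the dataset is (which matters because, as noted above, the dataset may not fit in host memory). In a CUDA implementation the buffers would presumably be allocated with `cudaMallocHost` and each filled buffer sent to the device with `cudaMemcpyAsync` on a separate stream; that part is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical model of a prefetching batch loader with two staging buffers.
class prefetching_batch_loader {
 public:
  // `data` is a row-major n_rows x dim host array; batch_rows must be > 0.
  prefetching_batch_loader(const float* data, std::size_t n_rows,
                           std::size_t dim, std::size_t batch_rows)
      : data_(data), n_rows_(n_rows), dim_(dim), batch_rows_(batch_rows) {
    buffers_[0].resize(batch_rows * dim);
    buffers_[1].resize(batch_rows * dim);
    if (n_rows_ > 0) prefetch(0, 0);  // start filling the first batch right away
  }
  // The background task captures `this`, so the loader must not be copied or moved.
  prefetching_batch_loader(const prefetching_batch_loader&) = delete;
  prefetching_batch_loader& operator=(const prefetching_batch_loader&) = delete;

  // Returns the row count of the next batch (0 when the dataset is exhausted)
  // and points `batch` at its buffer, valid until the following next() call.
  std::size_t next(const float** batch) {
    if (!pending_.valid()) return 0;    // no batch in flight: all consumed
    std::size_t rows = pending_.get();  // wait for the prefetched batch
    std::size_t slot = fill_slot_;
    std::size_t next_start = pending_start_ + rows;
    // The other buffer held the previous batch, which the caller released by
    // calling next() again, so it can be refilled in the background now.
    if (next_start < n_rows_) prefetch(next_start, 1 - slot);
    *batch = buffers_[slot].data();
    return rows;
  }

 private:
  void prefetch(std::size_t start, std::size_t slot) {
    fill_slot_ = slot;
    pending_start_ = start;
    pending_ = std::async(std::launch::async, [this, start, slot] {
      std::size_t rows = std::min(batch_rows_, n_rows_ - start);
      // Reading an mmap-ed dataset may page-fault; that cost is overlapped too.
      std::copy(data_ + start * dim_, data_ + (start + rows) * dim_,
                buffers_[slot].data());
      return rows;
    });
  }

  const float* data_;
  std::size_t n_rows_, dim_, batch_rows_;
  std::vector<float> buffers_[2];  // stand-ins for two pinned staging buffers
  std::size_t fill_slot_ = 0, pending_start_ = 0;
  std::future<std::size_t> pending_;  // declared last, so it is destroyed (joined) first
};
```

Each call to `next()` waits only for the batch that was prefetched during the previous call; with 10 rows and `batch_rows = 4` it yields batches of 4, 4 and 2 rows, then returns 0. The task is started with `std::launch::async`, so the program needs thread support when built.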

Tamas Bela Feher · Mon Jan 22 2024 17:05:04 GMT+0800 (China Standard Time)

Tagging @abc99lr who plans to work on this, and @achirkin for visibility.