Embedding generation runs on CPU only

Question

rti opened this issue 6 months ago

Terms

I have searched all open bug reports
I agree to follow Wikimedia's Code of Conduct

Behavior

When generating embeddings, only the CPU is used; no GPU acceleration is leveraged.
As a result, generating embeddings for our full example data takes 18 hours on 16 cores.
Typically, GPU acceleration can be enabled by passing a device="cuda" parameter, which should speed up embedding generation.
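
For illustration, here is a minimal sketch of how the device could be chosen before the embedding model is created. It assumes a PyTorch-based embedding stack, which the report does not confirm. The helper name pick_device is hypothetical, and the sketch is not the change made in the actual fix.

```python
import importlib.util


def pick_device(preferred: str = "cuda") -> str:
    """Return `preferred` if a CUDA GPU is usable, otherwise "cpu".

    Hypothetical helper, not part of the project's code base.
    """
    if preferred == "cpu":
        return "cpu"
    # Assumption: the embedding library is built on PyTorch.
    # If PyTorch is not installed, fall back to CPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return preferred if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Embedding generation would run on: {device}")

# The resulting string would then be passed to whatever creates the model.
# With sentence-transformers, for example (an assumption about the stack):
#     model = SentenceTransformer(model_name, device=device)
```

Falling back to "cpu" keeps the same code working on hosts without a GPU, while a host with an NVIDIA card and a CUDA-enabled PyTorch build would get "cuda".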

Operating System

Linux, running our container on a runpod.io host with an NVIDIA 3090 GPU

rti · Answer 1 · Tue Feb 13 2024 00:44:10 GMT+0800 (China Standard Time)

fixed in 9ee8a32

rti · Answer 2 · Tue Feb 13 2024 00:44:42 GMT+0800 (China Standard Time)

part of #23