coreylowman / cudarc

Safe rust wrapper around CUDA toolkit

Discovering the cuda libraries with dynamic loading

LaurentMazare opened this issue

Hello,
First, thanks for all the great work on this crate; we rely heavily on it in candle and it has worked very well.
We recently updated candle to the 0.11 release, which includes the dynamic loading change from #197, and some users ran into issues locating libcuda.so, e.g. in huggingface/candle#2175.
I've just tweaked candle to be back at dynamic linking rather than dynamic loading as a quick fix, but I'm wondering how dynamic loading is supposed to work if the CUDA libraries are not in the user's LD_LIBRARY_PATH (or in a system lib dir). With dynamic linking, the RUNPATH is set appropriately in the binary so that the shared libraries are found when the binary is launched. In the loading case, I'm not sure there is an equivalent.

Linking to #219, as the two may be related based on the comments in the candle thread. In addition to libraries not being in LD_LIBRARY_PATH, they may also be named differently (e.g. on Windows the shared driver library is nvcuda.dll rather than cuda.dll, for some reason).

I think we could potentially add some standard CUDA install locations to LD_LIBRARY_PATH manually inside the cudarc crate at runtime. Otherwise I'm not quite sure how we'd handle the libraries not being on the path.
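To make the "standard install locations" idea concrete, here is a rough sketch that probes a list of directories and returns a full path that could then be handed to the loader. Everything in it is an assumption for illustration: the helper names `candidate_dirs` and `find_library` are hypothetical, the directory list is just a few common Linux locations, and none of this is what cudarc currently does.

```rust
use std::path::{Path, PathBuf};

/// Build an ordered list of directories to probe for CUDA shared libraries.
/// `cuda_root` is an optional toolkit root supplied by the caller, for example
/// the value of a CUDA_HOME-style environment variable (hypothetical input).
fn candidate_dirs(cuda_root: Option<&Path>) -> Vec<PathBuf> {
    let mut dirs = Vec::new();
    // A user-supplied root takes priority over the built-in guesses.
    if let Some(root) = cuda_root {
        dirs.push(root.join("lib64"));
        dirs.push(root.join("lib"));
    }
    // A few common Linux locations (assumed, not exhaustive).
    for d in ["/usr/local/cuda/lib64", "/usr/lib/x86_64-linux-gnu", "/usr/lib64"] {
        let d = PathBuf::from(d);
        if !dirs.contains(&d) {
            dirs.push(d);
        }
    }
    dirs
}

/// Return the first `dir/name` combination that exists as a file.
/// Directories are tried in order; within a directory, names are tried in order.
fn find_library(dirs: &[PathBuf], names: &[&str]) -> Option<PathBuf> {
    dirs.iter()
        .flat_map(|dir| names.iter().map(move |name| dir.join(name)))
        .find(|path| path.is_file())
}
```

A caller could run something like `find_library(&candidate_dirs(None), &["libcuda.so", "libcuda.so.1"])` and pass the resulting absolute path to the loading call instead of a bare library name. Resolving a full path this way may be more reliable than editing LD_LIBRARY_PATH from inside the process, since as far as I know glibc reads that variable at startup and later changes do not affect `dlopen`.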

We'll likely need to add some workarounds for the unusual DLL names on Windows as well.
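As a sketch of what such a naming workaround might look like, the function below maps a logical library name to the file names worth trying on each platform. The function name `candidate_file_names` is hypothetical, and the only special case it encodes is the driver library: nvcuda.dll on Windows, plus the `libcuda.so.1` soname that the Linux driver typically installs even when the unversioned `libcuda.so` symlink is missing. Other CUDA DLLs on Windows usually carry version suffixes, which this sketch does not try to guess.

```rust
/// Candidate file names for a CUDA library, in the order they should be tried.
/// `target_os` uses the same strings as `std::env::consts::OS`
/// ("windows", "macos", "linux", ...).
fn candidate_file_names(lib: &str, target_os: &str) -> Vec<String> {
    match target_os {
        "windows" => {
            if lib == "cuda" {
                // The driver library is named nvcuda.dll rather than cuda.dll.
                vec!["nvcuda.dll".to_string(), "cuda.dll".to_string()]
            } else {
                vec![format!("{}.dll", lib)]
            }
        }
        "macos" => vec![format!("lib{}.dylib", lib)],
        _ => {
            let mut names = vec![format!("lib{}.so", lib)];
            if lib == "cuda" {
                // Fall back to the versioned soname shipped with the driver.
                names.push("libcuda.so.1".to_string());
            }
            names
        }
    }
}
```

For example, `candidate_file_names("cuda", std::env::consts::OS)` gives the names to attempt on the current platform, and the loader would try each in turn until one opens.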

> I've just tweaked candle to be back at dynamic linking rather than dynamic loading as a quick fix

Hopefully it was as easy as enabling the dynamic-linking flag to swap back to link case?

There may also be a third issue, considering that the user in the candle thread added the path to LD_LIBRARY_PATH and still wasn't able to load the library.

> Hopefully it was as easy as enabling the dynamic-linking flag to swap back to link case?

This was pretty easy, though by default we prefer not to have cudnn and nccl enabled, as some users might not have them installed. So, to avoid issues at compile time, I deactivated the default features.
This ends up being a bit tricky for users who want to use both candle and another crate that depends on cudarc with the default features; hopefully that's not a very common use case.