Dynamic loading instead of linking

Question

fayalalebrun opened this issue 7 months ago · comments

Francisco Ayala Le Brun commented 7 months ago

Currently, a binary depending on cudarc will fail to run when the NVIDIA driver is not installed. This means that programs which offer optional CUDA support need to distribute at least two binaries. The driver is also necessary for building the program, since otherwise the linking step will fail.

An alternative is to perform dynamic loading with a crate like libloading. bindgen already offers support for generating the required structures. In this way, the library will only be loaded when a function depending on CUDA is used, and the NVIDIA driver will no longer be required for building the binary.
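A minimal sketch of the dynamic-loading idea, assuming a Unix-like system. It calls `dlopen`/`dlsym` directly so that it has no dependencies; libloading wraps the same mechanism behind a safer, cross-platform API. `libcuda.so.1` and `cuInit` are the driver library and its initialization entry point on Linux. The `Library` type and the `cuda_available` helper are hypothetical names for illustration, not part of cudarc.

```rust
use std::ffi::{c_char, c_int, c_uint, c_void, CString};

// POSIX dynamic-loader entry points; Rust's std already links them on Linux and macOS.
extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
}

// RTLD_NOW has the value 2 on both Linux (glibc) and macOS.
const RTLD_NOW: c_int = 2;

/// Hypothetical wrapper around a `dlopen` handle (not cudarc's API).
/// The handle is never closed: it is kept for the lifetime of the process.
struct Library {
    handle: *mut c_void,
}

impl Library {
    /// Returns `None` instead of aborting when the library is not installed.
    fn open(name: &str) -> Option<Library> {
        let c_name = CString::new(name).ok()?;
        let handle = unsafe { dlopen(c_name.as_ptr(), RTLD_NOW) };
        if handle.is_null() {
            None
        } else {
            Some(Library { handle })
        }
    }

    /// Looks up a symbol by name; `None` if the library does not export it.
    fn symbol(&self, name: &str) -> Option<*mut c_void> {
        let c_name = CString::new(name).ok()?;
        let ptr = unsafe { dlsym(self.handle, c_name.as_ptr()) };
        if ptr.is_null() {
            None
        } else {
            Some(ptr)
        }
    }
}

/// C signature: `CUresult cuInit(unsigned int Flags)`; 0 means CUDA_SUCCESS.
type CuInit = unsafe extern "C" fn(c_uint) -> c_int;

/// Probes for the driver at run time; nothing CUDA-related is needed at build time.
fn cuda_available() -> bool {
    let Some(lib) = Library::open("libcuda.so.1") else {
        return false;
    };
    let Some(sym) = lib.symbol("cuInit") else {
        return false;
    };
    // The symbol address is reinterpreted as a function pointer of the C signature above.
    let cu_init: CuInit = unsafe { std::mem::transmute::<*mut c_void, CuInit>(sym) };
    unsafe { cu_init(0) == 0 }
}

fn main() {
    println!("CUDA available: {}", cuda_available());
}
```

On a machine without the driver, `Library::open` returns `None` and the program can continue on the CPU, which is the behavior a linked binary cannot offer.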

Do you think this would be a desirable change? And do you see any point in still supporting strict dynamic linking if dynamic loading is supported?

Corey Lowman · Answer 1 · Fri Mar 22 2024 01:25:47 GMT+0800 (China Standard Time)

I'm going to work on adding this next - have become very interested in this feature. The main thing we'll need to figure out is if we still want to support dynamic & static linking in addition to dynamic loading.

I'm leaning towards supporting all three, but that might become very complex.

Corey Lowman · Answer 2 · Fri Mar 22 2024 01:28:17 GMT+0800 (China Standard Time)

@Narsil do you have any thoughts here? Curious how this could benefit candle

Jed Brown · Answer 3 · Fri Mar 22 2024 01:43:23 GMT+0800 (China Standard Time)

It's quite common in the HPC world that interactive nodes (where you usually compile) have neither GPUs nor the drivers, but the compute nodes do. This is fine when just using libcudart.so because it loads the driver dynamically, and you can link against $CUDA_DIR/lib/stubs/ if your code calls the driver interface directly. I think similar/better usability is desirable for Rust, especially if it means we can distribute a static binary that can run on CPU machines without any CUDA libraries or devices, but can use devices when selected by run-time options on a machine that has them.
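The run-time selection described above could look roughly like the sketch below. Everything here is hypothetical: the `Backend` enum, the `select_backend` function, and the `cpu`/`cuda` option values are illustrative names, and `driver_found` stands in for the result of whatever driver probe the program performs at startup.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cpu,
    Cuda,
}

/// Picks a backend from an optional user request and the result of a driver probe.
fn select_backend(requested: Option<&str>, driver_found: bool) -> Result<Backend, String> {
    match requested {
        Some("cuda") if driver_found => Ok(Backend::Cuda),
        // An explicit request for CUDA fails loudly instead of silently using the CPU.
        Some("cuda") => Err("CUDA was requested but no driver could be loaded".to_string()),
        Some("cpu") => Ok(Backend::Cpu),
        Some(other) => Err(format!("unknown device '{}'", other)),
        // No preference given: use the GPU only when a driver is present.
        None => Ok(if driver_found { Backend::Cuda } else { Backend::Cpu }),
    }
}

fn main() {
    // First command-line argument, e.g. `cpu` or `cuda` (assumed option names).
    let requested = std::env::args().nth(1);
    // Placeholder: a real program would probe for the driver here.
    let driver_found = false;
    match select_backend(requested.as_deref(), driver_found) {
        Ok(backend) => println!("using {:?}", backend),
        Err(msg) => eprintln!("error: {}", msg),
    }
}
```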

Corey Lowman · Answer 4 · Mon Mar 25 2024 23:00:14 GMT+0800 (China Standard Time)

Update on static linking: I think we don't necessarily need this feature anymore. I was able to run cargo test inside a Docker image (one of the official NVIDIA ones) with some changes to LD_LIBRARY_PATH.

I think we can keep the dynamic linking feature in the build.rs, but still use libloading in both the loading and linking cases, and the OS should detect that we've already loaded the library in the dynamic linking case.

Nicolas Patry · Answer 5 · Tue Mar 26 2024 18:15:22 GMT+0800 (China Standard Time)

Dynamic linking should still be the default imho, as it's the most common way to distribute things (in a package manager you just need to depend on something). The kernel is able to do all the loading and finding of things, and it also has some security model.
You rarely distribute a binary that supports multiple CPU architectures. The GPU is becoming more and more like a CPU imho, in the sense that it's a core target for binaries in the ML world.

Static linking is nice when you want to reduce the binary size and may be running on hosts that don't have CUDA itself (quite niche, but useful when you want to strip everything down to the bare bones; definitely removable).

Dynamic loading is asking for trouble because of version issues in CUDA: you might be loading the wrong version of the lib and making your program buggy without ever realizing it.
I'd also like to note that no matter what, your kernels (both dfdx and candle) are built for a specific GPU, therefore binaries are not portable anyway. Building for all possible GPUs will bloat the binaries super fast.
Still a very good option for the mentioned use cases, but I'm not sure I'd make it the default personally.
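One way to reduce the version risk when loading dynamically is to check the driver version right after loading and refuse to continue if it is too old. The sketch below assumes the raw value comes from `cuDriverGetVersion`, which encodes the version as 1000 * major + 10 * minor. The function names and the minimum version are hypothetical, chosen only for illustration.

```rust
/// Splits the integer reported by `cuDriverGetVersion` into (major, minor).
fn decode_cuda_version(raw: i32) -> (i32, i32) {
    (raw / 1000, (raw % 1000) / 10)
}

/// Returns the decoded version, or an error if it is older than `minimum`.
fn check_driver_version(raw: i32, minimum: (i32, i32)) -> Result<(i32, i32), String> {
    let found = decode_cuda_version(raw);
    // Tuples compare lexicographically: major first, then minor.
    if found >= minimum {
        Ok(found)
    } else {
        Err(format!(
            "driver supports CUDA {}.{}, but at least {}.{} is required",
            found.0, found.1, minimum.0, minimum.1
        ))
    }
}

fn main() {
    // 12040 is the value a driver supporting CUDA 12.4 would report.
    match check_driver_version(12040, (12, 0)) {
        Ok((major, minor)) => println!("driver OK: CUDA {}.{}", major, minor),
        Err(msg) => eprintln!("{}", msg),
    }
}
```

A check like this only catches a driver that is too old; it does not detect a mismatch between the bindings and a differently versioned toolkit library.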

IMHO binaries/package managers should start checking GPU as a core target for binary selection (and for the time being it's up to devs to do it.)

Corey Lowman · Answer 6 · Thu Mar 28 2024 03:19:35 GMT+0800 (China Standard Time)

Those are all fair points - I'm fine with dynamic linking being default behavior.

I think the only "counterpoint" I have to the above is that if all kernels are JIT compiled for the machine they are running on, then we have less to worry about. Although JIT compiling everything has other downsides.

Francisco Ayala Le Brun · Answer 7 · Thu Mar 28 2024 04:13:15 GMT+0800 (China Standard Time)

@Narsil Just curious about your point on the risk of loading the wrong version of the lib. How does dynamic linking protect from this better than dynamic loading? I had the idea both of them would be about equal in this regard.