coreylowman / cudarc

Safe rust wrapper around CUDA toolkit


Are there any plans to streamline compiling kernel source code?

l3utterfly opened this issue

I've been using CUDA currently by writing a small C library exposing functions/structs/methods etc. that I need and linking statically to my Rust project. It is cumbersome, but has the advantage that I can use tools like "bindgen" to generate struct/function representations automatically, so I don't need to write struct definitions in two places.

Are there any plans to streamline compiling the kernel code so perhaps structs are automatically generated on the rust side, or the CUDA side?

just to clarify, do you mean the kernel structs?

For example, in the device_repr example, you have structs that the kernel needs as input. Those are defined twice: once in the kernel code and once in Rust. It would be nice if we could somehow define the struct in one place (either on the C or the Rust side) and generate the other side.
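
To make the duplication concrete, here is roughly what that looks like today. This is only a sketch: the struct name and fields are made up, and it assumes the DeviceRepr trait from cudarc::driver, as used in the device_repr example (exact paths may differ between versions).

```rust
use cudarc::driver::DeviceRepr;

// CUDA side: the same layout also has to exist in the .cu source, e.g.
//
//   struct Params { float scale; int n; };
//   extern "C" __global__ void my_kernel(Params p, float *out) { /* ... */ }

// Rust side: #[repr(C)] keeps the field layout identical to the C definition.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct Params {
    pub scale: f32,
    pub n: i32,
}

// Safety: Params is #[repr(C)], Copy, and contains only plain scalar fields,
// so it can be passed by value as a kernel argument.
unsafe impl DeviceRepr for Params {}
```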

Additionally, if I'm not mistaken, there are currently two ways to read kernel code:

  1. read it at runtime (either by reading pre-generated PTX, or by reading the .cu code directly and compiling it)
  2. write the kernel code directly as a string in Rust, then compile it

(1) means the kernel code will have to be distributed along with the executable, which might not be desirable
(2) makes it hard to maintain: there's no IntelliSense when the code is a Rust const &str. Perhaps I could use the include_str! macro to include code written in .cu files, but that still feels really wonky.
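
For concreteness, a rough sketch of both routes, assuming cudarc's nvrtc and driver modules (compile_ptx, Ptx, CudaDevice::new, load_ptx; exact names may differ between versions, and the file, module, and kernel names below are placeholders):

```rust
use cudarc::driver::CudaDevice;
use cudarc::nvrtc::{compile_ptx, Ptx};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dev = CudaDevice::new(0)?;

    // (1) Read at runtime: either load a pre-generated .ptx file...
    dev.load_ptx(Ptx::from_file("kernels/my_kernel.ptx"), "from_ptx", &["my_kernel"])?;

    // ...or read the .cu source and compile it with NVRTC at runtime.
    let cu_src = std::fs::read_to_string("kernels/my_kernel.cu")?;
    dev.load_ptx(compile_ptx(cu_src)?, "from_cu", &["my_kernel"])?;

    // (2) Embed the source in the binary as a string and compile it with NVRTC.
    const KERNEL_SRC: &str = include_str!("../kernels/my_kernel.cu");
    dev.load_ptx(compile_ptx(KERNEL_SRC)?, "embedded", &["my_kernel"])?;

    Ok(())
}
```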

Hmm, @coreylowman has a better overview of this crate than me lol. But this crate is something like a dfdx-sys crate, which just adds bindings to CUDA and cuDNN. Also, using include_str! is fine for smaller use cases, and you can have a look at how dfdx handles this.

Playing around with the library a bit more, I've settled on the following workflow for writing kernels, generating bindings, compiling CUDA, and last but not least, using this library to call the kernels.

  1. Write CUDA kernels in separate folder/project. They can be as complicated as you need them to be: multiple files, include headers, call functions etc.
  2. Create a wrapper.h including all the struct definitions used in your kernels
  3. In the Rust project, write a build script that does the following (a sketch of such a build.rs follows this list):
    a. Build the kernels (.cu files) by invoking the nvcc command with the -ptx flag. Output the resulting .ptx files to the build directory
    b. Use bindgen to generate the structs on the Rust side
    c. Use regex to replace any pointers (e.g. *mut f32) with CudaSlice<f32>
    d. Save the edits to the generated bindings.rs and output it to the build directory
  4. Include the bindings in your Rust project
  5. Compile PTX using the built-in functions in this crate (cudarc)
  6. After all this is set up, calling your kernels and allocating memory for your structs is all covered by this crate's examples
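
Here is a hedged sketch of what that build.rs can look like. The kernel path, wrapper.h location, and output names are placeholders, and it assumes bindgen is listed as a build-dependency; adjust paths and nvcc flags to your project layout.

```rust
// build.rs (sketch): compile kernels to PTX, generate bindings, post-process them.
use std::{env, fs, path::PathBuf, process::Command};

fn main() {
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());

    // (a) Build each .cu file to PTX with `nvcc -ptx` and put it in OUT_DIR.
    for kernel in ["kernels/scale.cu"] {
        println!("cargo:rerun-if-changed={kernel}");
        let stem = PathBuf::from(kernel).file_stem().unwrap().to_owned();
        let ptx_out = out_dir.join(stem).with_extension("ptx");
        let status = Command::new("nvcc")
            .args(["-ptx", kernel, "-o"])
            .arg(&ptx_out)
            .status()
            .expect("failed to run nvcc");
        assert!(status.success(), "nvcc failed for {kernel}");
    }

    // (b) Generate Rust structs from wrapper.h with bindgen.
    println!("cargo:rerun-if-changed=kernels/wrapper.h");
    let raw = out_dir.join("bindings_raw.rs");
    bindgen::Builder::default()
        .header("kernels/wrapper.h")
        .generate()
        .expect("bindgen failed")
        .write_to_file(&raw)
        .expect("failed to write raw bindings");

    // (c)/(d) Post-process pointer types (see the regex sketch further down)
    // and save the final bindings next to the PTX output.
    let bindings = fs::read_to_string(&raw).expect("failed to read raw bindings");
    fs::write(out_dir.join("bindings.rs"), bindings).expect("failed to write bindings.rs");
}
```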

Thinking about this a bit more, I can see most of this falls outside of the scope of this crate. Perhaps this can be included as an example for new users? This workflow took me quite a bit of time to figure out (since I'm new to CUDA).

Ngl, I was almost turned off of this crate because there was no straightforward way of compiling kernel code apart from pasting it into the Rust code as a string, which is a terrible way to include code in general.

I'm happy to write this up as an example if you feel this can help your crate users.

3. Use regex to replace any pointers (e.g. *mut f32) with CudaSlice<f32>

be careful if you use other pointer types

What others are there?

I was thinking just replacing *mut f32 -> CudaSlice<f32>, *mut i32 -> CudaSlice<i32> etc.

hmm, not sure, but you are definitely missing *const _ -> &CudaSlice<_>.
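
For what it's worth, here is a rough sketch of that replacement step using the regex crate, covering both cases and keeping the &mut/& distinction. The function name is made up, and whether a plain textual rewrite like this is enough depends on what bindgen actually emits for your headers.

```rust
use regex::Regex;

/// Rewrite the raw pointer types that bindgen emits into CudaSlice-based types:
///   *mut T   -> &mut CudaSlice<T>
///   *const T -> &CudaSlice<T>
fn rewrite_pointer_types(bindings: &str) -> String {
    let mut_ptr = Regex::new(r"\*mut\s+([A-Za-z_][A-Za-z0-9_]*)").unwrap();
    let const_ptr = Regex::new(r"\*const\s+([A-Za-z_][A-Za-z0-9_]*)").unwrap();

    let step1 = mut_ptr.replace_all(bindings, "&mut CudaSlice<$1>");
    const_ptr.replace_all(&step1, "&CudaSlice<$1>").into_owned()
}
```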

Hey, thanks for sharing this! Yeah, all the steps you listed are accurate as far as compiling kernels goes. The dfdx build script looks fairly similar to the steps you listed (although without the extra bindgen stuff).

I think including a complex build script would be a good addition; any more complex use case will need to figure out build scripts eventually.

Another way to do all this is to use nvcc in a build script, as you've done, and then include the compiled PTX source in the Rust code with const PTX_SRC: &str = include_str!(concat!(env!("OUT_DIR"), "/<name of file>.ptx"));. I detail this a bit more in this blog post: https://coreylowman.github.io/2023/04/07/cudarc-stack.html#at-compile-time-with-nvcc

Have you experimented with that at all?

I totally agree that streamlining this would be a huge win. It's very complicated at the moment.
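
For anyone following along, here is a minimal end-to-end sketch of that compile-time route. It assumes the build script above wrote scale.ptx to OUT_DIR and a CUDA kernel like extern "C" __global__ void scale_kernel(float *data, float scale, int n); the driver calls (CudaDevice::new, load_ptx, get_func, launch, htod_copy, dtoh_sync_copy) follow the crate's examples, but exact names may differ between versions.

```rust
use cudarc::driver::{CudaDevice, LaunchAsync, LaunchConfig};
use cudarc::nvrtc::Ptx;

// PTX produced by build.rs and embedded into the binary at compile time.
const PTX_SRC: &str = include_str!(concat!(env!("OUT_DIR"), "/scale.ptx"));

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dev = CudaDevice::new(0)?;
    dev.load_ptx(Ptx::from_src(PTX_SRC), "scale", &["scale_kernel"])?;
    let kernel = dev.get_func("scale", "scale_kernel").unwrap();

    // Copy input to the device, launch the kernel, and copy the result back.
    let mut data = dev.htod_copy(vec![1.0f32; 1024])?;
    let cfg = LaunchConfig::for_num_elems(1024);
    unsafe { kernel.launch(cfg, (&mut data, 2.0f32, 1024i32)) }?;
    let host = dev.dtoh_sync_copy(&data)?;
    println!("first element after scaling: {}", host[0]);
    Ok(())
}
```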

Another way to do all this is to use nvcc in a build script, as you've done, and then include the compiled PTX source in the Rust code with const PTX_SRC: &str = include_str!(concat!(env!("OUT_DIR"), "/<name of file>.ptx"));. I detail this a bit more in this blog post: https://coreylowman.github.io/2023/04/07/cudarc-stack.html#at-compile-time-with-nvcc

Yep! That's exactly what I did. Glad we arrived at the same solution.

Awesome, yeah if you have any other ways to improve this definitely let me know 😀 An example in this repo would be great

Sure, will write something up. I'll create a pull request so you can review it when ready.

Closing as the PR was merged, thanks for opening that!