coreylowman / cudarc

Safe rust wrapper around CUDA toolkit


Is it possible to allocate memory on the GPU for a single value and reclaim it after a kernel call?

l3utterfly opened this issue

For example, I have a simple CUDA kernel which counts non-zero elements. It will not be run in parallel, only on one thread:

__global__ void countNonZeroElements(const float *input, int input_length, int *non_zero_count) {
    int count = 0;
    for (int i = 0; i < input_length; i++) {
        if (input[i] != 0) {
            count++;
        }
    }
    *non_zero_count = count;
}

Then I want to read the non_zero_count value back on the host after the kernel call.

I only see ways to allocate and reclaim a CudaSlice.

You could just allocate a CudaSlice with a single element:

let mut non_zero_count = dev.alloc_zeros::<i32>(1)?;

There's no way to pass pointers to primitive types to cuda kernels at the moment, for a few reasons:

  1. It'd have to be in some shared host/gpu memory
  2. It'd be unsound because you could mutate the primitive on the rust side while the kernel is still running. This isn't possible with CudaSlice (unless you're using CudaStream improperly) because all the kernels are executed sequentially on a single stream.

Sure thing, closing for now 👍