coreylowman / cudarc

Safe rust wrapper around CUDA toolkit



What is the correct way to wait for GPU execution to finish before reclaiming data?

l3utterfly opened this issue

The launch_kernel example launches a kernel and then uses sync_reclaim to copy the results back to the CPU.

Do I need to call the synchronize method after launching the kernel? In C this is required, and none of the examples do it, so I just want to make sure.

Nope!

All the device-to-host copy methods are synchronous (they call synchronize internally). I tried to make this clear by including sync in the names of all methods that are inherently synchronous.

There are three ways to copy data back to the host:

  1. CudaDevice::sync_reclaim, which deallocates the device memory and returns its contents as a Vec
  2. CudaDevice::dtoh_sync_copy, which synchronously copies into a newly allocated Vec
  3. CudaDevice::dtoh_sync_copy_into, which synchronously copies into an existing slice

Note that if you launch kernels on a separate CudaStream, you need to explicitly synchronize that stream with CudaDevice::wait_for, as shown in examples/04-stream.rs.

Thanks, got it!