What is the correct way to synchronize GPU execution before reclaiming data?

Question

l3utterfly opened this issue a year ago

The launch_kernel example launches a kernel and then uses sync_reclaim to copy the results back to the CPU.

Do I need to call a synchronize method after launching the kernel? This is required in C, but none of the examples do it, so I just want to make sure.

Corey Lowman · Answer 1 · Fri Apr 21 2023 21:59:20 GMT+0800 (China Standard Time)

Nope!

All the device-to-host copy methods are synchronous (they call synchronize internally). I tried to make this clear by including sync in the names of all methods that are inherently synchronous.

There are the following ways of copying data back to host:

CudaDevice::sync_reclaim, which deallocates the device memory and returns the vec
CudaDevice::dtoh_sync_copy, which synchronously copies into a newly allocated vec
CudaDevice::dtoh_sync_copy_into, which synchronously copies into an existing slice
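
As a rough illustration of why no explicit synchronize is needed before a sync copy, here is a CPU-only model written with the Rust standard library. It is not cudarc code: ModelDevice, launch, and run_model are hypothetical names, the "kernel" just doubles each element, and the worker thread stands in for a single in-order queue of device work. The only behavior it borrows from the answer above is that the copy method blocks until everything queued before it has finished.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send>;

/// Hypothetical CPU-only stand-in for a device with one in-order work queue.
struct ModelDevice {
    queue: Sender<Job>,
}

impl ModelDevice {
    fn new() -> Self {
        let (tx, rx) = channel::<Job>();
        // The worker runs jobs strictly in submission order.
        thread::spawn(move || {
            for job in rx {
                job();
            }
        });
        ModelDevice { queue: tx }
    }

    /// Asynchronous "kernel launch": enqueue the work and return immediately.
    fn launch(&self, buf: &Arc<Mutex<Vec<f32>>>) {
        let buf = Arc::clone(buf);
        self.queue
            .send(Box::new(move || {
                for x in buf.lock().unwrap().iter_mut() {
                    *x *= 2.0;
                }
            }))
            .unwrap();
    }

    /// Synchronous "device to host copy": enqueue the copy, then block
    /// until the worker has reached it and sent the data back.
    fn dtoh_sync_copy(&self, buf: &Arc<Mutex<Vec<f32>>>) -> Vec<f32> {
        let buf = Arc::clone(buf);
        let (reply_tx, reply_rx) = channel();
        self.queue
            .send(Box::new(move || {
                reply_tx.send(buf.lock().unwrap().clone()).unwrap();
            }))
            .unwrap();
        reply_rx.recv().unwrap()
    }
}

/// Launch, then copy back with no separate synchronize step in between.
fn run_model() -> Vec<f32> {
    let dev = ModelDevice::new();
    let buf = Arc::new(Mutex::new(vec![1.0, 2.0, 3.0]));
    dev.launch(&buf); // returns before the work has necessarily run
    let host = dev.dtoh_sync_copy(&buf); // blocks until launch and copy are done
    host
}
```

Calling run_model() returns the doubled values even though nothing waits between the launch and the copy: the copy is queued behind the launch and the caller blocks on its result.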

Corey Lowman · Answer 2 · Fri Apr 21 2023 22:01:05 GMT+0800 (China Standard Time)

Note that if you are using a separate CudaStream to launch kernels, you will need to explicitly synchronize the stream with CudaDevice::wait_for, as shown in examples/04-stream.rs.

l3utterfly · Answer 3 · Fri Apr 21 2023 22:24:45 GMT+0800 (China Standard Time)

Thanks, got it!