What is the correct way to synchronize GPU execution before reclaiming data?
l3utterfly opened this issue
The launch_kernel example launches a kernel and then uses sync_reclaim to copy the results back to the CPU.
Do I need to call the synchronize method after launching the kernel? In C this is required. None of the examples do this, so I just want to make sure.
Nope!
All the device-to-host copy methods are synchronous (they call synchronize internally). I tried to make this clear by including sync in the names of all the methods that are inherently synchronous.
There are the following ways of copying data back to host:
- CudaDevice::sync_reclaim, which deallocates the device memory and returns the vec
- CudaDevice::dtoh_sync_copy, which synchronously copies into a newly allocated vec
- CudaDevice::dtoh_sync_copy_into, which synchronously copies into an existing slice
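To illustrate why no explicit synchronize call is needed before a sync copy, here is a minimal toy model in plain Rust (standard library only). It is not cudarc code and does not reflect the library's real implementation: ToyDevice, its worker thread, and double_on_device are hypothetical names invented for this sketch, and only the method name dtoh_sync_copy is borrowed from the list above. The point it shows is that a copy method which synchronizes internally always observes the result of an earlier asynchronous launch.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send>;

/// Hypothetical stand-in for a GPU device: one worker thread runs queued jobs in order.
struct ToyDevice {
    queue: Sender<Job>,
}

impl ToyDevice {
    fn new() -> Self {
        let (queue, jobs) = channel::<Job>();
        // The worker drains the queue and exits once the sender is dropped.
        thread::spawn(move || for job in jobs { job() });
        ToyDevice { queue }
    }

    /// Asynchronous "kernel launch": enqueue the work and return immediately.
    fn launch(&self, job: impl FnOnce() + Send + 'static) {
        self.queue.send(Box::new(job)).expect("worker thread is alive");
    }

    /// Block the caller until every job queued so far has finished.
    fn synchronize(&self) {
        let (done_tx, done_rx) = channel();
        self.launch(move || done_tx.send(()).unwrap());
        done_rx.recv().expect("worker thread is alive");
    }

    /// "Sync" copy: synchronizes internally, then copies the buffer to the host.
    fn dtoh_sync_copy(&self, buf: &Arc<Mutex<Vec<f32>>>) -> Vec<f32> {
        self.synchronize();
        buf.lock().unwrap().clone()
    }
}

/// Launches a toy "kernel" that doubles every element, then copies the result back.
fn double_on_device(input: Vec<f32>) -> Vec<f32> {
    let dev = ToyDevice::new();
    let buf = Arc::new(Mutex::new(input));
    let kernel_buf = Arc::clone(&buf);
    dev.launch(move || {
        for x in kernel_buf.lock().unwrap().iter_mut() {
            *x *= 2.0;
        }
    });
    // No explicit dev.synchronize() here: the sync copy waits for the launch itself.
    dev.dtoh_sync_copy(&buf)
}

fn main() {
    let out = double_on_device(vec![1.0, 2.0, 3.0]);
    println!("{:?}", out);
}
```

Because the queue runs jobs in order, the marker job that synchronize enqueues cannot finish before the kernel job queued ahead of it.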
Note that if you are using a separate CudaStream to launch kernels, you will need to explicitly synchronize the stream with CudaDevice::wait_for, as shown in examples/04-stream.rs.
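The need for an explicit wait on a separate stream can be sketched with another toy model in plain Rust (standard library only). This is an assumption-laden illustration, not cudarc code: ToyStream, its wait_for, and add_one_on_stream are hypothetical, and the real CudaDevice::wait_for may differ in its details. Each toy stream is an independent in-order queue, so synchronizing one says nothing about the other unless an explicit dependency is added.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send>;

/// Hypothetical stand-in for a stream: an in-order queue with its own worker thread.
struct ToyStream {
    queue: Sender<Job>,
}

impl ToyStream {
    fn new() -> Self {
        let (queue, jobs) = channel::<Job>();
        thread::spawn(move || for job in jobs { job() });
        ToyStream { queue }
    }

    /// Asynchronous launch: enqueue the work and return immediately.
    fn launch(&self, job: impl FnOnce() + Send + 'static) {
        self.queue.send(Box::new(job)).expect("worker thread is alive");
    }

    /// Block the caller until everything queued on this stream so far has run.
    fn synchronize(&self) {
        let (done_tx, done_rx) = channel();
        self.launch(move || done_tx.send(()).unwrap());
        done_rx.recv().expect("worker thread is alive");
    }

    /// Make later work on `self` wait for everything already queued on `other`.
    fn wait_for(&self, other: &ToyStream) {
        let (done_tx, done_rx) = channel();
        other.launch(move || done_tx.send(()).unwrap());
        self.launch(move || done_rx.recv().unwrap());
    }
}

/// Runs a toy "kernel" on a side stream, then reads the result via the default stream.
fn add_one_on_stream(input: Vec<i32>) -> Vec<i32> {
    let default_stream = ToyStream::new();
    let side_stream = ToyStream::new();
    let buf = Arc::new(Mutex::new(input));
    let kernel_buf = Arc::clone(&buf);
    side_stream.launch(move || {
        for x in kernel_buf.lock().unwrap().iter_mut() {
            *x += 1;
        }
    });
    // Without this dependency, synchronizing the default stream would not cover the side stream.
    default_stream.wait_for(&side_stream);
    default_stream.synchronize();
    let out = buf.lock().unwrap().clone();
    out
}

fn main() {
    println!("{:?}", add_one_on_stream(vec![1, 2, 3]));
}
```

In this model, removing the wait_for call leaves a race: the default stream's synchronize could return before the side stream has run its job.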
Thanks, got it!