What is the correct way to do device memory manipulation using this library?
l3utterfly opened this issue
I am trying to accomplish the following: append two arrays in device memory.
Normally, the following steps achieve this:
- Allocate a new block of memory with size = array1 + array2
- Copy all memory from array1 -> newArray
- Copy all memory from array2 -> newArray, starting at dst memory location *newArray + sizeof(array1[0]) * len(array1)
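For reference, the three steps above can be sketched on the host with plain Rust slices. This is only an illustration of the offset arithmetic, not device code: `append` and `second_array_byte_offset` are hypothetical helper names, and nothing here touches the GPU or this library.

```rust
use std::mem::size_of;

/// Host-side analogue of the three steps: allocate, copy `a`, copy `b` at an offset.
/// Hypothetical helper for illustration only; it does not use device memory.
fn append<T: Copy + Default>(a: &[T], b: &[T]) -> Vec<T> {
    // Step 1: allocate a new block large enough for both arrays.
    let mut out = vec![T::default(); a.len() + b.len()];
    // Step 2: copy `a` to the start of the new block.
    out[..a.len()].copy_from_slice(a);
    // Step 3: copy `b` starting right after the last element of `a`.
    out[a.len()..].copy_from_slice(b);
    out
}

/// Byte offset at which the second array starts inside the new block,
/// i.e. sizeof(array1[0]) * len(array1).
fn second_array_byte_offset<T>(a: &[T]) -> usize {
    size_of::<T>() * a.len()
}
```

Calling `append(&[1.0f32, 2.0, 3.0], &[4.0, 5.0])` yields a five-element vector with the second array placed after the first, and `second_array_byte_offset` gives the byte offset that a raw device-to-device copy would use as its destination offset.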
I'm trying to achieve the same with this library. I've gotten this far:
let raw_dev_ptr1 = cuda_slice1.leak();
let raw_dev_ptr2 = cuda_slice2.leak();
let new_cuda_slice = dev.alloc(total_size);
This allows me to obtain the raw pointer locations of my data on the device.
I'm looking into the function cudarc::driver::result::memcpy_dtod_async, which seems to be what I need, but I have trouble getting the Stream reference.
I see that CudaDevice has a stream field, which seems to be what I need, but it's private.
What is the correct way to access the stream using this library?
You can use CudaDevice::dtod_copy for this, combined with CudaSlice::slice_mut:
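// These calls return Result, so errors are propagated with `?`.
let a = dev.alloc_zeros::<f32>(5)?;
let b = dev.alloc_zeros::<f32>(5)?;
let mut c = dev.alloc_zeros::<f32>(10)?;
// Copy `a` into the first half of `c` and `b` into the second half.
dev.dtod_copy(&a, &mut c.slice_mut(0..5))?;
dev.dtod_copy(&b, &mut c.slice_mut(5..10))?;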
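If more than two arrays need to be appended, the destination ranges follow from a running offset. A minimal host-only sketch, assuming a hypothetical helper `dst_ranges` that is not part of cudarc; each returned range could then be used as the destination window for the corresponding copy:

```rust
use std::ops::Range;

/// Hypothetical helper (not part of cudarc): given the length of each source
/// array, return the destination element range each one occupies when the
/// arrays are laid out back to back.
fn dst_ranges(lens: &[usize]) -> Vec<Range<usize>> {
    let mut start = 0;
    lens.iter()
        .map(|&len| {
            // This array occupies [start, start + len) in the destination.
            let range = start..start + len;
            // The next array begins where this one ends.
            start += len;
            range
        })
        .collect()
}
```

For the two five-element arrays above, `dst_ranges(&[5, 5])` gives `0..5` and `5..10`, matching the ranges written by hand.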
As for accessing the stream, this isn't possible right now, but I'm definitely open to making the field public. I need to think more about mixing the safe and result APIs. In general, all the methods from result have a safe counterpart, so you should be able to use the safe API for everything.
Will close this for now, but if you want to add this to examples/02-copy.rs, that'd probably be useful in the future!