coreylowman / cudarc

Safe Rust wrapper around the CUDA toolkit

Add alloc_async function that takes a single element to copy into all the elements

coreylowman opened this issue

This actually seems a little more subtle than I had hoped: it looks like the driver API only supports memset for values up to 32 bits wide. A cleaner option might be to allocate host memory, fill it there, and then call take_async.
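To make the trade-off above concrete, here is a minimal sketch in plain Rust (no GPU calls). It assumes the driver's memset entry points come in 8-, 16-, and 32-bit widths (cuMemsetD8, cuMemsetD16, cuMemsetD32), and otherwise falls back to building a host buffer that could be handed to take_async. The names `MemsetWidth`, `FillPlan`, `memset_width`, and `plan_fill` are hypothetical and are not part of cudarc.

```rust
use std::mem::size_of;

/// Element widths the CUDA driver memset functions accept
/// (cuMemsetD8, cuMemsetD16, cuMemsetD32). Hypothetical helper type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum MemsetWidth {
    D8,
    D16,
    D32,
}

/// How a "fill with one value" request could be served. Hypothetical type.
#[derive(Debug, PartialEq)]
enum FillPlan<T> {
    /// The element is 1, 2, or 4 bytes wide, so a driver memset could be used.
    Memset(MemsetWidth),
    /// Fallback: a filled host buffer that would then be passed to `take_async`.
    HostBuffer(Vec<T>),
}

/// Returns the memset width matching `T`, or `None` if `T` is not 1, 2, or 4 bytes.
fn memset_width<T>() -> Option<MemsetWidth> {
    match size_of::<T>() {
        1 => Some(MemsetWidth::D8),
        2 => Some(MemsetWidth::D16),
        4 => Some(MemsetWidth::D32),
        _ => None,
    }
}

/// Picks a memset when the element width allows it, otherwise fills a host buffer.
fn plan_fill<T: Copy>(value: T, len: usize) -> FillPlan<T> {
    match memset_width::<T>() {
        Some(width) => FillPlan::Memset(width),
        None => FillPlan::HostBuffer(vec![value; len]),
    }
}
```

With this sketch, `plan_fill(1.5f64, 3)` takes the host-buffer path, while `plan_fill(7u16, 10)` reports that a 16-bit memset would do. The host-buffer path costs a full host allocation plus a transfer, which is the price of supporting arbitrary element types.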

Yes, memset only works for primitives, IIRC even for 64-bit values (but that is probably still kinda useless).

Could there be an unsafe alloc function that just allocates the memory without zeroing it?

Maybe just take the naive approach and loop through the whole region, copying the value into each element (with async copies this shouldn’t be too bad).
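A rough sketch of that naive loop, written against a mock so it runs without a GPU. `DeviceRegion`, `MockRegion`, `enqueue_copy`, and `fill_naive` are all hypothetical names standing in for a device allocation and an async host-to-device copy; none of them are cudarc APIs.

```rust
/// Hypothetical stand-in for a device allocation (not a cudarc type).
trait DeviceRegion<T> {
    /// Number of element slots in the region.
    fn len(&self) -> usize;
    /// Stands in for enqueueing an async copy of one host element into slot `index`.
    fn enqueue_copy(&mut self, index: usize, value: &T);
}

/// Host-side mock that records how many copies were enqueued.
struct MockRegion<T> {
    slots: Vec<Option<T>>,
    copies_enqueued: usize,
}

impl<T> MockRegion<T> {
    /// Creates a region of `len` slots with no values written yet.
    fn uninit(len: usize) -> Self {
        MockRegion {
            slots: (0..len).map(|_| None).collect(),
            copies_enqueued: 0,
        }
    }
}

impl<T: Clone> DeviceRegion<T> for MockRegion<T> {
    fn len(&self) -> usize {
        self.slots.len()
    }

    fn enqueue_copy(&mut self, index: usize, value: &T) {
        self.slots[index] = Some(value.clone());
        self.copies_enqueued += 1;
    }
}

/// Naive fill: one single-element copy per slot in the region.
fn fill_naive<T, R: DeviceRegion<T>>(region: &mut R, value: &T) {
    for index in 0..region.len() {
        region.enqueue_copy(index, value);
    }
}
```

The mock makes the cost visible: filling n elements enqueues n separate copies, so whether this is "not too bad" in practice depends on how cheap each enqueued copy is.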

Not going to do this. If someone really wants this, they can write their own kernel to set values however they want (I did this in a couple of places in dfdx).