coreylowman / cudarc

Safe Rust wrapper around the CUDA toolkit

Providing an option to use a non-default stream

michaeleisel opened this issue · comments

I'm trying to improve the performance of my candle models via CUDA streams, which I've benchmarked to be helpful. However, I want to be sure that this is safe to do, at least with the restrictions I've imposed on myself (cudarc Device and candle Tensor objects may only be accessed from the thread in which they were created, and candle Tensors may only perform operations with other Tensors of the same Device instance). The problem is that cudarc always uses the default legacy stream. I'd love to instead have an option to use the old behavior of creating a new stream for each new device, if it'd be safe to do so.

This should already be possible via CudaStream, though you have to manage the additional stream yourself instead of cudarc managing it for you.

If I want to run, say, CudaDevice::memset_zeros(), I don't see any way to do so without using the default legacy stream, because I don't see any option to customize the stream.

Ah gotcha, you're right. Would you want an option on creation of the CudaDevice to use a non-null stream?
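To make the proposal concrete, here is a minimal sketch of what a stream option at device creation could look like. It is a mock that uses only the standard library and does not call CUDA. `SketchDevice`, `StreamChoice`, `StreamHandle`, and `with_stream` are hypothetical names for illustration, not cudarc's API. The sketch assumes the null handle (0) stands in for the default legacy stream.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter that hands out distinct non-null stream handles.
// A real implementation would create a CUDA stream here instead.
static NEXT_STREAM_HANDLE: AtomicUsize = AtomicUsize::new(1);

/// Hypothetical creation-time option: which stream the device submits work to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StreamChoice {
    /// Keep using the default legacy (null) stream.
    LegacyDefault,
    /// Give this device instance its own non-null stream.
    Dedicated,
}

/// Stand-in for a raw stream handle; 0 models the null stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamHandle(usize);

impl StreamHandle {
    pub const NULL: StreamHandle = StreamHandle(0);

    pub fn is_null(self) -> bool {
        self.0 == 0
    }
}

/// Mock device that remembers the stream chosen at creation.
#[derive(Debug)]
pub struct SketchDevice {
    pub ordinal: usize,
    pub stream: StreamHandle,
}

impl SketchDevice {
    /// Default constructor keeps the existing behavior (null stream).
    pub fn new(ordinal: usize) -> Self {
        Self::with_stream(ordinal, StreamChoice::LegacyDefault)
    }

    /// Opt-in constructor that lets the caller pick the stream policy.
    pub fn with_stream(ordinal: usize, choice: StreamChoice) -> Self {
        let stream = match choice {
            StreamChoice::LegacyDefault => StreamHandle::NULL,
            StreamChoice::Dedicated => {
                StreamHandle(NEXT_STREAM_HANDLE.fetch_add(1, Ordering::Relaxed))
            }
        };
        SketchDevice { ordinal, stream }
    }
}
```

With this shape, every operation on a device (a zeroing memset, for example) would be issued on `self.stream`, so callers who never opt in see no change, while two `Dedicated` devices on the same ordinal get independent streams.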

I think that could be an interesting way to do it, yeah. My one concern would be if there are any APIs, like candle, that assume that if two devices have the same ordinal, then their operations are on the same stream. But maybe that would be considered overreliance on implementation details. For candle, it looks like they use a stricter form of checking, with a generated unique ID rather than the ordinal, so there wouldn't be any problems with that approach for that library.
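The difference between the two checks can be sketched as follows. This is a standard-library-only mock, loosely modeled on the idea of a generated unique ID per device instance. `MockDevice`, `DeviceId`, `same_ordinal`, and `same_device` are hypothetical names and are not taken from candle or cudarc.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical process-wide counter used to generate unique device IDs.
static NEXT_DEVICE_ID: AtomicUsize = AtomicUsize::new(1);

/// Unique identity of one device *instance*, independent of the GPU ordinal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DeviceId(usize);

impl DeviceId {
    fn fresh() -> Self {
        DeviceId(NEXT_DEVICE_ID.fetch_add(1, Ordering::Relaxed))
    }
}

/// Mock device: several instances may share an ordinal (the same physical GPU)
/// while each owning a different stream.
#[derive(Debug)]
pub struct MockDevice {
    pub ordinal: usize,
    pub id: DeviceId,
}

impl MockDevice {
    pub fn new(ordinal: usize) -> Self {
        MockDevice { ordinal, id: DeviceId::fresh() }
    }

    /// Weak check: true for any two instances on the same GPU,
    /// even if they submit work to different streams.
    pub fn same_ordinal(&self, other: &Self) -> bool {
        self.ordinal == other.ordinal
    }

    /// Strict check: true only for the same device instance.
    pub fn same_device(&self, other: &Self) -> bool {
        self.id == other.id
    }
}
```

A library that gates tensor operations on `same_device` would reject mixing tensors from two instances created on ordinal 0, which is the case where per-device streams could otherwise race. A library that relies on `same_ordinal` would accept the mix.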