Use default stream instead of a new stream per CudaDevice
coreylowman opened this issue
Corey Lowman commented
Instantiating multiple devices creates multiple streams, and there are tricky synchronization problems between them.
Max Obreiter commented
Is it generally a problem to synchronize multiple devices?
Max Obreiter commented
And isn't the default stream non-async (but only for ops that aren't explicitly async)?
Corey Lowman commented
By "device" in the description I meant cudarc::CudaDevice. Each of these currently has its own stream.
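
To make the ordering problem concrete, here is a minimal CPU-only sketch. It is not cudarc or CUDA code: `Stream`, `launch`, `synchronize`, and `copy_across_streams` are hypothetical names, and the model only assumes that a stream behaves like an in-order work queue that runs independently of other streams. Work queued on one stream is ordered, but nothing orders it against work on a second stream unless the caller synchronizes explicitly.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for a stream: a worker thread that runs
/// queued jobs strictly in submission order.
struct Stream {
    tx: Option<Sender<Job>>,
    worker: Option<JoinHandle<()>>,
}

impl Stream {
    fn new() -> Self {
        let (tx, rx) = channel::<Job>();
        let worker = thread::spawn(move || {
            // Runs until every sender has been dropped.
            for job in rx {
                job();
            }
        });
        Stream { tx: Some(tx), worker: Some(worker) }
    }

    /// Enqueue work and return immediately (models an async launch).
    fn launch<F: FnOnce() + Send + 'static>(&self, f: F) {
        self.tx.as_ref().unwrap().send(Box::new(f)).unwrap();
    }

    /// Block the caller until everything queued so far has run.
    fn synchronize(&self) {
        let (done_tx, done_rx) = channel();
        self.launch(move || {
            done_tx.send(()).unwrap();
        });
        done_rx.recv().unwrap();
    }
}

impl Drop for Stream {
    fn drop(&mut self) {
        // Close the queue so the worker loop ends, then join it.
        drop(self.tx.take());
        if let Some(worker) = self.worker.take() {
            worker.join().unwrap();
        }
    }
}

/// Write a value on stream `a`, then read it on stream `b`.
fn copy_across_streams() -> i32 {
    let a = Stream::new();
    let b = Stream::new();
    let buf = Arc::new(Mutex::new(0));
    let out = Arc::new(Mutex::new(0));

    let writer = Arc::clone(&buf);
    a.launch(move || {
        *writer.lock().unwrap() = 42;
    });

    // Without this call, the read on `b` may run before the write on `a`.
    a.synchronize();

    let reader = Arc::clone(&buf);
    let sink = Arc::clone(&out);
    b.launch(move || {
        let value = *reader.lock().unwrap();
        *sink.lock().unwrap() = value;
    });
    b.synchronize();

    let result = *out.lock().unwrap();
    result
}
```

Calling `copy_across_streams()` returns the value written on `a` only because of the explicit `a.synchronize()`. If both operations were queued on one shared stream, the queue order alone would guarantee the same result, which is the motivation for the proposal in this issue.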