Soundness issue (2)
Narsil opened this issue · comments
Hey, I discovered another potentially serious issue:

```rust
let dev0 = CudaDevice::new(0).unwrap();
let slice = dev0.htod_copy(vec![1.0; 10]).unwrap();
let dev1 = CudaDevice::new(1).unwrap();
drop(dev1);
drop(dev0);
drop(slice);
```
This panics with the following stack trace:
```
thread 'tests::dummy' panicked at 'called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_CONTEXT_IS_DESTROYED, "context is destroyed")', src/driver/safe/core.rs:152:72
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1089:23
   4: <cudarc::driver::safe::core::CudaSlice<T> as core::ops::drop::Drop>::drop
             at ./src/driver/safe/core.rs:152:13
   5: core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice<f64>>
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ptr/mod.rs:490:1
   6: core::mem::drop
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/mem/mod.rs:979:24
   7: cudarc::tests::dummy
             at ./src/lib.rs:107:9
   8: cudarc::tests::dummy::{{closure}}
             at ./src/lib.rs:99:16
   9: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5
  10: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5
```
IIUC this is because the stream here (https://github.com/coreylowman/cudarc/blob/main/src/driver/safe/core.rs#L189) is already gone.
Somehow this doesn't fail if I don't create `CudaDevice::new(1)`.
Might be linked to: #108
I wonder if this is a misuse of the primary context API? Currently the drop for `CudaDevice` calls `cuDevicePrimaryCtxRelease`. However, in a multi-device setting, the `CudaDevice`'s `cu_primary_ctx` (which maybe should just be named `cu_ctx`) won't necessarily be the primary context anymore?
edit: Actually I guess each device (0 and 1) would have its own separate primary context? 🤔
Noting that `CudaDevice::new(0).unwrap()` will return an `Arc<CudaDevice>`, and when you first call `drop(dev0)`, the underlying `CudaDevice` won't be dropped yet because it is cloned into the `CudaSlice` (so the refcount before the first drop is 2, and after the drop it goes down to 1).
So I believe the stream should still exist?
Okay, yeah, I think this is related to #161: once `result::ctx::set_current()` is called after creating the second device, the `free_async` in `drop(slice)` assumes you are using the most recently set CUDA context, which will be the second device's.