coreylowman / cudarc

Safe rust wrapper around CUDA toolkit

Soundness issues (3)

Narsil opened this issue

And another one. I am opening a new issue since I'm not quite sure they are related (one happens during drop, the other during normal usage).

let dev0 = CudaDevice::new(0).unwrap();
let dev1 = CudaDevice::new(1).unwrap(); // never used, but creating it switches the current context
let slice = dev0.htod_copy(vec![1.0; 10]).unwrap();
let out = dev0.dtoh_sync_copy(&slice).unwrap(); // panics here

This panics with the following error:

thread 'tests::dummy' panicked at 'called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_INVALID_VALUE, "invalid argument")', src/lib.rs:103:47
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1089:23
   4: cudarc::tests::dummy
             at ./src/lib.rs:103:19
   5: cudarc::tests::dummy::{{closure}}
             at ./src/lib.rs:99:16
   6: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

My understanding is that since CudaDevice::new(1) has been created, the CUDA global context now targets device 1, meaning the device_ptr for slice is no longer valid under the current context.
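
To spell out what I think happens under the hood (my mental model in raw driver-API terms; not a quote of cudarc's internals):

// Conceptual sequence of driver calls behind the repro above:
//
//   CudaDevice::new(0):
//       cuDevicePrimaryCtxRetain(&ctx0, device_0)
//       cuCtxSetCurrent(ctx0)     // calling thread now targets device 0
//   CudaDevice::new(1):
//       cuDevicePrimaryCtxRetain(&ctx1, device_1)
//       cuCtxSetCurrent(ctx1)     // calling thread now targets device 1
//
//   dev0.dtoh_sync_copy(&slice):
//       cuMemcpyDtoH(dst, src_device_ptr, len)
//       // src_device_ptr was allocated while ctx0 was current, but ctx1 is
//       // current now, so the driver rejects it with CUDA_ERROR_INVALID_VALUE.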

The only "simple" fix I see is protecting every safe operation by result::ctx::set_current(cu_primary_ctx)?;.

Is that correct?
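
Concretely, I mean something like this inside cudarc, at the top of every safe method (a minimal sketch; the cu_primary_ctx field name, the ensure_current method name, and the exact module path are assumptions):

use crate::driver::result; // inside cudarc; the path varies across versions

impl CudaDevice {
    /// Hypothetical guard: make this device's primary context current on the
    /// calling thread before any driver call that uses this device's pointers.
    fn ensure_current(&self) -> Result<(), DriverError> {
        // result::ctx::set_current is the wrapper referenced above;
        // self.cu_primary_ctx is assumed to hold the retained CUcontext.
        unsafe { result::ctx::set_current(self.cu_primary_ctx) }
    }
}

With that in place, dev0.dtoh_sync_copy(&slice) would first re-bind device 0's context, so the repro above succeeds even though dev1 was created in between.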

If that's the case, wouldn't a sort of Mutex::lock() be more effective at preventing this kind of issue?
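
By "a sort of Mutex::lock()" I mean something along these lines (again only a sketch; all names here are hypothetical):

use std::sync::Mutex;

// Hypothetical process-wide lock: hold it while a device's context must stay
// current, so nothing else can switch contexts mid-operation.
static CUDA_CTX_LOCK: Mutex<()> = Mutex::new(());

fn with_device_ctx<T>(
    dev: &CudaDevice,
    f: impl FnOnce() -> Result<T, DriverError>,
) -> Result<T, DriverError> {
    let _guard = CUDA_CTX_LOCK.lock().unwrap();
    // The lock alone doesn't select the right context, it only prevents
    // concurrent switches while f runs; set_current is still needed.
    unsafe { result::ctx::set_current(dev.cu_primary_ctx)? };
    f()
}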

Happy to try and provide PRs for fixes.

Related: #160, #108

The only "simple" fix I see is protecting every safe operation by result::ctx::set_current(cu_primary_ctx)?;. Is that correct ?

Yes, making this fix resolves both this panic and #160.