Double free bug causes SIGABRT or SIGSEGV in multi-threaded situations
aabizri opened this issue · comments
EDIT
When testing for the error, I didn't correctly check whether it came from the implementation (I hadn't actually switched to intel-ocl-sdk when I thought I had). After trying again, I couldn't reproduce the error on intel-ocl-sdk, so it seems to be a beignet bug. I will be reporting it there. Closing the issue.
Summary
On ocl 0.19, when trying to build a Context (or a ProQue) in two concurrent threads, a SIGABRT (double free) or SIGSEGV error is triggered. On a single thread there's no bug.
Since OpenCL 1.1, all API functions are thread-safe except for clSetKernelArg(), so this is not undefined behavior as per the spec. When tested against both beignet and intel-ocl-sdk, I got the same errors, indicating it doesn't come from the particular implementation. It is thus highly probable that the error comes from ocl.
Tested on both:
- stable (rustc 1.37.0 (eae3437df 2019-08-13))
- nightly (rustc 1.39.0-nightly (96d07e0ac 2019-09-15))
Error & debugging
On SIGABRT, these are the messages printed, in decreasing frequency of occurrence:
- corrupted size vs. prev_size (by a wide margin the most common)
- double free or corruption (!prev)
- double free or corruption (out)
- double free or corruption (fasttop)
- I once got clang (LLVM option parsing): for the -memdep-block-scan-limit option: may only occur zero or one times! but I'm not sure it's linked
On SIGSEGV, no debug messages are printed. Rarely (about one in 20 tries, I would say), the sample program doesn't error out.
As the error comes from the memory side, debugging with MALLOC_CHECK_=1 (or 2) restricts the errors to either SIGSEGV, or SIGABRT with free(): invalid pointer as the message.
When debugging with GDB, the error always occurred in ocl-core::retain_context or ocl-core::retain_mem_object.
Reproduction
I have been able to reduce the reproduction to the following code:
    extern crate ocl;

    pub fn new() {
        ocl::Context::builder().build();
        // The same thing occurs with the following line as well (tested with working kernels)
        // ocl::ProQue::builder().src(KERNEL_SRC).build();
    }

    #[cfg(test)]
    mod tests {
        use super::new;

        #[test]
        fn test1() {
            new();
        }

        #[test]
        fn test2() {
            new();
        }
    }
Run with cargo test -- --test-threads=2 to trigger the error, and cargo test -- --test-threads=1 to see that it isn't triggered in single-threaded situations.
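The test harness only runs the two tests on separate threads; it doesn't guarantee that the two calls actually overlap. A minimal sketch of a harness that forces the overlap with a barrier is below. It uses only the standard library, and run_concurrently and demo are hypothetical names I made up for illustration, not part of ocl. The workload in demo is a placeholder allocation; to try it against the report, the closure body would be swapped for the Context build shown in the comment (an assumption on my side, I haven't run that variant). Note that the returned count only reflects Rust panics: a SIGABRT or SIGSEGV kills the whole process before anything is counted.

```rust
use std::sync::{Arc, Barrier};
use std::thread;

/// Runs `work` once on each of `threads` threads, releasing them all at the
/// same moment so the calls overlap as much as possible.
/// Returns how many threads finished without panicking.
pub fn run_concurrently<F>(threads: usize, work: F) -> usize
where
    F: Fn() + Send + Sync + 'static,
{
    let work = Arc::new(work);
    let barrier = Arc::new(Barrier::new(threads));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let work = Arc::clone(&work);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // Block until every thread has reached this point, then start together.
                barrier.wait();
                work();
            })
        })
        .collect();

    // join() returns Err if the thread panicked.
    handles
        .into_iter()
        .map(|h| h.join())
        .filter(|r| r.is_ok())
        .count()
}

/// Placeholder usage. To exercise the reported case, replace the closure body with:
///     let _ = ocl::Context::builder().build();
pub fn demo() -> usize {
    run_concurrently(2, || {
        let v: Vec<u8> = Vec::with_capacity(64);
        drop(v);
    })
}
```

Calling demo() from main or from a single #[test] should give the same two-thread overlap regardless of --test-threads, which makes it easier to tell a harness scheduling effect apart from a real race.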