cogciprocate / ocl

OpenCL for Rust

Double free bug causes SIGABRT or SIGSEGV in multi-threaded situations

aabizri opened this issue

EDIT

When testing for the error, I didn't properly verify that it wasn't coming from the OpenCL implementation: I hadn't actually switched to intel-ocl-sdk when I thought I had. After trying again, the error does not occur with intel-ocl-sdk, so it appears to be a beignet bug. I will report it there. Closing the issue.
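If the crash does come from beignet, one way to keep using it from several threads until the driver is fixed is to serialize context creation behind a single process-wide lock, which reproduces the single-threaded case where the crash does not occur. This is a minimal sketch, not part of ocl's API: CONTEXT_INIT_LOCK and create_serialized are hypothetical names, a Mutex in a static needs Rust 1.63 or later (newer than the compilers listed below), and it only helps if every thread creates its contexts through the helper. It also assumes the race is confined to creation; since GDB also pointed at retain_mem_object, other calls might need the same guard.

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical process-wide lock guarding OpenCL context creation.
// A `Mutex` in a `static` requires Rust 1.63 or later.
static CONTEXT_INIT_LOCK: Mutex<()> = Mutex::new(());

/// Runs `create` while holding the global lock, so no two threads run it
/// at the same time. In real code `create` would be something like
/// `|| ocl::Context::builder().build()`.
fn create_serialized<T>(create: impl FnOnce() -> T) -> T {
    // If a previous holder panicked, the lock is poisoned. The guarded data
    // is `()`, so it is safe to recover the guard and carry on.
    let _guard = CONTEXT_INIT_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    create()
    // `_guard` is dropped here, releasing the lock.
}

fn main() {
    // Two threads each "build" a value through the serialized helper.
    let handles: Vec<_> = (0..2i32)
        .map(|i| thread::spawn(move || create_serialized(|| i * 10)))
        .collect();
    let results: Vec<i32> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    println!("{:?}", results); // prints [0, 10]
}
```

A global lock trades parallel context creation for safety; since contexts are usually created once at startup, that cost is normally small.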

Summary

With ocl 0.19, building a Context (or a ProQue) in two concurrent threads crashes the process with either SIGABRT (double free) or SIGSEGV. With a single thread, the crash does not occur.

Since OpenCL 1.1, all API functions are thread-safe except clSetKernelArg(), so this should not be undefined behavior according to the spec. I got the same errors with both beignet and intel-ocl-sdk, which indicated the problem is not specific to one implementation. The error therefore most likely comes from ocl.

Tested both on

  • stable (rustc 1.37.0 (eae3437df 2019-08-13))
  • nightly (rustc 1.39.0-nightly (96d07e0ac 2019-09-15))

Error & debugging

On SIGABRT, these are the messages printed, in decreasing order of frequency:

  • corrupted size vs. prev_size (by a wide margin the most common)
  • double free or corruption (!prev)
  • double free or corruption (out)
  • double free or corruption (fasttop)
  • I once got clang (LLVM option parsing): for the -memdep-block-scan-limit option: may only occur zero or one times! (I'm not sure it's related)

On SIGSEGV, no debug messages are printed. Rarely (roughly one run in 20), the sample program completes without error.

As the error is memory-related, debugging with MALLOC_CHECK_=1 (or 2) narrows the failures to either SIGSEGV or SIGABRT with the message free(): invalid pointer.

When debugging with GDB, the crash always occurred inside ocl-core::retain_context or ocl-core::retain_mem_object.

Reproduction

I have been able to reduce the reproduction to the following code:

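```rust
extern crate ocl;

pub fn new() {
    // Build a Context and drop it immediately. `unwrap` makes a build error
    // fail the test instead of being silently ignored.
    ocl::Context::builder().build().unwrap();
    // The same crash occurs with the following line instead (tested with working kernels):
    // ocl::ProQue::builder().src(KERNEL_SRC).build().unwrap();
}

#[cfg(test)]
mod tests {
    use super::new;

    #[test]
    fn test1() {
        new();
    }

    #[test]
    fn test2() {
        new();
    }
}
```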

Run cargo test -- --test-threads=2 to trigger the error, and cargo test -- --test-threads=1 to confirm that it does not occur when the tests run one at a time.
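The test harness only gives the two tests a chance to overlap. To make the calls start at the same instant, and to reproduce without cargo test, the threads can be released together with a std::sync::Barrier. This is a minimal sketch using only the standard library: run_concurrently is a hypothetical helper, the counter closure stands in for ocl::Context::builder().build(), and std::thread::scope needs Rust 1.63 or later.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Barrier;
use std::thread;

/// Hypothetical helper: runs `f` on `n` threads that are all released at
/// the same moment. In the real reproduction, `f` would build an ocl Context.
fn run_concurrently<F>(n: usize, f: F)
where
    F: Fn() + Sync,
{
    let barrier = Barrier::new(n);
    thread::scope(|s| {
        for _ in 0..n {
            s.spawn(|| {
                // Each thread blocks here until all `n` have arrived,
                // so the calls to `f` overlap as much as possible.
                barrier.wait();
                f();
            });
        }
    });
    // `thread::scope` joins every spawned thread before returning and
    // panics if any of them panicked.
}

fn main() {
    let calls = AtomicUsize::new(0);
    run_concurrently(2, || {
        // Stand-in for `ocl::Context::builder().build().unwrap();`
        calls.fetch_add(1, Ordering::SeqCst);
    });
    println!("{} concurrent calls completed", calls.load(Ordering::SeqCst));
}
```

With the real call, a driver that is not thread-safe during context creation should crash more reliably than with two independent test functions, since both threads enter creation at nearly the same time. Note that a SIGSEGV or SIGABRT still kills the whole process; the panic propagation only covers ordinary Rust panics.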
