[BUG] Weird core dump for uninitialized array (no data, undefined size)

liuy opened this issue a year ago.

I'm trying ArrayFire 3.8.3, the latest stable release, installed with the official .sh installer on Ubuntu 22.04 LTS. The full class and code snippet are below.
I tried all the backends, and they all behave the same way. What am I doing wrong when using an uninitialized array as private data of a class? It looks like a bug to me, but I can work around the core dump by simply calling af_print() beforehand.

=== .h file ===

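#include <arrayfire.h>
using namespace af;

// forward_fn_t matches add/sub/mat_mul in the .cpp file; backward_fn_t is not
// shown in the issue, so the same signature is assumed here.
typedef array (*forward_fn_t)(array &, array &);
typedef array (*backward_fn_t)(array &, array &);

class Tensor {
    private:
        array data; // <---- here I defined an uninitialized array
        bool data_computed = false;
        Tensor *lhs; // non-owning pointer to the left operand
        Tensor *rhs; // non-owning pointer to the right operand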
        forward_fn_t forward_fn;
        backward_fn_t backward_fn;
    public:
        Tensor(array &a)
        {
            data = a;
            data_computed = true;
        }
        Tensor(Tensor &a, Tensor &b, forward_fn_t ffn, backward_fn_t bfn)
        {
            lhs = &a;
            rhs = &b;
            forward_fn = ffn;
            backward_fn = bfn;
        }
        void forward(void);
        Tensor matmul(Tensor &t);
        Tensor operator+(Tensor &t);
        Tensor operator-(Tensor &t);
        inline void print()
        {
            af_print(data);
        }
};

=== .cpp file ===

void Tensor::forward(void)
{
    if (data_computed)
        return;
    if (!lhs->data_computed)
        lhs->forward();
    if (!rhs->data_computed)
        rhs->forward();
    data = forward_fn(lhs->data, rhs->data);
    data_computed = true;
}

static array add(array &a, array &b)
{
    return a + b;
}

static array sub(array &a, array &b)
{
    return a - b;
}

static array mat_mul(array &a, array &b)
{
    return af::matmul(a, b);
}

Tensor Tensor::operator+(Tensor &t)
{
    return Tensor(*this, t, add, NULL);
}

Tensor Tensor::operator-(Tensor &t)
{
    return Tensor(*this, t, sub, NULL);
}

=== main.cpp ===

int main(int argc, char* argv[])
{
    af::info();
    float i[] = {1.0, 1.0, 1.0, 1.0};
    array A(2, 2, i);
    float j[] = {2.0, 2.0, 2.0, 2.0};
    array B(2, 2, j);
    float k[] = {3.0, 3.0, 3.0, 2.0};
    array C(2, 2, k);
    Tensor c(C);
    Tensor a(A);
    Tensor b(B);
    Tensor d = a + b - c;
    // b.print(); // <---------------- This is the magic line; without it, I get a core dump
    d.forward();
    d.print();

    return 0;
}

================
Without the magic line, I get this:
=====output============

ArrayFire v3.8.3 (CPU, 64-bit Linux, build 987d567)
terminate called after throwing an instance of 'AfError'
  what():  Input Array not created on current device
Aborted (core dumped)

=================
With the magic line, it runs without any error:
=======output======

ArrayFire v3.8.3 (CPU, 64-bit Linux, build 987d567)
[0] Intel: Intel(R) Xeon(R) CPU E5-2696 v3 @ 2.30GHz
data
[2 2 1 1]
    2.0000     2.0000
    2.0000     2.0000

data
[2 2 1 1]
    0.0000     0.0000
    0.0000     1.0000

Umar Arshad · Answer 1 · Sat Jul 01 2023 03:32:05 GMT+0800 (China Standard Time)

Hi @liuy,

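I think the problem is that the temporary Tensor returned by operator+ is destroyed at the end of the statement Tensor d = a + b - c;, but operator- has already stored a pointer to it in d. When d.forward() later dereferences that pointer, it reads a destroyed object, which causes the failure when you call the sub function. Because this is undefined behavior, an unrelated change such as calling b.print() can hide it.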

There is also a separate issue in ArrayFire: some of the arithmetic functions are missing a try/catch block. Thanks for bringing this to our attention.

Closing the issue, but I will be fixing the cause of the crash. The exception should be caught in the C API instead of escaping and terminating the program.