m4rs-mt / ILGPU

ILGPU JIT Compiler for high-performance .Net GPU programs

Home Page: http://www.ilgpu.net

Placing two SpecializedValue<bool> kernel arguments before ArrayView<T> causes illegal memory access

TriceHelix opened this issue

Dear ILGPU Team,

Please analyze and run the following code (ILGPU 1.3.1, Release config, x64 Windows):

using ILGPU;
using ILGPU.Runtime;
using ILGPU.Runtime.Cuda;

const int NUM_VALUES = 555;

// SETUP
Context ctx = Context.Create(b => b.Default().Inlining(InliningMode.Aggressive).Optimize(OptimizationLevel.Release));
Accelerator accelerator = ctx.GetDevice<CudaDevice>(0).CreateAccelerator(ctx);
var workingKernelA = accelerator.LoadAutoGroupedStreamKernel<Index1D, SpecializedValue<bool>, ArrayView<int>>(Kernels.WorkingKernelA);
var workingKernelB = accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>, SpecializedValue<bool>, SpecializedValue<bool>>(Kernels.WorkingKernelB);
var buggedKernel = accelerator.LoadAutoGroupedStreamKernel<Index1D, SpecializedValue<bool>, SpecializedValue<bool>, ArrayView<int>>(Kernels.BuggedKernel);
using var buffer = accelerator.Allocate1D<int>(NUM_VALUES);

Console.WriteLine("Press any key to execute the working kernel (A).");
Console.ReadKey();

// WORKING KERNEL (A)
buffer.MemSetToZero();
workingKernelA(NUM_VALUES, SpecializedValue.New(true), buffer.View);
File.WriteAllText("working_valuesA.txt", string.Join('\n', buffer.GetAsArray1D()), System.Text.Encoding.UTF8);
Console.WriteLine("SUCCESS");

Console.WriteLine("Press any key to execute the working kernel (B).");
Console.ReadKey();

// WORKING KERNEL (B)
buffer.MemSetToZero();
workingKernelB(NUM_VALUES, buffer.View, SpecializedValue.New(true), SpecializedValue.New(true));
File.WriteAllText("working_valuesB.txt", string.Join('\n', buffer.GetAsArray1D()), System.Text.Encoding.UTF8);
Console.WriteLine("SUCCESS");

Console.WriteLine("Press any key to execute the bugged kernel.");
Console.ReadKey();

// BUGGED KERNEL
buffer.MemSetToZero();
buggedKernel(NUM_VALUES, SpecializedValue.New(true), SpecializedValue.New(true), buffer.View);
File.WriteAllText("bugged_values.txt", string.Join('\n', buffer.GetAsArray1D()), System.Text.Encoding.UTF8);
Console.WriteLine("SUCCESS");

Console.WriteLine("Press any key to exit.");
Console.ReadKey();


class Kernels
{
    public static void WorkingKernelA(
        Index1D index,
        SpecializedValue<bool> specialized1, // <- works with one SpecializedValue before buffer
        ArrayView<int> values)
    {
        values[index] = index;
    }


    public static void WorkingKernelB(
        Index1D index,
        ArrayView<int> values,
        SpecializedValue<bool> specialized1,
        SpecializedValue<bool> specialized2) // <- works with two SpecializedValues after buffer
    {
        values[index] = index;
    }


    public static void BuggedKernel(
        Index1D index,
        SpecializedValue<bool> specialized1,
        SpecializedValue<bool> specialized2, // <- illegal memory access with two SpecializedValues before buffer
        ArrayView<int> values)
    {
        values[index] = index;
    }
}

I don't know exactly what is happening here, but the third kernel, despite being identical to the first two (except for its arguments), causes an illegal memory access on a CUDA device. This does not happen with the CPU accelerator. (I have not tested OpenCL.)

Thank you for your time.

MoFtZ commented

Hi @TriceHelix. My initial thought is that the .NET garbage collector is destroying the ILGPU context or accelerator. Both of those classes implement IDisposable and need to be kept alive until all kernels have finished running.

Could you try changing your code to declare both of those variables with using?

// SETUP
using Context ctx = ...;
using Accelerator accelerator = ...;

Hi @MoFtZ, thanks for the quick answer. I adjusted the code as you suggested:

using ILGPU;
using ILGPU.Runtime;
using ILGPU.Runtime.Cuda;

const int NUM_VALUES = 555;

// SETUP
using Context ctx = Context.Create(b => b.Default().Inlining(InliningMode.Aggressive).Optimize(OptimizationLevel.Release));
using Accelerator accelerator = ctx.GetDevice<CudaDevice>(0).CreateAccelerator(ctx);
var workingKernelA = accelerator.LoadAutoGroupedStreamKernel<Index1D, SpecializedValue<bool>, ArrayView<int>>(Kernels.WorkingKernelA);
var workingKernelB = accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>, SpecializedValue<bool>, SpecializedValue<bool>>(Kernels.WorkingKernelB);
var buggedKernel = accelerator.LoadAutoGroupedStreamKernel<Index1D, SpecializedValue<bool>, SpecializedValue<bool>, ArrayView<int>>(Kernels.BuggedKernel);
using var buffer = accelerator.Allocate1D<int>(NUM_VALUES);

Console.WriteLine("Press any key to execute the working kernel (A).");
Console.ReadKey();

// WORKING KERNEL (A)
buffer.MemSetToZero();
workingKernelA(NUM_VALUES, SpecializedValue.New(true), buffer.View);
File.WriteAllText("working_valuesA.txt", string.Join('\n', buffer.GetAsArray1D()), System.Text.Encoding.UTF8);
Console.WriteLine("SUCCESS");

Console.WriteLine("Press any key to execute the bugged kernel.");
Console.ReadKey();

// BUGGED KERNEL
buffer.MemSetToZero();
buggedKernel(NUM_VALUES, SpecializedValue.New(true), SpecializedValue.New(true), buffer.View);
File.WriteAllText("bugged_values.txt", string.Join('\n', buffer.GetAsArray1D()), System.Text.Encoding.UTF8);
Console.WriteLine("SUCCESS");

Console.WriteLine("Press any key to execute the working kernel (B).");
Console.ReadKey();

// WORKING KERNEL (B)
buffer.MemSetToZero();
workingKernelB(NUM_VALUES, buffer.View, SpecializedValue.New(true), SpecializedValue.New(true));
File.WriteAllText("working_valuesB.txt", string.Join('\n', buffer.GetAsArray1D()), System.Text.Encoding.UTF8);
Console.WriteLine("SUCCESS");

Console.WriteLine("Press any key to exit.");
Console.ReadKey();

class Kernels
{
    public static void WorkingKernelA(
        Index1D index,
        SpecializedValue<bool> specialized1, // <- works with one SpecializedValue before buffer
        ArrayView<int> values)
    {
        values[index] = index;
    }


    public static void WorkingKernelB(
        Index1D index,
        ArrayView<int> values,
        SpecializedValue<bool> specialized1,
        SpecializedValue<bool> specialized2) // <- works with two SpecializedValues after buffer
    {
        values[index] = index;
    }


    public static void BuggedKernel(
        Index1D index,
        SpecializedValue<bool> specialized1,
        SpecializedValue<bool> specialized2, // <- illegal memory access with two SpecializedValues before buffer
        ArrayView<int> values)
    {
        values[index] = index;
    }
}

As you can see, I also swapped the order of the kernels, but the result is the same: as soon as the buffer is downloaded back to the CPU, an exception is thrown. This issue first occurred in a larger project of mine, and I managed to boil it down to the code I posted. I suggest you run the code yourself. Also, I'm using top-level statements here, but the behaviour of the code is identical when used in any other context, regardless of program runtime.
Of course, there could still be something wrong with the code itself, but I am confident this is a deeper problem, especially since using the CPUAccelerator instead of CUDA makes the code run without issues.

Thanks again for your help.

MoFtZ commented

@TriceHelix OK, I've reproduced the issue.

ILGPU requires that kernel arguments are blittable. This ensures that the memory representation is consistent. However, bool is not a blittable type. If you change your usage of bool to byte, the issue will be fixed.
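
For anyone wondering what "not blittable" means in practice, here is a minimal sketch that uses only the standard .NET library (no ILGPU). It compares the size of bool and byte in managed memory with the size that the default interop marshaller assigns to them. It does not show what ILGPU does internally; it only illustrates, under that assumption, why bool does not have one consistent memory representation while byte does. All variable names are made up for this illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Size of each type in managed memory.
int managedBool = sizeof(bool);
int managedByte = sizeof(byte);

// Size the default interop marshaller assigns to the same types.
// bool is marshalled as a 4-byte Win32 BOOL by default.
int marshaledBool = Marshal.SizeOf<bool>();
int marshaledByte = Marshal.SizeOf<byte>();

Console.WriteLine($"bool: managed={managedBool}, marshaled={marshaledBool}");
Console.WriteLine($"byte: managed={managedByte}, marshaled={marshaledByte}");

// A type whose two sizes differ cannot simply be copied byte-for-byte
// across the managed/unmanaged boundary.
bool boolSizesAgree = managedBool == marshaledBool;
bool byteSizesAgree = managedByte == marshaledByte;

Console.WriteLine($"bool sizes agree: {boolSizesAgree}");
Console.WriteLine($"byte sizes agree: {byteSizesAgree}");
```

The two sizes differ for bool (1 byte managed, 4 bytes marshalled) and agree for byte, which is consistent with the advice above to pass byte instead of bool.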

@MoFtZ Thank you, I was not aware of the blittable type constraint.

Here is a blittable drop-in replacement of bool for anyone encountering this issue in the future:

using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public readonly struct BlittableBool : IEquatable<BlittableBool>, IComparable<BlittableBool>
{
    // Backing storage: a single byte, 0 for false and non-zero for true
    private readonly byte value;


    // CONSTRUCTOR
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public BlittableBool(bool value)
    {
        this.value = value ? (byte)255 : (byte)0;
    }


    // INTERFACE IMPLEMENTATIONS
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool Equals(BlittableBool other)
    {
        return (bool)this == (bool)other;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int CompareTo(BlittableBool other)
    {
        // Compare as bools (false < true), not against an unrelated type
        return ((bool)this).CompareTo((bool)other);
    }


    // OVERRIDES
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
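    public override bool Equals(object obj)
    {
        // Pattern match instead of casting, so null or a foreign type returns false instead of throwing
        return obj is BlittableBool other && Equals(other);
    }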

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
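    public override int GetHashCode()
    {
        // Hash the logical bool so that all non-zero bytes (which compare equal) hash identically
        return ((bool)this).GetHashCode();
    }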

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public override string ToString()
    {
        return ((bool)this).ToString();
    }


    // OPERATORS
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool operator ==(BlittableBool l, BlittableBool r)
    {
        return l.Equals(r);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool operator !=(BlittableBool l, BlittableBool r)
    {
        return !l.Equals(r);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static implicit operator bool(BlittableBool value)
    {
        return value.value > 0;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static implicit operator BlittableBool(bool value)
    {
        return new BlittableBool(value);
    }
}