cucapra / diospyros

Search-based compiler for high-performance DSP programming

Incorrect translation of float to i32 conversion

JonathanDLTran opened this issue

Some LLVM passes generate IR that stores data into a float array only after casting the value from float to i32 and the address from float* to i32*. In the Diospyros pass, translating Egg terms back to LLVM IR can mishandle this pattern, so NaN values appear in the float array where none should exist.

A test case exposing this issue is shown below, using some code reduced from the original naive_fixed_qr_decomp function.

void naive_fixed_qr_decomp(float Q[SIZE], float x[SIZE], float q_t[SIZE]) {
    float alpha = -sgn(x[0]);

    for (int i = 0; i < SIZE; i++) {
        q_t[i] = alpha;
    }
    for (int i = 0; i < SIZE; i++) {
        Q[i] = q_t[i];
    }
}

The output in opt.ll, before the Diospyros pass is run, is:

; Function Attrs: noinline nounwind ssp uwtable
define void @naive_fixed_qr_decomp(float* %0, float* %1, float* %2) #1 {
.preheader:
  %3 = load float, float* %1, align 4
  %4 = fneg float %3
  store float %4, float* %2, align 4
  %5 = getelementptr inbounds float, float* %2, i64 1
  store float %4, float* %5, align 4
  %6 = getelementptr inbounds float, float* %2, i64 2
  store float %4, float* %6, align 4
  %7 = getelementptr inbounds float, float* %2, i64 3
  store float %4, float* %7, align 4
  %8 = bitcast float %4 to i32
  %9 = bitcast float* %0 to i32*
  store i32 %8, i32* %9, align 4
  %10 = bitcast float* %5 to i32*
  %11 = load i32, i32* %10, align 4
  %12 = getelementptr inbounds float, float* %0, i64 1
  %13 = bitcast float* %12 to i32*
  store i32 %11, i32* %13, align 4
  %14 = bitcast float* %6 to i32*
  %15 = load i32, i32* %14, align 4
  %16 = getelementptr inbounds float, float* %0, i64 2
  %17 = bitcast float* %16 to i32*
  store i32 %15, i32* %17, align 4
  %18 = bitcast float* %7 to i32*
  %19 = load i32, i32* %18, align 4
  %20 = getelementptr inbounds float, float* %0, i64 3
  %21 = bitcast float* %20 to i32*
  store i32 %19, i32* %21, align 4
  ret void
}

The float-to-i32 and float*-to-i32* conversions are visible where bitcasts turn float pointers into i32 pointers, and the loads and stores then move i32 values through those pointers.
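
On its own, the pattern in opt.ll is harmless, because a bitcast keeps every bit of the value. As a minimal C sketch of the same idea (the function name `copy_floats_as_i32` is made up for illustration, and it assumes a 32-bit IEEE-754 `float`), copying floats by moving their raw 32-bit patterns leaves the values unchanged:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n floats by moving their raw 32-bit patterns,
   which is roughly what the bitcast/load i32/store i32 sequence does.
   Assumes sizeof(float) == sizeof(uint32_t). */
void copy_floats_as_i32(float *dst, const float *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        /* Like: load i32 through a pointer bitcast from float*. */
        memcpy(&bits, &src[i], sizeof bits);
        /* Like: store i32 through a pointer bitcast from float*. */
        memcpy(&dst[i], &bits, sizeof bits);
    }
}
```

memcpy is used instead of pointer casts so that the sketch stays within C's aliasing rules.
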
An excerpt of the problematic output after running the Diospyros pass and dead code elimination is:

  %29 = extractelement <8 x float> %18, i32 4
  %30 = fptosi float %29 to i32
  %31 = getelementptr float, float* %0, i32 0
  %32 = bitcast float* %31 to i32*
  store i32 %30, i32* %32, align 4

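Here, the fptosi is inserted by the Diospyros pass to make the types match, and the bitcast comes from the original LLVM code. Because fptosi converts the value numerically instead of preserving its bits, the integer that lands in the array is later read back as a float bit pattern, and the result is a NaN value in the array.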
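
The difference between the two conversions can be reproduced in plain C. This is only a sketch with made-up helper names; it assumes a 32-bit IEEE-754 `float`, and it uses a C cast as a stand-in for fptosi and memcpy as a stand-in for bitcast:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of an i32 as a float (stand-in for a bitcast). */
static float float_from_bits(int32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret the bits of a float as an i32 (stand-in for a bitcast). */
static int32_t bits_from_float(float f) {
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Mimics the problematic sequence: convert numerically (like fptosi),
   store the integer into float memory, then read that slot as a float.
   The value must be in int32_t range, otherwise the cast is undefined. */
float store_via_fptosi(float value) {
    int32_t converted = (int32_t)value;
    return float_from_bits(converted);
}

/* Mimics the intended sequence: the bits survive the round trip. */
float store_via_bitcast(float value) {
    return float_from_bits(bits_from_float(value));
}

/* True when the problematic sequence leaves a NaN in the slot. */
int corrupts_to_nan(float value) {
    return isnan(store_via_fptosi(value)) != 0;
}
```

For example, -1.0f converts to the integer -1, whose bit pattern 0xFFFFFFFF is a NaN when read as a float. Other inputs may not give NaN but are still wrong: 2.0f becomes the integer 2, which reads back as a tiny denormal.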

When the LLVM IR is manually changed to be:

  %29 = extractelement <8 x float> %18, i32 4
  %30 = fptosi float %29 to i32
  %31 = getelementptr float, float* %0, i32 0
  %32 = bitcast float* %31 to i32*
  store float %29, float* %31, align 4

making instructions %30 and %32 dead, the correct value in the array is obtained.

I am thinking of fixing this issue by forcing all loads and stores on arrays to use float data and float* addresses. Before the Diospyros pass runs, every load or store would be rewritten into a load/store of a float through a float pointer. To guide the rewrite, the operands of each load/store would be checked to see whether they are float-typed. This approach might require extra restrictions on what kind of C code can be run through the pass in the first place; for instance, only float arrays/pointers would be allowed. If it is not possible to determine that the operands are floats, the code would not be run through the pass.
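
The check described above can be sketched as a toy model in C. It does not use LLVM's API: the `Ty` tags, the `MemOp` struct, and `normalize_to_float` are all hypothetical names, and `underlying_ty` stands for whatever the real pass could learn about a pointer by looking through its bitcasts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy type tags for illustration only; these are not LLVM types. */
typedef enum { TY_FLOAT, TY_I32, TY_OTHER } Ty;

/* Hypothetical model of one load or store in the target region. */
typedef struct {
    Ty access_ty;     /* type the instruction loads or stores */
    Ty underlying_ty; /* element type of the pointer before any bitcast */
} MemOp;

/* Rewrite every access so that it loads/stores a float.
   Returns false and leaves ops untouched when some access cannot be
   shown to touch float memory, meaning the region should skip the pass. */
bool normalize_to_float(MemOp *ops, size_t n) {
    /* First pass: check eligibility without modifying anything. */
    for (size_t i = 0; i < n; i++) {
        if (ops[i].underlying_ty != TY_FLOAT) {
            return false;
        }
        if (ops[i].access_ty != TY_FLOAT && ops[i].access_ty != TY_I32) {
            return false;
        }
    }
    /* Second pass: every access is float-backed, so retype it as float. */
    for (size_t i = 0; i < n; i++) {
        ops[i].access_ty = TY_FLOAT;
    }
    return true;
}
```

Checking first and rewriting second keeps a rejected region unmodified, which matches the idea of not running such code through the pass at all.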

Wow! Weird that LLVM does this in the first place. It's not clear what Clang gains from doing this conversion when floats go to/from memory.

Requiring stuff to be floats, and then using bitcasts instead of fptosi, seems like a perfectly reasonable strategy. You're right that we would need to enforce float-typed inputs & outputs from the "target region."

One other thing to maybe consider is whether there is an existing LLVM pass that would clean this up. It seems somewhat unlikely, but one candidate is InstCombine, which is a kind of "kitchen sink" of minor/local simplifications like this.

I can try the InstCombine pass and look at other LLVM passes as well. Concurrently, I will change the pointer type back to float pointers using bitcasts, and check that all the input and output arrays/pointers are of float type so that the bitcasts do not introduce new problems.

Fixed in 76180f2 and previous.