iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Home Page: http://iree.dev/


[CPU][DT] ExpandVector hoists arith op outside of generic op

dcaballe opened this issue · comments

This issue leads to a compilation error when data tiling (DT) is enabled.

Repro:

iree-opt test.mlir --iree-global-opt-expand-vectors

func.func @main(%157 : tensor<256128xi1>, %49 : tensor<256128x1536xf32>) -> tensor<1536xf32> {
  %c0 = arith.constant 0.0 : f32
  %c22 = arith.constant dense<22.0> : tensor<1536xf32>
  %158 = tensor.empty() : tensor<256128xf32>
  %159 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%157 : tensor<256128xi1>) outs(%158 : tensor<256128xf32>) {
  ^bb0(%in: i1, %out: f32):
    %1923 = arith.uitofp %in : i1 to f32
    linalg.yield %1923 : f32
  } -> tensor<256128xf32>
  %160 = tensor.empty() : tensor<1536xf32>
  %161 = linalg.fill ins(%c0 : f32) outs(%160 : tensor<1536xf32>) -> tensor<1536xf32>
  %162 = linalg.vecmat ins(%159, %49 : tensor<256128xf32>, tensor<256128x1536xf32>) outs(%161 : tensor<1536xf32>) -> tensor<1536xf32>
  %163 = tensor.empty() : tensor<1x1x1536xf32>
  %164 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%162, %c22 : tensor<1536xf32>, tensor<1536xf32>) outs(%160 : tensor<1536xf32>) {
  ^bb0(%in: f32, %in_794: f32, %out: f32):
    %1923 = arith.mulf %in, %in_794 : f32
    linalg.yield %1923 : f32
  } -> tensor<1536xf32>
  return %164 : tensor<1536xf32>
}

Output:

#map = affine_map<(d0) -> (d0)>
module {
  func.func @main(%arg0: tensor<256128xi1>, %arg1: tensor<256128x1536xf32>) -> tensor<1536xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %cst_0 = arith.constant dense<2.200000e+01> : tensor<1536xf32>
    %0 = tensor.empty() : tensor<1536xf32>
    %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<1536xf32>) -> tensor<1536xf32>
    %expanded = tensor.expand_shape %arg0 [[0, 1]] : tensor<256128xi1> into tensor<1x256128xi1>
    %2 = arith.uitofp %expanded : tensor<1x256128xi1> to tensor<1x256128xf32>
    %expanded_1 = tensor.expand_shape %1 [[0, 1]] : tensor<1536xf32> into tensor<1x1536xf32>
    %3 = linalg.matmul ins(%2, %arg1 : tensor<1x256128xf32>, tensor<256128x1536xf32>) outs(%expanded_1 : tensor<1x1536xf32>) -> tensor<1x1536xf32>
    %collapsed = tensor.collapse_shape %3 [[0, 1]] : tensor<1x1536xf32> into tensor<1536xf32>
    %4 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%collapsed, %cst_0 : tensor<1536xf32>, tensor<1536xf32>) outs(%0 : tensor<1536xf32>) {
    ^bb0(%in: f32, %in_2: f32, %out: f32):
      %5 = arith.mulf %in, %in_2 : f32
      linalg.yield %5 : f32
    } -> tensor<1536xf32>
    return %4 : tensor<1536xf32>
  }
}

Note how arith.uitofp is no longer inside a generic op after running this pass, which is incorrect.
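
For reference, here is a hand-written sketch of what the expanded IR should look like instead, with the cast kept inside a generic op on the expanded operand (illustrative only, not actual pass output):

#map1 = affine_map<(d0, d1) -> (d0, d1)>
...
%expanded = tensor.expand_shape %arg0 [[0, 1]] : tensor<256128xi1> into tensor<1x256128xi1>
%empty = tensor.empty() : tensor<1x256128xf32>
%2 = linalg.generic {indexing_maps = [#map1, #map1], iterator_types = ["parallel", "parallel"]} ins(%expanded : tensor<1x256128xi1>) outs(%empty : tensor<1x256128xf32>) {
^bb0(%in: i1, %out: f32):
  %5 = arith.uitofp %in : i1 to f32
  linalg.yield %5 : f32
} -> tensor<1x256128xf32>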

Looking into this, it seems really odd... any idea if this IR was compiling with any version of ExpandVectors?

I'm at:

commit f66f28f61057008b0f7f0d9e0cea921b6e470803 (HEAD -> main, origin/main, origin/HEAD)
Author: Stella Laurenzo <stellaraccident@gmail.com>
Date:   Thu Nov 9 20:06:32 2023 -0800

@KoolJBlack mentioned that it was compiling for him at this commit (one day earlier):

commit bd603723299a81498327610ccc4444ced1db1662 (HEAD -> main, upstream/main)
Author: Max191 <44243577+Max191@users.noreply.github.com>
Date:   Wed Nov 8 14:47:16 2023 -0500

It was added by @Max191 in #15372.

This should be a simple fix; we have this pass to convert elementwise ops into linalg ops: https://github.com/llvm/llvm-project/blob/main/mlir/lib/Dialect/Linalg/Transforms/ElementwiseToLinalg.cpp
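
For example, running that pass (exposed in mlir-opt as --convert-elementwise-to-linalg) on a minimal sketch like the following turns the tensor-level cast back into a generic op:

func.func @cast(%arg0: tensor<8xi1>) -> tensor<8xf32> {
  // The pass rewrites this elementwise tensor op into a linalg.generic with
  // the scalar arith.uitofp in its body.
  %0 = arith.uitofp %arg0 : tensor<8xi1> to tensor<8xf32>
  return %0 : tensor<8xf32>
}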

@NatashaKnk @Max191 could you coordinate on this issue?

OK, so @Max191 is taking a look? Just keep in mind that this ExpandVectors pass might go away once @NatashaKnk extends the pad op to support 1D inputs, at which point we can preserve linalg.vecmat/linalg.matvec until encoding materialization.

I think he is probably off for today. I will raise it with him on Monday. Is @NatashaKnk able to fix it?

Ah, I see what's happening. Yeah, I can pick it up.

Nice, thank you so much!

Sorry for the confusion here, but now that I'm looking at it a bit further, I might be missing the point of #15372. Isn't adding a generic op here undoing what it's trying to do in the first place?

Sorry, I missed this thread until now. If you haven't already fixed this, I can put out a quick patch tomorrow. Thank you for offering to pick it up, though!

Creating the generic op here is actually OK, because the main point is to separate the casting of the operands from the matmul. I did this so that we could represent matmuls with unsigned inputs and eventually lower them to microkernels that take advantage of the signed-times-unsigned operands. Generic or not, as long as the cast is a separate op, this will work fine. In fact, setEncoding will eventually turn those ops into generics anyway. We should keep these casting ops as generics the whole way through so we don't get failing cases like this when setEncoding misses them.
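
For illustration, a minimal sketch of the kind of IR this enables (the function name, shapes, and element types here are hypothetical):

#map = affine_map<(d0, d1) -> (d0, d1)>
func.func @unsigned_lhs(%lhs: tensor<8x16xi8>, %rhs: tensor<16x4xi32>, %acc: tensor<8x4xi32>) -> tensor<8x4xi32> {
  %empty = tensor.empty() : tensor<8x16xi32>
  // The unsigned extension lives in its own generic op, separate from the matmul.
  %ext = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel"]} ins(%lhs : tensor<8x16xi8>) outs(%empty : tensor<8x16xi32>) {
  ^bb0(%in: i8, %out: i32):
    %0 = arith.extui %in : i8 to i32
    linalg.yield %0 : i32
  } -> tensor<8x16xi32>
  // Keeping the cast as a standalone op lets setEncoding handle it later and
  // target microkernels that exploit the signed-times-unsigned operands.
  %res = linalg.matmul ins(%ext, %rhs : tensor<8x16xi32>, tensor<16x4xi32>) outs(%acc : tensor<8x4xi32>) -> tensor<8x4xi32>
  return %res : tensor<8x4xi32>
}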

There are a couple of places where I think I created plain CastOpInterface ops on tensors, so I can put out a patch to use generics everywhere instead, if you haven't already done this.

Gotcha, that makes sense. If you think it's best to do it everywhere at once, please feel free to do it. Thanks!