[CPU][DT] ExpandVector hoists arith op outside of generic op
dcaballe opened this issue
This issue leads to a compilation error when DT is enabled:
Repro:
```shell
iree-opt test.mlir --iree-global-opt-expand-vectors
```
```mlir
func.func @main(%157 : tensor<256128xi1>, %49 : tensor<256128x1536xf32>) -> tensor<1536xf32> {
  %c0 = arith.constant 0.0 : f32
  %c22 = arith.constant dense<22.0> : tensor<1536xf32>
  %158 = tensor.empty() : tensor<256128xf32>
  %159 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%157 : tensor<256128xi1>) outs(%158 : tensor<256128xf32>) {
  ^bb0(%in: i1, %out: f32):
    %1923 = arith.uitofp %in : i1 to f32
    linalg.yield %1923 : f32
  } -> tensor<256128xf32>
  %160 = tensor.empty() : tensor<1536xf32>
  %161 = linalg.fill ins(%c0 : f32) outs(%160 : tensor<1536xf32>) -> tensor<1536xf32>
  %162 = linalg.vecmat ins(%159, %49 : tensor<256128xf32>, tensor<256128x1536xf32>) outs(%161 : tensor<1536xf32>) -> tensor<1536xf32>
  %163 = tensor.empty() : tensor<1x1x1536xf32>
  %164 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%162, %c22 : tensor<1536xf32>, tensor<1536xf32>) outs(%160 : tensor<1536xf32>) {
  ^bb0(%in: f32, %in_794: f32, %out: f32):
    %1923 = arith.mulf %in, %in_794 : f32
    linalg.yield %1923 : f32
  } -> tensor<1536xf32>
  return %164 : tensor<1536xf32>
}
```
Output:
```mlir
#map = affine_map<(d0) -> (d0)>
module {
  func.func @main(%arg0: tensor<256128xi1>, %arg1: tensor<256128x1536xf32>) -> tensor<1536xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %cst_0 = arith.constant dense<2.200000e+01> : tensor<1536xf32>
    %0 = tensor.empty() : tensor<1536xf32>
    %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<1536xf32>) -> tensor<1536xf32>
    %expanded = tensor.expand_shape %arg0 [[0, 1]] : tensor<256128xi1> into tensor<1x256128xi1>
    %2 = arith.uitofp %expanded : tensor<1x256128xi1> to tensor<1x256128xf32>
    %expanded_1 = tensor.expand_shape %1 [[0, 1]] : tensor<1536xf32> into tensor<1x1536xf32>
    %3 = linalg.matmul ins(%2, %arg1 : tensor<1x256128xf32>, tensor<256128x1536xf32>) outs(%expanded_1 : tensor<1x1536xf32>) -> tensor<1x1536xf32>
    %collapsed = tensor.collapse_shape %3 [[0, 1]] : tensor<1x1536xf32> into tensor<1536xf32>
    %4 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%collapsed, %cst_0 : tensor<1536xf32>, tensor<1536xf32>) outs(%0 : tensor<1536xf32>) {
    ^bb0(%in: f32, %in_2: f32, %out: f32):
      %5 = arith.mulf %in, %in_2 : f32
      linalg.yield %5 : f32
    } -> tensor<1536xf32>
    return %4 : tensor<1536xf32>
  }
}
```
Note how `arith.uitofp` is no longer inside a `linalg.generic` after running this pass, which is incorrect.
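For comparison, a fix that keeps the cast inside a `linalg.generic` on the expanded tensor might produce something like the following in place of the tensor-level `arith.uitofp` (a hypothetical sketch only; value names like `%init` and `%cast` are illustrative, not what the pass emits):

```mlir
// Sketch: the i1 -> f32 cast kept as a generic op on the expanded 2-D tensor,
// instead of a bare arith.uitofp on the whole tensor.
#castmap = affine_map<(d0, d1) -> (d0, d1)>
%init = tensor.empty() : tensor<1x256128xf32>
%cast = linalg.generic
    {indexing_maps = [#castmap, #castmap],
     iterator_types = ["parallel", "parallel"]}
    ins(%expanded : tensor<1x256128xi1>)
    outs(%init : tensor<1x256128xf32>) {
^bb0(%in: i1, %out: f32):
  %v = arith.uitofp %in : i1 to f32
  linalg.yield %v : f32
} -> tensor<1x256128xf32>
```

The `%cast` result would then feed the `linalg.matmul` exactly as `%2` does in the output above.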
Looking into this, it seems really odd... any idea whether this IR was compiling with any earlier version of ExpandVectors?
I'm at:
```
commit f66f28f61057008b0f7f0d9e0cea921b6e470803 (HEAD -> main, origin/main, origin/HEAD)
Author: Stella Laurenzo <stellaraccident@gmail.com>
Date:   Thu Nov 9 20:06:32 2023 -0800
```
@KoolJBlack mentioned that it was compiling for him at (one day earlier):
```
commit bd603723299a81498327610ccc4444ced1db1662 (HEAD -> main, upstream/main)
Author: Max191 <44243577+Max191@users.noreply.github.com>
Date:   Wed Nov 8 14:47:16 2023 -0500
```
This should be a simple fix; upstream already has a pass to convert elementwise ops on tensors into Linalg ops: https://github.com/llvm/llvm-project/blob/main/mlir/lib/Dialect/Linalg/Transforms/ElementwiseToLinalg.cpp
Or we should just teach the pass to generate a generic op: https://github.com/openxla/iree/blob/2f9a1e173237b0948ef4f2bd3e03c85856d1be27/compiler/src/iree/compiler/GlobalOptimization/ExpandVectors.cpp#L109-L113
> Or we should just teach the pass to generate a generic op:
I think this is a better fix.
@NatashaKnk @Max191 could you coordinate on this issue?
Ok, so @Max191 is taking a look? Just keep in mind that ExpandVectors might go away once @NatashaKnk extends the pad op to support 1-D inputs and we can preserve linalg.vecmat/linalg.matvec until encoding materialization.
I think he is probably off for today. I will raise it to him on Monday. Is @NatashaKnk able to fix it?
Ah, I see what's happening. Yeah, I can pick it up
Nice, thank you so much!
Sorry for the confusion here, but now that I'm looking at it a bit further, I might be missing the point of 15372. Isn't adding a generic op here undoing what it's trying to do to begin with?
Sorry, I missed this thread until now. If you haven't already fixed this I can put out a quick patch tomorrow. Thank you for offering to pick it up though!
Creating the generic op here is actually OK, because the main point is to separate the casting of the operands from the matmul. I did this so that we can represent matmuls with unsigned inputs and eventually lower them to microkernels that take advantage of mixed signed/unsigned operands. Generic or not, as long as the cast is a separate op, this will work fine. In fact, setEncoding will eventually turn those ops into generics anyway. We should keep these casting ops as generics the whole way through, so we don't hit failing cases like this one when setEncoding misses them.
There are a couple of places where I think I created plain CastOpInterface ops on tensors, so I can put out a patch to use generics everywhere instead, if you haven't already done this.
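To illustrate the distinction (a hypothetical sketch, not the actual patch; the `%mask` value name is made up): the tensor-level cast and its `linalg.generic` equivalent compute the same thing, but only the generic form keeps the cast in Linalg the whole way through the pipeline.

```mlir
// Form emitted today in a couple of places: a CastOpInterface op on tensors.
%cast = arith.uitofp %mask : tensor<256128xi1> to tensor<256128xf32>

// Equivalent linalg.generic form proposed above.
#map = affine_map<(d0) -> (d0)>
%init = tensor.empty() : tensor<256128xf32>
%cast2 = linalg.generic
    {indexing_maps = [#map, #map], iterator_types = ["parallel"]}
    ins(%mask : tensor<256128xi1>) outs(%init : tensor<256128xf32>) {
^bb0(%in: i1, %out: f32):
  %v = arith.uitofp %in : i1 to f32
  linalg.yield %v : f32
} -> tensor<256128xf32>
```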
Gotcha, that makes sense. If you think it's best to do it everywhere at once, please feel free to do it. Thanks!