[CPU][DT] ExpandVector hoists arith op outside of generic op
dcaballe opened this issue
This issue leads to a compilation error when DT is enabled:
Repro:
```shell
iree-opt test.mlir --iree-global-opt-expand-vectors
```
```mlir
func.func @main(%157 : tensor<256128xi1>, %49 : tensor<256128x1536xf32>) -> tensor<1536xf32> {
  %c0 = arith.constant 0.0 : f32
  %c22 = arith.constant dense<22.0> : tensor<1536xf32>
  %158 = tensor.empty() : tensor<256128xf32>
  %159 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%157 : tensor<256128xi1>) outs(%158 : tensor<256128xf32>) {
  ^bb0(%in: i1, %out: f32):
    %1923 = arith.uitofp %in : i1 to f32
    linalg.yield %1923 : f32
  } -> tensor<256128xf32>
  %160 = tensor.empty() : tensor<1536xf32>
  %161 = linalg.fill ins(%c0 : f32) outs(%160 : tensor<1536xf32>) -> tensor<1536xf32>
  %162 = linalg.vecmat ins(%159, %49 : tensor<256128xf32>, tensor<256128x1536xf32>) outs(%161 : tensor<1536xf32>) -> tensor<1536xf32>
  %163 = tensor.empty() : tensor<1x1x1536xf32>
  %164 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%162, %c22 : tensor<1536xf32>, tensor<1536xf32>) outs(%160 : tensor<1536xf32>) {
  ^bb0(%in: f32, %in_794: f32, %out: f32):
    %1923 = arith.mulf %in, %in_794 : f32
    linalg.yield %1923 : f32
  } -> tensor<1536xf32>
  return %164 : tensor<1536xf32>
}
```
Output:
```mlir
#map = affine_map<(d0) -> (d0)>
module {
  func.func @main(%arg0: tensor<256128xi1>, %arg1: tensor<256128x1536xf32>) -> tensor<1536xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %cst_0 = arith.constant dense<2.200000e+01> : tensor<1536xf32>
    %0 = tensor.empty() : tensor<1536xf32>
    %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<1536xf32>) -> tensor<1536xf32>
    %expanded = tensor.expand_shape %arg0 [[0, 1]] : tensor<256128xi1> into tensor<1x256128xi1>
    %2 = arith.uitofp %expanded : tensor<1x256128xi1> to tensor<1x256128xf32>
    %expanded_1 = tensor.expand_shape %1 [[0, 1]] : tensor<1536xf32> into tensor<1x1536xf32>
    %3 = linalg.matmul ins(%2, %arg1 : tensor<1x256128xf32>, tensor<256128x1536xf32>) outs(%expanded_1 : tensor<1x1536xf32>) -> tensor<1x1536xf32>
    %collapsed = tensor.collapse_shape %3 [[0, 1]] : tensor<1x1536xf32> into tensor<1536xf32>
    %4 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%collapsed, %cst_0 : tensor<1536xf32>, tensor<1536xf32>) outs(%0 : tensor<1536xf32>) {
    ^bb0(%in: f32, %in_2: f32, %out: f32):
      %5 = arith.mulf %in, %in_2 : f32
      linalg.yield %5 : f32
    } -> tensor<1536xf32>
    return %4 : tensor<1536xf32>
  }
}
```
Note how `arith.uitofp` is no longer inside a `linalg.generic` after running this pass, which is incorrect.
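For comparison, a fix that keeps the cast inside a `linalg.generic` on the expanded tensor might produce something like the following in place of the tensor-level `arith.uitofp` (a hypothetical sketch only; value names like `%init` and `%cast` are illustrative, not what the pass emits):

```mlir
// Sketch: the i1 -> f32 cast kept as a generic op on the expanded 2-D tensor,
// instead of a bare arith.uitofp on the whole tensor.
#castmap = affine_map<(d0, d1) -> (d0, d1)>
%init = tensor.empty() : tensor<1x256128xf32>
%cast = linalg.generic
    {indexing_maps = [#castmap, #castmap],
     iterator_types = ["parallel", "parallel"]}
    ins(%expanded : tensor<1x256128xi1>)
    outs(%init : tensor<1x256128xf32>) {
^bb0(%in: i1, %out: f32):
  %v = arith.uitofp %in : i1 to f32
  linalg.yield %v : f32
} -> tensor<1x256128xf32>
```

The `%cast` result would then feed the `linalg.matmul` exactly as `%2` does in the output above.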
Looking into this, it seems really odd... any idea whether this IR was compiling with any earlier version of ExpandVectors?
I'm at:
```
commit f66f28f61057008b0f7f0d9e0cea921b6e470803 (HEAD -> main, origin/main, origin/HEAD)
Author: Stella Laurenzo <stellaraccident@gmail.com>
Date:   Thu Nov 9 20:06:32 2023 -0800
```
@KoolJBlack mentioned that it was compiling for him at (one day earlier):
```
commit bd603723299a81498327610ccc4444ced1db1662 (HEAD -> main, upstream/main)
Author: Max191 <44243577+Max191@users.noreply.github.com>
Date:   Wed Nov 8 14:47:16 2023 -0500
```
This should be a simple fix; upstream already has a pass to convert elementwise ops on tensors into Linalg ops: https://github.com/llvm/llvm-project/blob/main/mlir/lib/Dialect/Linalg/Transforms/ElementwiseToLinalg.cpp
Or we should just teach the pass to generate a generic op: https://github.com/openxla/iree/blob/2f9a1e173237b0948ef4f2bd3e03c85856d1be27/compiler/src/iree/compiler/GlobalOptimization/ExpandVectors.cpp#L109-L113
> Or we should just teach the pass to generate a generic op:
I think this is a better fix.
@NatashaKnk @Max191 could you coordinate on this issue?
Ok, so @Max191 is taking a look? Just keep in mind that ExpandVectors might go away once @NatashaKnk extends the pad op to support 1-D inputs and we can preserve linalg.vecmat/linalg.matvec until encoding materialization.
I think he is probably off for today. I will raise it to him on Monday. Is @NatashaKnk able to fix it?
Ah, I see what's happening. Yeah, I can pick it up
Nice, thank you so much!
Sorry for the confusion here, but now that I'm looking at it a bit further, I might be missing the point of 15372. Isn't adding a generic op here undoing what it's trying to do to begin with?
Sorry, I missed this thread until now. If you haven't already fixed this I can put out a quick patch tomorrow. Thank you for offering to pick it up though!
Creating the generic op here is actually OK, because the main point is to separate the casting of the operands from the matmul. I did this so that we can represent matmuls with unsigned inputs and eventually lower them to microkernels that take advantage of mixed signed/unsigned operands. Generic or not, as long as the cast is a separate op, this will work fine. In fact, setEncoding will eventually turn those ops into generics anyway. We should keep these casting ops as generics the whole way through, so we don't hit failing cases like this one when setEncoding misses them.
There are a couple of places where I think I created plain CastOpInterface ops on tensors, so I can put out a patch to use generics everywhere instead, if you haven't already done this.
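To illustrate the distinction (a hypothetical sketch, not the actual patch; the `%mask` value name is made up): the tensor-level cast and its `linalg.generic` equivalent compute the same thing, but only the generic form keeps the cast in Linalg the whole way through the pipeline.

```mlir
// Form emitted today in a couple of places: a CastOpInterface op on tensors.
%cast = arith.uitofp %mask : tensor<256128xi1> to tensor<256128xf32>

// Equivalent linalg.generic form proposed above.
#map = affine_map<(d0) -> (d0)>
%init = tensor.empty() : tensor<256128xf32>
%cast2 = linalg.generic
    {indexing_maps = [#map, #map], iterator_types = ["parallel"]}
    ins(%mask : tensor<256128xi1>) outs(%init : tensor<256128xf32>) {
^bb0(%in: i1, %out: f32):
  %v = arith.uitofp %in : i1 to f32
  linalg.yield %v : f32
} -> tensor<256128xf32>
```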
Gotcha, that makes sense. If you think it's best to do it everywhere at once, please feel free to do it. Thanks!