iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Home Page: http://iree.dev/

assertion failure (cast<>) in LLVMCPULowerExecutableTargetPass

silvasean opened this issue

Describe the bug

Full error log: https://gist.github.com/silvasean/e96175f74a8c7833299b8d35e33ebfd2

To Reproduce

iree-compile --iree-hal-target-backends=dylib core-input.mlir

#map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
module attributes {torch.debug_module_name = "AvgPool2dIntModule"} {
  func @forward(%arg0: tensor<?x?x?x?xi64>) -> tensor<?x?x?x?xi64> {
    %c48_i64 = arith.constant 48 : i64
    %c1_i64 = arith.constant 1 : i64
    %c2_i64 = arith.constant 2 : i64
    %c3 = arith.constant 3 : index
    %c2 = arith.constant 2 : index
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %c0_i64 = arith.constant 0 : i64
    %0 = tensor.pad %arg0 low[0, 0, 3, 4] high[0, 0, 3, 4] {
    ^bb0(%arg1: index, %arg2: index, %arg3: index, %arg4: index):
      tensor.yield %c0_i64 : i64
    } : tensor<?x?x?x?xi64> to tensor<?x?x?x?xi64>
    %1 = tensor.dim %arg0, %c0 : tensor<?x?x?x?xi64>
    %2 = tensor.dim %arg0, %c1 : tensor<?x?x?x?xi64>
    %3 = tensor.dim %arg0, %c2 : tensor<?x?x?x?xi64>
    %4 = tensor.dim %arg0, %c3 : tensor<?x?x?x?xi64>
    %5 = arith.index_cast %3 : index to i64
    %6 = arith.floordivsi %5, %c2_i64 : i64
    %7 = arith.addi %6, %c1_i64 : i64
    %8 = arith.index_cast %7 : i64 to index
    %9 = arith.index_cast %4 : index to i64
    %10 = arith.floordivsi %9, %c2_i64 : i64
    %11 = arith.addi %10, %c1_i64 : i64
    %12 = arith.index_cast %11 : i64 to index
    %13 = linalg.init_tensor [%1, %2, %8, %12] : tensor<?x?x?x?xi64>
    %14 = linalg.fill ins(%c0_i64 : i64) outs(%13 : tensor<?x?x?x?xi64>) -> tensor<?x?x?x?xi64>
    %15 = linalg.init_tensor [6, 8] : tensor<6x8xi64>
    %16 = linalg.pooling_nchw_sum {dilations = dense<1> : vector<2xi64>, strides = dense<2> : vector<2xi64>} ins(%0, %15 : tensor<?x?x?x?xi64>, tensor<6x8xi64>) outs(%14 : tensor<?x?x?x?xi64>) -> tensor<?x?x?x?xi64>
    %17 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%16 : tensor<?x?x?x?xi64>) outs(%13 : tensor<?x?x?x?xi64>) {
    ^bb0(%arg1: i64, %arg2: i64):
      %18 = arith.divsi %arg1, %c48_i64 : i64
      linalg.yield %18 : i64
    } -> tensor<?x?x?x?xi64>
    return %17 : tensor<?x?x?x?xi64>
  }
}

It looks like some operations can't be cast to PartitionableLoopsInterface. I'll take a look at it.

Looks like the PartitionableLoopsInterface registration list doesn't include linalg.pooling_nchw_sum. #9055 should fix it.
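
For context, the interface is attached to individual Linalg ops through external models, so an op missing from the registration list fails the later cast<PartitionableLoopsInterface>. A rough sketch of what such a registration looks like (the LinalgOpPartitionableLoops model name is assumed here for illustration, not the exact IREE code):

#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/IR/DialectRegistry.h"

// Illustrative sketch only: attach the (hypothetical) external model to each
// Linalg op explicitly. Any op left off this list cannot be cast to
// PartitionableLoopsInterface later in the pipeline.
void registerPartitionableLoopsExternalModels(mlir::DialectRegistry &registry) {
  registry.addExtension(+[](mlir::MLIRContext *ctx,
                            mlir::linalg::LinalgDialect *dialect) {
    mlir::linalg::MatmulOp::attachInterface<
        LinalgOpPartitionableLoops<mlir::linalg::MatmulOp>>(*ctx);
    // linalg.pooling_nchw_sum was missing from the list; adding it is what
    // #9055 does.
    mlir::linalg::PoolingNchwSumOp::attachInterface<
        LinalgOpPartitionableLoops<mlir::linalg::PoolingNchwSumOp>>(*ctx);
  });
}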

Hi @pzread -- #9055 doesn't seem to fix the core issue -- there is an unbounded set of linalg ops, and we cannot have the pass asserting every time a new one comes along. Can we change the code to emit a proper error instead of an assertion failure?

// This is copy-pasted from LinalgStructuredOps.cpp.inc. In theory you could
// just include that generated file here, but that causes errors with bazel.
// The required generated header is not exposed correctly.
// Copy paste is fine for now.

It looks like we can include LinalgStructuredOps.cpp.inc instead. However, the comment points out that there are issues with the Bazel build. Maybe we can revisit whether that approach can be applied?
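
For reference, a rough sketch of that approach (assumed here, not the actual IREE code): pull the op list from the tablegen-generated file via its GET_OP_LIST section, so the registration stays in sync with upstream Linalg instead of keeping a copy-pasted list. The LinalgOpPartitionableLoops model name is again hypothetical:

// Sketch only. GET_OP_LIST expands to the comma-separated list of generated
// Linalg op classes, which is fed to a variadic registration helper.
template <typename... Ops>
static void attachPartitionableLoopsModels(mlir::MLIRContext *ctx) {
  // C++17 fold expression: attach the model to every op in the pack.
  (Ops::template attachInterface<LinalgOpPartitionableLoops<Ops>>(*ctx), ...);
}

static void registerAllLinalgStructuredOps(mlir::MLIRContext *ctx) {
  attachPartitionableLoopsModels<
#define GET_OP_LIST
#include "mlir/Dialect/Linalg/IR/LinalgStructuredOps.cpp.inc"
      >(ctx);
}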

I mean, can we replace the cast with dyn_cast and get an error message about what needs to be done?

I think we can just replace cast with dyn_cast and return failure() if the cast fails.
https://github.com/google/iree/blob/7f9719876f77d50c3e56ba093b13483def97e705/compiler/src/iree/compiler/Codegen/LLVMCPU/KernelDispatch.cpp#L885
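
Something along these lines, in the context of that file, would turn the assertion into a proper diagnostic (a minimal sketch; the surrounding function signature and names are assumed, not the exact KernelDispatch.cpp code):

// Sketch of the suggested change: dyn_cast instead of cast, so an op that
// does not implement the interface produces an error with a hint about what
// is missing, rather than tripping an assertion.
static LogicalResult setRootConfig(func::FuncOp entryPointFn, Operation *op) {
  auto partitionableLoopsInterfaceOp =
      dyn_cast<PartitionableLoopsInterface>(op);
  if (!partitionableLoopsInterfaceOp) {
    return op->emitOpError(
        "does not implement PartitionableLoopsInterface; it needs to be "
        "registered with the interface before a root configuration can be "
        "set");
  }
  // ... continue using partitionableLoopsInterfaceOp as before ...
  return success();
}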

can we replace the cast with dyn_cast and get an error message about what needs to be done

I think that would be good if including LinalgStructuredOps.cpp.inc does not work.

If including LinalgStructuredOps.cpp.inc works, it's guaranteed that all the Linalg ops can be cast to PartitionableLoopsInterface. Then we don't need the check even when more Linalg ops are added in the future.

I think that we want to be defensive here -- IREE and its frontends might be built at slightly different LLVM versions -- there is always a chance that something slips in, even if we are building from LinalgStructuredOps.cpp.inc.

I see your point. I think all the root ops should be castable to PartitionableLoopsInterface; that's how we define a dispatch in IREE. In this context, all the backends have to query partitionable loops by casting the root op to PartitionableLoopsInterface. We might want to raise the error earlier or have a common check before going to each backend.

My concern now is a layering issue: this should not only be checked in the CPU backend, but also in all other backends. @MaheshRavishankar can we assume that all the compute ops can be cast to PartitionableLoopsInterface? If so, maybe we can add the check here:

https://github.com/google/iree/blob/7f9719876f77d50c3e56ba093b13483def97e705/compiler/src/iree/compiler/Codegen/Utils/Utils.cpp#L519-L531

This is the entry point where all the backends get the compute ops and try to set configurations.
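
A minimal sketch of what such a shared check might look like (the function name and where it would be called from are assumed, not existing IREE APIs):

// Hypothetical backend-independent check, run once over the compute ops
// gathered at the shared entry point, before any backend-specific
// configuration logic runs.
static LogicalResult
verifyComputeOpsArePartitionable(ArrayRef<Operation *> computeOps) {
  for (Operation *op : computeOps) {
    if (!isa<PartitionableLoopsInterface>(op)) {
      return op->emitOpError(
          "expected op to implement PartitionableLoopsInterface before a "
          "backend configuration is chosen");
    }
  }
  return success();
}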

I'll take a look to see if we can include LinalgStructuredOps.cpp.inc in IREE first, or add a test to make sure we catch this when compiling IREE.

I think the best case is that we can do the check when compiling IREE and in the tests, as a runtime check in IREE might always be missing somewhere.

I think that we want to be defensive here -- IREE and its frontends might be built at slightly different LLVM versions -- there is always a chance that something slips in, even if we are building from LinalgStructuredOps.cpp.inc.

@silvasean I'm not sure if I understand this correctly. I assume that if the frontend is built at a different LLVM version and produces some new Linalg ops, the MLIR parser in IREE shouldn't accept the IR because it contains unrecognized ops. In this case, we will see parsing failures when the frontend calls IREE, instead of unknown ops slipping into the compilation pipeline.

Good point! We probably don't need to be defensive about that.

With #9062, we no longer have a copy of the op list in IREE.