iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Home Page: http://iree.dev/

High compilation time is spent on the Canonicalizer for large models

GeorgeARM opened this issue

I have been exploring IREE compilation time over a range of models of varying size and complexity.
I noticed high compilation times on "heavy" models with large const data.
Extracting some timing information showed that the Canonicalizer run after TosaToLinalgNamed consumes an unreasonably large portion of the execution time.

An example timing report for Inception v3:

===-------------------------------------------------------------------------===
                         ... Execution time report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 38.0063 seconds

  ----User Time----  ----Wall Time----  ----Name----
                             ... skipped lines ...
   34.6385 ( 54.2%)   34.6385 ( 91.1%)  'func.func' Pipeline
    0.0007 (  0.0%)    0.0007 (  0.0%)    TosaMakeBroadcastable
    0.0003 (  0.0%)    0.0003 (  0.0%)    TosaToArith
    0.0002 (  0.0%)    0.0002 (  0.0%)    TosaToTensor
    0.0007 (  0.0%)    0.0007 (  0.0%)    Canonicalizer
    0.0010 (  0.0%)    0.0010 (  0.0%)    TosaOptionalDecompositions
    0.0279 (  0.0%)    0.0279 (  0.1%)    Canonicalizer
    0.0008 (  0.0%)    0.0008 (  0.0%)    TosaMakeBroadcastable
    0.0114 (  0.0%)    0.0114 (  0.0%)    TosaToLinalgNamed
   34.5102 ( 54.0%)   34.5102 ( 90.8%)    Canonicalizer
                             ... skipped lines ...

Printing the IR before and after highlights the injection of transpose operations on the constant weights of Conv2d, which bring the TOSA weight layout (FHWC) to the Linalg-compatible one (HWCF), something that can be noted here as well.

e.g.

%cst_2 = arith.constant dense<[1, 2, 3, 0]> : tensor<4xi64>
%2 = "tosa.transpose"(%cst, %cst_2) : (tensor<64x3x3x32xf32>, tensor<4xi64>) -> tensor<3x3x32x64xf32>

Profiling with callgrind seems to reveal that the issue lies in the ConstantTransposeOptimization here, which is registered as a canonicalization pattern.

[callgrind profile screenshot]
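
For illustration, below is a minimal sketch of the kind of linear-pass fold that avoids this cost, restricted to f32 constants; the helper name is an assumption for this sketch, not the upstream implementation (the actual fix is linked at the end of this thread). The idea is to permute the raw values in a single pass over the data instead of materializing per-element attributes:

// Sketch only: fold transpose(constant) by permuting raw f32 data in one
// pass. Helper name and f32 restriction are illustrative assumptions.
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinTypes.h"
#include "llvm/ADT/SmallVector.h"

static mlir::DenseElementsAttr
transposeDenseF32(mlir::DenseElementsAttr input,
                  llvm::ArrayRef<int64_t> perms,
                  mlir::RankedTensorType resultType) {
  mlir::ShapedType inputType = input.getType();
  llvm::ArrayRef<int64_t> inShape = inputType.getShape();
  int64_t rank = inputType.getRank();

  // Row-major strides of the input tensor.
  llvm::SmallVector<int64_t> inStrides(rank, 1);
  for (int64_t d = rank - 2; d >= 0; --d)
    inStrides[d] = inStrides[d + 1] * inShape[d + 1];

  // Read the source values once, avoiding per-element Attribute handling.
  auto range = input.getValues<float>();
  llvm::SmallVector<float> src(range.begin(), range.end());

  // Walk the output index space in row-major order, gathering from the
  // source through the permutation: srcIndex[perms[d]] == outIndex[d].
  llvm::ArrayRef<int64_t> outShape = resultType.getShape();
  llvm::SmallVector<int64_t> outIndex(rank, 0);
  llvm::SmallVector<float> result;
  result.reserve(src.size());
  for (int64_t i = 0, e = src.size(); i < e; ++i) {
    int64_t srcOffset = 0;
    for (int64_t d = 0; d < rank; ++d)
      srcOffset += outIndex[d] * inStrides[perms[d]];
    result.push_back(src[srcOffset]);
    // Advance the output multi-index.
    for (int64_t d = rank - 1; d >= 0; --d) {
      if (++outIndex[d] < outShape[d])
        break;
      outIndex[d] = 0;
    }
  }
  return mlir::DenseElementsAttr::get(resultType, result);
}

Wrapped in an OpRewritePattern on tosa.transpose, this would replace the transpose with a single arith.constant holding the permuted data, e.g. folding the tensor<3x3x32x64xf32> example above into one constant.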

Overall, I am not sure something like this should be part of the Canonicalizer in the first place, for a variety of reasons.
I suppose this issue needs to be migrated/moved to the LLVM repo itself?

Steps to reproduce

  • Build IREE:
cmake -G Ninja .. \
	-DCMAKE_INSTALL_PREFIX=./install \
	-DCMAKE_BUILD_TYPE=Release \
	-DIREE_ENABLE_ASSERTIONS=ON \
	-DIREE_BUILD_COMPILER=ON \
	-DIREE_BUILD_TESTS=OFF \
	-DIREE_BUILD_BENCHMARKS=OFF \
	-DIREE_BUILD_SAMPLES=OFF
cmake --build . --target install -- -k 0
  • Convert model to MLIR:
iree-import-tflite inception_v3_1_default_1.tflite -o inception_v3.mlir
  • Compile the model and extract timings:
./iree-translate --iree-mlir-to-vm-bytecode-module --iree-input-type=tosa --iree-hal-target-backends=vulkan-spirv --iree-vulkan-target-triple=valhall-unknown-android11 inception_v3.mlir -o inception_v3.mali-target.vmfb --mlir-timing

Oh, my bad. I added that pattern previously with a naive implementation. Later I introduced a similar pattern in Linalg and improved it with a better implementation, but I never got back to improving the TOSA one. I guess the pattern in TOSA can be deleted now, given that we can fold it at the Linalg level, or updated along the lines of the Linalg implementation to improve it.

Thanks for your prompt response @antiagainst.
Yes, it seems sensible to either remove it completely, or to move it out of the canonicalizer, rework it along the lines of the Linalg implementation, and put it in its own pass.
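
As a sketch of the "own pass" option, something like the following could register the fold in a dedicated pass instead of as a canonicalization; the pass name is hypothetical, and FoldConstantTranspose stands in for a pattern built on a helper like the one sketched earlier (declared here only to keep the example self-contained):

// Hypothetical standalone pass; names are assumptions for illustration,
// not the upstream code.
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Dialect/Tosa/IR/TosaOps.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

namespace {
// Assumed pattern wrapping the raw-data fold sketched above; its
// definition is elided here.
struct FoldConstantTranspose
    : mlir::OpRewritePattern<mlir::tosa::TransposeOp> {
  using OpRewritePattern::OpRewritePattern;
  mlir::LogicalResult
  matchAndRewrite(mlir::tosa::TransposeOp op,
                  mlir::PatternRewriter &rewriter) const override;
};

struct FoldTransposeConstantsPass
    : mlir::PassWrapper<FoldTransposeConstantsPass,
                        mlir::OperationPass<mlir::func::FuncOp>> {
  llvm::StringRef getArgument() const override {
    return "fold-transpose-constants";
  }
  void runOnOperation() override {
    mlir::RewritePatternSet patterns(&getContext());
    // The expensive fold now only runs when this pass is scheduled,
    // instead of inside every Canonicalizer invocation.
    patterns.add<FoldConstantTranspose>(&getContext());
    if (mlir::failed(mlir::applyPatternsAndFoldGreedily(
            getOperation(), std::move(patterns))))
      signalPassFailure();
  }
};
} // namespace

This keeps the Canonicalizer cheap on models with large const data while still making the fold available where it pays off.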

I will have a look and upstream a fix. Should I leave this open until the fix is merged?

SGTM, thanks!

https://reviews.llvm.org/D124685 has landed. Thanks @GeorgeARM! Closing this; please reopen if you still see issues afterwards.