iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Home Page: http://iree.dev/

Better heuristic in getDefaultDistributedLoopTileSizes

pzread opened this issue

Currently getDefaultDistributedLoopTileSizes produces non-divisible tile sizes and relies on getMaxTileSize to find the closest divisible sizes. However, this sometimes yields a non-ideal tile size, because a factor of 2 is divided out of the shape size too early, when the workgroup size is chosen. We want to divide the size by 2 as late as possible so that the inner tile size can be a multiple of the vector size.

For example, the feature dimension of a depthwise_conv2d in MobileNetV3 is currently tiled as:

  • Shape: 240
  • Workgroup: 60
  • Inner tiling: 30

It can perform better if we tile it as:

  • Shape: 240
  • Workgroup: 48
  • Inner tiling: 16
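The difference between the two tilings can be checked with a few lines of arithmetic. The sketch below is only an illustration: `describe` and `largest_pow2_factor` are hypothetical helpers, not IREE functions, and the vector size of 8 is an assumption, since the issue does not state the actual value.

```python
def largest_pow2_factor(n: int) -> int:
    """Return the largest power of two that divides n (n must be positive)."""
    return n & -n


def describe(shape: int, workgroup: int, inner: int, vector_size: int) -> dict:
    """Summarize the divisibility properties of one tiling choice."""
    return {
        "workgroup_divides_shape": shape % workgroup == 0,
        "inner_divides_workgroup": workgroup % inner == 0,
        "inner_is_vector_multiple": inner % vector_size == 0,
        "pow2_in_workgroup": largest_pow2_factor(workgroup),
    }


VECTOR_SIZE = 8  # assumption for illustration; not taken from the issue

current = describe(240, 60, 30, VECTOR_SIZE)
proposed = describe(240, 48, 16, VECTOR_SIZE)

print("current :", current)
print("proposed:", proposed)
```

Both tilings divide evenly, but 60 = 4 * 15 keeps only two factors of 2, so halving it gives 30, which is not a multiple of the assumed vector size. 48 = 16 * 3 keeps all four factors of 2 from 240, which leaves room for an inner tile of 16.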

This can be done by aggressively preserving factors of 2 while searching for the workgroup size, and by making a best effort to keep the resulting size divisible (so that getMaxTileSize won't kick in).
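One possible way to express this idea is sketched below. It is not the IREE implementation: `pick_workgroup_tile` and `pick_inner_tile` are hypothetical names, and the limits used in the example (a maximum workgroup tile of 64, a vector size of 8, and a maximum inner tile of 16) are assumptions chosen only to reproduce the 240 / 48 / 16 example.

```python
def largest_pow2_factor(n: int) -> int:
    """Return the largest power of two that divides n (n must be positive)."""
    return n & -n


def pick_workgroup_tile(shape: int, max_tile: int) -> int:
    """Pick a divisor of `shape` that does not exceed `max_tile`.

    Divisors keeping more factors of 2 are preferred; ties go to the larger one.
    Only divisors are considered, so the result always divides `shape`.
    """
    limit = min(shape, max_tile)
    candidates = [d for d in range(1, limit + 1) if shape % d == 0]
    return max(candidates, key=lambda d: (largest_pow2_factor(d), d))


def pick_inner_tile(workgroup: int, vector_size: int, max_inner: int) -> int:
    """Pick the largest divisor of `workgroup` not exceeding `max_inner`.

    Divisors that are multiples of `vector_size` are preferred; if none exists,
    fall back to the largest plain divisor.
    """
    limit = min(workgroup, max_inner)
    divisors = [d for d in range(1, limit + 1) if workgroup % d == 0]
    vector_friendly = [d for d in divisors if d % vector_size == 0]
    return max(vector_friendly) if vector_friendly else max(divisors)


# Hypothetical limits, chosen for illustration only.
workgroup = pick_workgroup_tile(240, max_tile=64)
inner = pick_inner_tile(workgroup, vector_size=8, max_inner=16)
print(workgroup, inner)
```

Because the search only considers divisors of the shape, the chosen sizes are divisible by construction. A real heuristic would also have to balance this against the number of workgroups and other target constraints, which this sketch ignores.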