iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Home Page: http://iree.dev/

Dequantization + Extract Slice Problems

IanWood1 opened this issue

#17455 uncovered several problems when dealing with tensor.extract_slice ops that consume the results of dequantization-like linalg.generic ops. See this gist for an MLIR example.

What should happen

The dequantization + ExtractSliceOp + consumer should be placed in the same dispatch, AND bufferization should convert the ExtractSliceOp into a view instead of allocating an entirely new high-bitwidth tensor.

dequantization + ExtractSliceOp + consumer
-> (make slice contiguous via transpose)
dequantization + transpose + ExtractSliceOp + consumer
-> (move transpose before dequant)
transpose + dequantization + ExtractSliceOp + consumer
-> (dequant and ExtractSliceOp cloned into consumer)
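The "move transpose before dequant" step relies on the dequantization being elementwise, so it commutes with a transpose. The following is a minimal Python sketch of that reasoning on nested lists, not IREE code. The helper names (`dequantize`, `transpose`, `slice_inner`, `slice_outer`) and the per-tensor `scale` / `zero_point` are assumptions made for illustration; real dequantization ops often use per-group parameters, which would have to be transposed as well.

```python
def dequantize(q, scale, zero_point):
    """Elementwise dequantization of a 2-D nested list.

    Assumption: a single per-tensor scale and zero point.
    """
    return [[(v - zero_point) * scale for v in row] for row in q]


def transpose(m):
    """Swap the two dimensions of a 2-D nested list."""
    return [list(col) for col in zip(*m)]


def slice_inner(m, offset, size):
    """Slice along the innermost dimension (columns)."""
    return [row[offset:offset + size] for row in m]


def slice_outer(m, offset, size):
    """Slice along the outermost dimension (rows)."""
    return m[offset:offset + size]


q = [[1, 2, 3, 4], [5, 6, 7, 8]]
scale, zero_point = 0.5, 1

# Original order: dequantize, then slice the innermost dim.
original = slice_inner(dequantize(q, scale, zero_point), 1, 2)

# Rewritten order: transpose first, dequantize, then slice the outermost dim.
rewritten = slice_outer(dequantize(transpose(q), scale, zero_point), 1, 2)

# Same values, up to the layout change introduced by the transpose.
assert transpose(rewritten) == original
```

In this sketch the rewritten result is the transpose of the original, so the consumer would have to account for the changed layout.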

What is currently happening

Although dequantization + ExtractSliceOp + consumer gets placed in the same dispatch, bufferization cannot handle the tensor.extract_slice, since it extracts along the innermost dim and the resulting slice is not contiguous. Full logs
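To make the innermost vs. outermost distinction concrete, here is a rough Python sketch of a contiguity check for a unit-stride slice of a row-major tensor. It is not IREE's actual check; the function name `slice_is_contiguous` and the row-major, unit-stride assumptions are illustrative only.

```python
def slice_is_contiguous(shape, sizes):
    """Return True if a unit-stride slice covers one contiguous run of memory.

    Assumptions: row-major layout, unit strides, static shapes.
    `shape` is the source tensor shape, `sizes` the slice sizes per dim.
    Offsets do not affect contiguity, so they are not needed here.
    """
    if len(shape) != len(sizes):
        raise ValueError("shape and sizes must have the same rank")
    seen_partial = False
    # Walk from the innermost dim outward. Once a dim is only partially
    # covered, every dim further out must have slice size 1.
    for dim_size, slice_size in zip(reversed(shape), reversed(sizes)):
        if seen_partial and slice_size != 1:
            return False
        if slice_size != dim_size:
            seen_partial = True
    return True


# Slicing the innermost dim of a 4x8 tensor: each row contributes a
# separate 2-element run, so the result is not contiguous.
assert not slice_is_contiguous((4, 8), (4, 2))

# Slicing the outermost dim: two full rows sit back to back in memory.
assert slice_is_contiguous((4, 8), (2, 8))
```

Under these assumptions, a slice along the outermost dim can be expressed as a view with an offset, while a slice along the innermost dim cannot.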

Problems

  • #17574: Transpose the extract_slice so that it extracts along the outermost dimension, which bufferization can handle better. However, the new transpose prevents the dequant from being cloned (the dequant and transpose get fused together, and the fused op is not cloned).
  • Bubble the transpose above dequantization ops so that they don't get in the way of dequant + extract getting cloned into dispatches.
  • The transpose gets fused with the dequant op, which prevents cloning the dequantization into the dispatch. IR example
    1. Prevent the transpose from being fused with the dequant
    2. OR propagate the slice before the dequant (hard with multiple slices)
  • tensor.extract_slice ops shouldn't be unconditionally cloned into dispatch regions. #17638 looks at cloning only when the result is contiguous.