MaterializeEncoding for conv on CPU with Winograd
bjacob opened this issue · comments
Compared to #17606, this is a bigger, researchy project.
This is about making Winograd, which we typically want to use as the implementation strategy for Conv ops on CPU, happen as part of MaterializeEncoding, instead of having to run as an early (global opt / preprocessing) step, as it does at the moment.
The materialization should look like this:

- `set_encoding` op materialized into `winograd_(input/filter)_transform` + `tensor.pack`.
- `conv` op materialized into `batch_mmt4d`.
- `unset_encoding` op materialized into `tensor.unpack` + `winograd_output_transform`.
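For intuition on why the conv materializes into a batched matmul, recall that Winograd rewrites a convolution as elementwise products of transformed tiles; applied over all tiles and channels, that elementwise core becomes a batch matmul. Below is a minimal, illustrative 1-D F(2,3) sketch (the `B`, `G`, `A` matrices are the standard Winograd transform matrices; this is not IREE code, and the function names are hypothetical):

```python
import numpy as np

# Standard 1-D Winograd F(2,3) transform matrices: produces 2 outputs of a
# 3-tap correlation from a 4-element input tile.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """Two outputs of correlate(d, g) for a 4-element tile d and 3-tap g."""
    U = G @ g    # filter transform (the winograd_filter_transform role)
    V = BT @ d   # input transform  (the winograd_input_transform role)
    M = U * V    # elementwise product: over many tiles/channels, this
                 # is the part that generalizes to a batch matmul
    return AT @ M  # output transform (the winograd_output_transform role)

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
print(winograd_f23(d, g))            # [6. 9.]
print(np.correlate(d, g, "valid"))   # direct correlation for comparison
```

The point of the sketch is only the structure: transform, elementwise/matmul core, inverse transform, matching the three-step materialization above.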
This project probably depends on the ability to get local workgroup memory allocated within a dispatch (without hitting the stack). Indeed, producers may get fused into `set_encoding` at Flow level, long before it becomes known that the `set_encoding` will materialize into a `winograd_input_transform`, which performs a matmul and thus consumes each input element multiple times. The only way to avoid redundant evaluation of a fused producer will be to allocate an intermediate buffer.
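To see concretely why each input element is consumed multiple times, note that in the common F(2x2, 3x3) case the input transform reads overlapping 4x4 tiles at stride 2. A quick count of reads per input element (tile size and stride are the standard F(2x2, 3x3) values, used purely for illustration):

```python
import numpy as np

# Winograd F(2x2, 3x3) reads the input as overlapping 4x4 tiles at stride 2.
# Count how many tiles touch each input element of an 8x8 input.
H = W = 8
tile, stride = 4, 2
reads = np.zeros((H, W), dtype=int)
for i in range(0, H - tile + 1, stride):
    for j in range(0, W - tile + 1, stride):
        reads[i:i + tile, j:j + tile] += 1

# Interior elements are read by 4 distinct tiles, so a producer fused into
# the input transform would be re-evaluated up to 4 times per element
# unless its result is staged in an intermediate buffer.
print(int(reads[3, 3]))  # 4
print(int(reads[0, 0]))  # 1 (corner element belongs to a single tile)
```

This is the redundant evaluation that an intermediate (workgroup-local) buffer would avoid.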