MaterializeEncoding for conv on CPU with Winograd
bjacob opened this issue · comments
Compared to #17606, this is a bigger, researchy project.
This is about making Winograd, which we typically want to use as the implementation strategy for Conv ops on CPU, happen as part of MaterializeEncoding, instead of having to run as an early (global opt / preprocessing) step, as it does at the moment.
The materialization should look like this:

- `set_encoding` op materialized into `winograd_(input/filter)_transform` + `tensor.pack`.
- `conv` op materialized into `batch_mmt4d`.
- `unset_encoding` op materialized into `tensor.unpack` + `winograd_output_transform`.
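For intuition on why the conv materializes into a batched matmul, recall that Winograd rewrites a convolution as elementwise products of transformed tiles; applied over all tiles and channels, that elementwise core becomes a batch matmul. Below is a minimal, illustrative 1-D F(2,3) sketch (the `B`, `G`, `A` matrices are the standard Winograd transform matrices; this is not IREE code, and the function names are hypothetical):

```python
import numpy as np

# Standard 1-D Winograd F(2,3) transform matrices: produces 2 outputs of a
# 3-tap correlation from a 4-element input tile.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """Two outputs of correlate(d, g) for a 4-element tile d and 3-tap g."""
    U = G @ g    # filter transform (the winograd_filter_transform role)
    V = BT @ d   # input transform  (the winograd_input_transform role)
    M = U * V    # elementwise product: over many tiles/channels, this
                 # is the part that generalizes to a batch matmul
    return AT @ M  # output transform (the winograd_output_transform role)

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
print(winograd_f23(d, g))            # [6. 9.]
print(np.correlate(d, g, "valid"))   # direct correlation for comparison
```

The point of the sketch is only the structure: transform, elementwise/matmul core, inverse transform, matching the three-step materialization above.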
This project probably depends on the ability to get local workgroup memory allocated within a dispatch (without hitting the stack). Indeed, producers may get fused into `set_encoding` at Flow level, long before it becomes known that the `set_encoding` will materialize into a `winograd_input_transform`, which performs a matmul and thus consumes each input element multiple times. The only way to avoid redundant evaluation of a fused producer will be to allocate an intermediate buffer.
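To see concretely why each input element is consumed multiple times, note that in the common F(2x2, 3x3) case the input transform reads overlapping 4x4 tiles at stride 2. A quick count of reads per input element (tile size and stride are the standard F(2x2, 3x3) values, used purely for illustration):

```python
import numpy as np

# Winograd F(2x2, 3x3) reads the input as overlapping 4x4 tiles at stride 2.
# Count how many tiles touch each input element of an 8x8 input.
H = W = 8
tile, stride = 4, 2
reads = np.zeros((H, W), dtype=int)
for i in range(0, H - tile + 1, stride):
    for j in range(0, W - tile + 1, stride):
        reads[i:i + tile, j:j + tile] += 1

# Interior elements are read by 4 distinct tiles, so a producer fused into
# the input transform would be re-evaluated up to 4 times per element
# unless its result is staged in an intermediate buffer.
print(int(reads[3, 3]))  # 4
print(int(reads[0, 0]))  # 1 (corner element belongs to a single tile)
```

This is the redundant evaluation that an intermediate (workgroup-local) buffer would avoid.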