[Attention] Generalize Attention Tiling and Decomposition

Groverkss opened this issue 2 months ago

Attention tiling and decomposition are currently hardcoded: they assume the rank and position of certain dimensions. This makes it hard to fuse other operations with attention. To fix this, tiling and decomposition should depend on the indexing maps for attention instead.
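
To illustrate the idea, here is a minimal Python sketch of deriving dimension roles from indexing maps rather than from fixed positions. It is not the actual implementation: the function name `classify_attention_dims`, the role names (`batch`, `m`, `k1`, `k2`, `n`), and the representation of each indexing map as a tuple of iteration-space dimension indices are all assumptions made for this example. The classification rule assumed here is that a dimension's role is determined by which of the Q, K, V, and output operands use it.

```python
from typing import Dict, List, Sequence


def classify_attention_dims(
    q_map: Sequence[int],
    k_map: Sequence[int],
    v_map: Sequence[int],
    o_map: Sequence[int],
) -> Dict[str, List[int]]:
    """Hypothetical helper: infer the role of each iteration dimension.

    Each map lists the iteration-space dimensions an operand indexes,
    in operand order. Roles come from operand membership only, so the
    rank and position of the dimensions inside each operand do not matter.
    """
    q, k, v, o = set(q_map), set(k_map), set(v_map), set(o_map)
    roles = {
        "batch": q & k & v & o,   # shared by every operand
        "m": (q & o) - k - v,     # query sequence length
        "k1": (q & k) - v - o,    # reduction dim of Q @ K^T
        "k2": (k & v) - q - o,    # key/value sequence length
        "n": (v & o) - q - k,     # value head dimension
    }
    return {name: sorted(dims) for name, dims in roles.items()}


# Iteration space (b, m, k1, k2, n) numbered 0..4.
standard = classify_attention_dims(
    q_map=(0, 1, 2), k_map=(0, 3, 2), v_map=(0, 3, 4), o_map=(0, 1, 4)
)

# Same computation with V stored transposed: the roles are unchanged.
transposed_v = classify_attention_dims(
    q_map=(0, 1, 2), k_map=(0, 3, 2), v_map=(0, 4, 3), o_map=(0, 1, 4)
)

print(standard)
print(transposed_v)
```

Because the roles are computed from set membership, a transposed operand or an extra batch dimension yields the same kind of classification, which is what tiling and decomposition would need in order to stop assuming a fixed layout.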