[Attention] Generalize Attention Tiling and Decomposition
Groverkss opened this issue
Attention tiling and decomposition are hardcoded today: they assume the rank and position of certain dimensions. This makes it hard to fuse other operations with attention. The fix is to make tiling and decomposition depend on the attention op's indexing maps instead.
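To illustrate the idea, here is a minimal Python sketch of how dimension roles could be inferred from indexing maps rather than from fixed positions. It is not the compiler's actual API: the function name `infer_attention_dims`, the role names (`batch`, `m`, `k1`, `k2`, `n`), and the representation of each indexing map as a tuple of iteration-space dimension indices are all assumptions made for illustration. The sketch also assumes the usual attention shape convention, where a dimension's role is determined by which of Q, K, V, and the output it appears in.

```python
from typing import Dict, List, Sequence

# Hypothetical role table: which operands (Q, K, V, O) a dimension appears in
# determines its role. This mirrors the usual attention convention and is an
# assumption of this sketch, not a statement about any particular compiler.
_ROLE_BY_USAGE = {
    (True, True, True, True): "batch",   # shared by every operand
    (True, False, False, True): "m",     # query sequence length
    (True, True, False, False): "k1",    # Q.K^T reduction dimension
    (False, True, True, False): "k2",    # key/value sequence length
    (False, False, True, True): "n",     # value head dimension
}


def infer_attention_dims(
    q_map: Sequence[int],
    k_map: Sequence[int],
    v_map: Sequence[int],
    o_map: Sequence[int],
) -> Dict[str, List[int]]:
    """Classify each iteration dimension by the operands that index it.

    Each map lists the iteration-space dimensions used by that operand, in
    operand order. Only membership matters here, so permuted (transposed)
    operands and extra batch dimensions are handled without special cases.
    """
    q, k, v, o = set(q_map), set(k_map), set(v_map), set(o_map)
    roles: Dict[str, List[int]] = {
        "batch": [], "m": [], "k1": [], "k2": [], "n": []
    }
    for dim in sorted(q | k | v | o):
        usage = (dim in q, dim in k, dim in v, dim in o)
        role = _ROLE_BY_USAGE.get(usage)
        if role is None:
            raise ValueError(f"dimension {dim} has no attention role: {usage}")
        roles[role].append(dim)
    return roles


# Standard layout: Q(b, m, k1), K(b, k2, k1), V(b, k2, n), O(b, m, n).
standard = infer_attention_dims((0, 1, 2), (0, 3, 2), (0, 3, 4), (0, 1, 4))

# Same op with V transposed to V(b, n, k2): the inferred roles do not change.
transposed_v = infer_attention_dims((0, 1, 2), (0, 3, 2), (0, 4, 3), (0, 1, 4))
```

Because the classification uses only which operands reference a dimension, tiling and decomposition written against these roles would not need to assume a fixed rank or operand layout, which is what makes fusion with neighbouring ops (for example, a transpose folded into an operand's map) easier to support.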