TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

Need for clean organization of register tile for Tensor Core output

haruhi55 opened this issue

Currently, to get a quick implementation in place, we use enumerated values. This approach is insufficient in the long run. The WMMA instruction has critical parameters that affect how data is distributed in memory, such as the output data type and the execution order of the multiple WMMA instructions that together compute a tile. The store kernel depends on this information to be implemented correctly.

https://github.com/TiledTensor/TiledCUDA/blob/master/include/cell/copy/constants.hpp#L16
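
One possible direction, sketched here only to make the request concrete: replace the bare enumerated values with a compile-time descriptor that carries the parameters named above (output data type and instruction execution order). Every name below (`AtomOrder`, `RegTileLayout`, `atom_offset`) is hypothetical and is not part of TiledCUDA. The value of 4 registers per atom is an assumption chosen for illustration (a 16x8 accumulator atom spread over 32 threads gives 4 elements per thread), and `std::uint16_t` is only a 2-byte stand-in for a half-precision type so that the sketch compiles as plain host C++.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical: the order in which the atomic MMA tiles that make up one
// register tile are issued (and therefore laid out in the register array).
enum class AtomOrder { kRowMajor, kColMajor };

// Hypothetical compile-time description of a Tensor Core output register tile.
// It bundles the facts a store kernel would need, instead of one enum value.
template <typename Element_, int kAtomRows_, int kAtomCols_,
          AtomOrder kOrder_, int kElemsPerAtom_>
struct RegTileLayout {
    using Element = Element_;  // accumulator (output) data type

    static constexpr int kAtomRows = kAtomRows_;  // atoms along the tile's rows
    static constexpr int kAtomCols = kAtomCols_;  // atoms along the tile's columns
    static constexpr AtomOrder kOrder = kOrder_;
    static constexpr int kElemsPerAtom = kElemsPerAtom_;  // registers per thread per atom

    // Total number of registers one thread holds for the whole tile.
    static constexpr int kNumel = kAtomRows * kAtomCols * kElemsPerAtom;
    // Bytes one thread holds for the whole tile.
    static constexpr std::size_t kBytes = kNumel * sizeof(Element);

    // Index of the first register that belongs to atom (i, j).
    static constexpr int atom_offset(int i, int j) {
        return (kOrder == AtomOrder::kRowMajor ? i * kAtomCols + j
                                               : j * kAtomRows + i) *
               kElemsPerAtom;
    }
};

// Two example tiles with the same 2 x 4 atom grid but different output
// types and issue orders.
using AccF32RowMajor = RegTileLayout<float, 2, 4, AtomOrder::kRowMajor, 4>;
using AccF16ColMajor = RegTileLayout<std::uint16_t, 2, 4, AtomOrder::kColMajor, 4>;
```

With a descriptor like this, a store kernel could be templated on the layout type and read `Element`, `kBytes`, and `atom_offset(i, j)` from it, so a change in accumulator type or issue order would not require a new enumerated value. Whether this fits the existing copy code is an open question.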