TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

Implement the macro kernel that stores data from register to shared memory.

haruhi55 opened this issue

The macro kernel should be able to reverse the tensor core's special output layout.
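
To make the requirement concrete, here is a minimal host-side sketch of what "reversing" the layout could mean. It assumes the 16x8 FP32 accumulator fragment of `mma.sync` m16n8k16 as documented in the PTX ISA, where each of the 32 lanes holds 4 registers. All names (`fragment_coord`, `store_fragment_row_major`, `make_index_fragment`) are hypothetical and are not part of TiledCUDA. The loops emulate the warp on the CPU; in a real kernel the lane would come from `threadIdx.x % 32` and the destination would be a `__shared__` buffer.

```cpp
#include <array>
#include <cassert>

// Assumed fragment shape: the 16x8 FP32 accumulator of mma.sync m16n8k16.
constexpr int kRows = 16;
constexpr int kCols = 8;
constexpr int kWarpSize = 32;
constexpr int kRegsPerThread = 4;

struct Coord {
    int row;
    int col;
};

// Logical (row, col) of register `reg` held by thread `lane`.
// Lanes form 8 groups of 4; registers 0-1 cover the upper 8 rows,
// registers 2-3 cover the lower 8 rows.
constexpr Coord fragment_coord(int lane, int reg) {
    return Coord{lane / 4 + 8 * (reg / 2), 2 * (lane % 4) + (reg % 2)};
}

using Fragment = std::array<std::array<float, kRegsPerThread>, kWarpSize>;
using Tile = std::array<float, kRows * kCols>;

// Emulates the register-to-shared store: every register value is
// scattered to its row-major position, which undoes the fragment layout.
inline Tile store_fragment_row_major(const Fragment& regs) {
    Tile tile{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        for (int reg = 0; reg < kRegsPerThread; ++reg) {
            const Coord c = fragment_coord(lane, reg);
            tile[c.row * kCols + c.col] = regs[lane][reg];
        }
    }
    return tile;
}

// Test helper: fills each register with the row-major index of the
// element it owns, so a correct store yields 0, 1, 2, ... in order.
inline Fragment make_index_fragment() {
    Fragment regs{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        for (int reg = 0; reg < kRegsPerThread; ++reg) {
            const Coord c = fragment_coord(lane, reg);
            regs[lane][reg] = static_cast<float>(c.row * kCols + c.col);
        }
    }
    return regs;
}
```

A macro kernel would presumably tile this per-fragment store across the warp's full register tile, and might add a swizzled shared-memory index to avoid bank conflicts; both are left out of this sketch.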