TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

Enhance the unit tests for storing Tensor Core's WMMA output tile.

haruhi55 opened this issue

The current unit tests only verify the use of a single warp to store the results of the ldmatrix.

However, since the outputs of the WMMA instruction have varying data types that occupy different widths, the store operation needs to be aware of the output's data type to enable vectorized storing.
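As a rough illustration of why the store has to know the element type, here is a minimal host-side C++ sketch. It is not TiledCUDA's API: `Half16`, `VectorizedStore`, and `store_fragment_pairs` are hypothetical names, and the sketch assumes each thread owns two contiguous accumulator elements, which may not match every fragment layout. Under that assumption, a pair of 16-bit outputs fits in a 4-byte store, while a pair of 32-bit outputs needs an 8-byte store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 16-bit accumulator element (e.g. __half on device).
struct Half16 {
    std::uint16_t bits;
};

// Assumption: each thread owns kElems contiguous accumulator elements,
// so one vectorized store moves sizeof(T) * kElems bytes.
template <typename T, std::size_t kElems = 2>
struct VectorizedStore {
    static constexpr std::size_t kBytes = sizeof(T) * kElems;

    // Copies kElems contiguous elements with one fixed-size copy. On device
    // this would typically correspond to a single 32-bit or 64-bit store.
    static void store(T* dst, const T* src) { std::memcpy(dst, src, kBytes); }
};

// The store width follows from the output data type.
static_assert(VectorizedStore<Half16>::kBytes == 4, "16-bit pair: 4-byte store");
static_assert(VectorizedStore<float>::kBytes == 8, "32-bit pair: 8-byte store");

// Stores `pairs` two-element fragments of type T and reports whether every
// byte arrived intact, which is the property a unit test would check.
template <typename T>
bool store_fragment_pairs(T* dst, const T* src, std::size_t pairs) {
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < pairs; ++i) {
        VectorizedStore<T>::store(dst + 2 * i, src + 2 * i);
    }
    return std::memcmp(dst, src, pairs * VectorizedStore<T>::kBytes) == 0;
}
```

A unit test along these lines would be instantiated once per output data type, so that both store widths are exercised rather than only one.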