Enhance the unit tests for storing Tensor Core's WMMA output tile.
haruhi55 opened this issue
The current unit tests only verify the use of a single warp to store the results of the ldmatrix instruction.
However, since the outputs of the WMMA instruction have varying data types that occupy different widths, the store operation needs to be aware of the output's data type to enable vectorized storing.
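To make the data-type dependence concrete, here is a minimal host-side C++ sketch, not the project's actual store implementation. It assumes a 128-bit (16-byte) access width, and the names `Half16`, `StoreTraits`, and `store_tile_vectorized` are hypothetical, introduced only for illustration. `Half16` is a 2-byte stand-in for a half-precision accumulator so the example compiles without CUDA headers. The point is that the number of elements packed into one vectorized store follows from `sizeof(T)`, so a test that covers only one output type exercises only one of these paths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 2-byte stand-in for a half-precision accumulator element.
struct Half16 { std::uint16_t bits; };

// Assumed width of one vectorized access: 128 bits.
constexpr std::size_t kAccessBytes = 16;

// Derives how many elements of type T fit into one vectorized access.
template <typename T>
struct StoreTraits {
    static_assert(kAccessBytes % sizeof(T) == 0,
                  "element size must evenly divide the access width");
    static constexpr std::size_t kElemsPerAccess = kAccessBytes / sizeof(T);
};

// Copies `count` elements from src to dst, 16 bytes at a time where possible,
// and returns the number of store operations issued (vector plus scalar tail).
template <typename T>
std::size_t store_tile_vectorized(const T* src, T* dst, std::size_t count) {
    assert(src != nullptr && dst != nullptr);
    constexpr std::size_t n = StoreTraits<T>::kElemsPerAccess;
    std::size_t accesses = 0;
    std::size_t i = 0;
    // Vectorized body: each iteration moves one full 16-byte chunk.
    for (; i + n <= count; i += n) {
        std::memcpy(dst + i, src + i, kAccessBytes);
        ++accesses;
    }
    // Scalar tail for elements that do not fill a whole chunk.
    for (; i < count; ++i) {
        dst[i] = src[i];
        ++accesses;
    }
    return accesses;
}
```

Under these assumptions, a 16x16 tile (256 elements) takes 64 stores when the elements are `float` and 32 when they are 2-byte halves, which is the kind of per-type difference the enhanced unit tests would need to cover.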