TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

`TileShape` is insufficient to fully describe a copy plan.

haruhi55 opened this issue

using TemporalExecShared = TileShape<2, 1>;
using WarpLayout = TileShape<2, 2>;
using ThreadLayout = TileShape<16, 2>; // fixed when using ldmatrix.
using ElemDataTileShared = TileShape<2, 16>;
// the final copy plan for accessing shared memory
using Shared = SharedTile<Element, TemporalExecShared, WarpLayout,

The snippet above specifies only shapes. To fully describe the copy plan, a layout (row-major or column-major) is also required, because the same `TileShape` maps to different memory offsets under each layout. Currently, all existing implementations implicitly assume row-major.
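One possible direction, shown here only as a minimal sketch, is to carry the layout as an extra template parameter next to the shape. Everything below is an assumption made for illustration: the `Layout` enum, the `TileLayout` descriptor, and its `offset` helper are hypothetical names and not part of the existing TiledCUDA API, and `TileShape` is a simplified two-dimensional stand-in for the library's type.

```cpp
#include <cstddef>

// Simplified stand-in for the library's TileShape (two dimensions only).
template <std::size_t kRows_, std::size_t kCols_>
struct TileShape {
    static constexpr std::size_t kRows = kRows_;
    static constexpr std::size_t kCols = kCols_;
};

// Hypothetical layout tag; not part of the existing API.
enum class Layout { kRowMajor, kColMajor };

// Hypothetical descriptor that pairs a shape with a layout.
// The row-major default mirrors the current implicit behavior.
template <typename Shape, Layout kLayout = Layout::kRowMajor>
struct TileLayout {
    static constexpr std::size_t kRows = Shape::kRows;
    static constexpr std::size_t kCols = Shape::kCols;
    static constexpr Layout kType = kLayout;

    // Strides, in elements, along each dimension.
    static constexpr std::size_t kRowStride =
        kLayout == Layout::kRowMajor ? kCols : 1;
    static constexpr std::size_t kColStride =
        kLayout == Layout::kRowMajor ? 1 : kRows;

    // Linear offset of element (row, col) inside the tile.
    static constexpr std::size_t offset(std::size_t row, std::size_t col) {
        return row * kRowStride + col * kColStride;
    }
};

// Same shape as in the issue, described under both layouts.
using ElemDataTileShared = TileShape<2, 16>;
using RowMajorTile = TileLayout<ElemDataTileShared>;
using ColMajorTile = TileLayout<ElemDataTileShared, Layout::kColMajor>;

// The same (row, col) lands at different offsets depending on the layout.
static_assert(RowMajorTile::offset(1, 3) == 19, "1 * 16 + 3");
static_assert(ColMajorTile::offset(1, 3) == 7, "1 + 3 * 2");
```

With a default of `Layout::kRowMajor`, existing code that names only a shape would keep its current meaning, while a column-major plan would have to opt in explicitly. Whether the layout belongs on each `TileShape` or on the enclosing copy plan is a design choice this sketch does not settle.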