TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

A buggy implementation of the TileIterator.

haruhi55 opened this issue

Since we do not differentiate between GlobalTile and SharedTile, a TileIterator should be able to work with both types. However, the current implementation is tightly coupled to SharedTile: the sub-tile type it yields is hard-coded, as in the following line, which is a bug.

using NewTile = SharedTile<DType, TileLayout>;

Add a template parameter to represent different Tile types?
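To make the question concrete, here is a minimal host-side sketch of what such a template parameter might look like. It is an illustration under stated assumptions, not TiledCUDA's actual code: the tile definitions, the Rebind alias, the kRowStride parameter, and the column-wise chunking are all hypothetical, and a real kernel would additionally need the appropriate __device__ qualifiers. The iterator derives NewTile from whatever tile type it receives instead of naming SharedTile directly.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the library's tiles: row-major views over memory
// they do not own. These are NOT TiledCUDA's real definitions.
template <typename DType_, int kRows_, int kCols_, int kRowStride_ = kCols_>
struct GlobalTile {
    using DType = DType_;
    static constexpr int kRows = kRows_;
    static constexpr int kCols = kCols_;
    static constexpr int kRowStride = kRowStride_;

    // Same kind of tile, new shape, same row stride as the parent.
    template <int kNewRows, int kNewCols>
    using Rebind = GlobalTile<DType, kNewRows, kNewCols, kRowStride>;

    explicit GlobalTile(DType* data) : data_(data) {}
    DType* data() const { return data_; }
    DType& operator()(int r, int c) const { return data_[r * kRowStride + c]; }

  private:
    DType* data_;
};

template <typename DType_, int kRows_, int kCols_, int kRowStride_ = kCols_>
struct SharedTile {
    using DType = DType_;
    static constexpr int kRows = kRows_;
    static constexpr int kCols = kCols_;
    static constexpr int kRowStride = kRowStride_;

    template <int kNewRows, int kNewCols>
    using Rebind = SharedTile<DType, kNewRows, kNewCols, kRowStride>;

    explicit SharedTile(DType* data) : data_(data) {}
    DType* data() const { return data_; }
    DType& operator()(int r, int c) const { return data_[r * kRowStride + c]; }

  private:
    DType* data_;
};

// The iterator is parameterized on the tile type, so the sub-tile it yields
// is always the same kind of tile as its input.
template <typename Tile, int kChunkCols>
class TileIterator {
  public:
    static_assert(Tile::kCols % kChunkCols == 0,
                  "tile width must be a multiple of the chunk width");

    using DType = typename Tile::DType;
    using NewTile = typename Tile::template Rebind<Tile::kRows, kChunkCols>;
    static constexpr int kNumChunks = Tile::kCols / kChunkCols;

    explicit TileIterator(const Tile& tile) : data_(tile.data()) {}

    // Returns the i-th chunk of kChunkCols columns as a view into the parent.
    NewTile operator()(int i) const { return NewTile(data_ + i * kChunkCols); }

  private:
    DType* data_;
};

// Compile-time check: a GlobalTile input yields a GlobalTile sub-tile.
static_assert(
    std::is_same<TileIterator<GlobalTile<float, 2, 8>, 4>::NewTile,
                 GlobalTile<float, 2, 4, 8>>::value,
    "sub-tile kind should follow the input tile kind");
```

With this shape, TileIterator<SharedTile<float, 2, 8>, 4> and TileIterator<GlobalTile<float, 2, 8>, 4> share one implementation, and neither tile type has to know about the other.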

A possible solution is to have GlobalTile and SharedTile inherit from a common base class. Because GlobalTile and SharedTile currently behave identically, the computed results are still correct despite the coupling.
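The base-class idea could be sketched roughly as follows. This is only an assumption about how it might be structured: TileBase, its members, and the tile_sum helper are hypothetical names chosen for illustration, and device qualifiers are omitted so the snippet compiles as plain host C++. Shared behavior lives in the base, and generic code accepts any type derived from it.

```cpp
#include <type_traits>

// Hypothetical common base holding the behavior both tile kinds share.
// Not TiledCUDA's real class hierarchy.
template <typename DType_, int kRows_, int kCols_>
class TileBase {
  public:
    using DType = DType_;
    static constexpr int kRows = kRows_;
    static constexpr int kCols = kCols_;

    explicit TileBase(DType* data) : data_(data) {}
    DType* data() const { return data_; }
    // Row-major element access into memory the tile does not own.
    DType& operator()(int r, int c) const { return data_[r * kCols + c]; }

  private:
    DType* data_;
};

// The two tile kinds differ only in name for now; memory-space-specific
// behavior could be added to each later without touching generic code.
template <typename DType, int kRows, int kCols>
class GlobalTile : public TileBase<DType, kRows, kCols> {
  public:
    using Base = TileBase<DType, kRows, kCols>;
    using Base::Base;  // inherit the constructor
};

template <typename DType, int kRows, int kCols>
class SharedTile : public TileBase<DType, kRows, kCols> {
  public:
    using Base = TileBase<DType, kRows, kCols>;
    using Base::Base;
};

// Generic code written once against the base interface.
template <typename Tile>
typename Tile::DType tile_sum(const Tile& tile) {
    using Base = TileBase<typename Tile::DType, Tile::kRows, Tile::kCols>;
    static_assert(std::is_base_of<Base, Tile>::value,
                  "tile_sum expects a type derived from TileBase");
    typename Tile::DType total = 0;
    for (int r = 0; r < Tile::kRows; ++r)
        for (int c = 0; c < Tile::kCols; ++c) total += tile(r, c);
    return total;
}
```

One trade-off worth weighing: a base class centralizes the shared code, but an iterator that must return the same kind of tile it was given still needs to know the derived type, so a template parameter may be required either way.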

I will carefully consider a suitable solution in the next round of modifications.