TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

Enhancing shared memory access for 2D warp organization

haruhi55 opened this issue

Add support for the case where warps and shared-memory data tiles are both organized as a 2D grid, and warps in the same row (or column) of the warp grid load the data tiles located in the same row (or column) of the shared-memory tile grid.
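To make the requested access pattern concrete, here is a minimal host-side sketch of the index mapping in C++. It assumes a GEMM-style setting (A is M x K, B is K x N, both staged in shared memory and split into sub-tiles), with warp ids laid out row-major over the warp grid. All type and function names (WarpCoord, SubTile, warp_coord, a_tiles_for_warp, b_tiles_for_warp, print_assignment) are hypothetical and are not TiledCUDA's API. Under these assumptions, warps that share a warp-row read the same row-block of A, and warps that share a warp-column read the same column-block of B.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical helper types for illustration only (not TiledCUDA's API).
struct WarpCoord { int row; int col; };  // position of a warp in the 2D warp grid
struct SubTile { int row; int col; };    // (row-block, col-block) index of a shared-memory sub-tile

// Assumption: warp ids are laid out row-major over a grid with `warp_cols` columns.
inline WarpCoord warp_coord(int warp_id, int warp_cols) {
    assert(warp_cols > 0 && warp_id >= 0);
    return {warp_id / warp_cols, warp_id % warp_cols};
}

// Sub-tiles of A (M x K) loaded by a warp: every warp in warp-row r
// reads the same row-block r, stepping across the K dimension.
inline std::vector<SubTile> a_tiles_for_warp(int warp_id, int warp_cols, int k_tiles) {
    const WarpCoord w = warp_coord(warp_id, warp_cols);
    std::vector<SubTile> tiles;
    for (int k = 0; k < k_tiles; ++k) tiles.push_back({w.row, k});
    return tiles;
}

// Sub-tiles of B (K x N) loaded by a warp: every warp in warp-column c
// reads the same column-block c, stepping down the K dimension.
inline std::vector<SubTile> b_tiles_for_warp(int warp_id, int warp_cols, int k_tiles) {
    const WarpCoord w = warp_coord(warp_id, warp_cols);
    std::vector<SubTile> tiles;
    for (int k = 0; k < k_tiles; ++k) tiles.push_back({k, w.col});
    return tiles;
}

// Prints which A row-block and B column-block each warp in a
// warp_rows x warp_cols grid would read.
inline void print_assignment(int warp_rows, int warp_cols, int k_tiles) {
    for (int id = 0; id < warp_rows * warp_cols; ++id) {
        const WarpCoord w = warp_coord(id, warp_cols);
        std::printf("warp %d (%d,%d): A row-block %d, B col-block %d, %d k-steps\n",
                    id, w.row, w.col, w.row, w.col, k_tiles);
    }
}
```

For example, calling print_assignment(2, 2, 4) shows that warps 0 and 1 (warp-row 0) both read A row-block 0, while warps 0 and 2 (warp-column 0) both read B column-block 0. Inside a kernel, the warp id would typically be derived as threadIdx.x / 32; how TiledCUDA actually exposes this mapping is not stated in the issue.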