TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

Add dynamic r2s/s2r copy function.

KuangjuX opened this issue

This corresponds to the following process:

  1. declare (??) and instantiate the copy plan

auto rA = make_s2rA(sA_ptr, tid, typename KeTraits::SmemLayoutA{}, mma);
auto rB = make_s2rB(sB_ptr, tid, typename KeTraits::SmemLayoutB{}, mma);
auto acc = get_acc<kTM, kTN>(mma);

  2. execute the copy in time

for (int i = 0; i < rA.get_iters(); ++i) {
    rA.copy(i);  // load A register tile from shared memory
    rB.copy(i);  // load B register tile from shared memory