Add dynamic r2s/s2r copy function.
KuangjuX opened this issue
This corresponds to the following steps:
- declare (??) and instantiate the copy plan
TiledCUDA/src/kernels/cute_gemm.cu
Lines 50 to 52 in 8205e7c
- execute the copy in time
TiledCUDA/src/kernels/cute_gemm.cu
Lines 64 to 66 in 8205e7c
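
To make the two steps concrete, here is a minimal host-side C++ sketch of the "build the plan once, then execute the copy" pattern. It is not the code at the referenced lines of `cute_gemm.cu` and it does not use CuTe. `CopyPlan`, `make_copy_plan`, `copy_r2s`, and `copy_s2r` are hypothetical names, a per-thread `std::vector<float>` stands in for a register fragment, and a flat `std::vector<float>` stands in for the shared-memory tile. The contiguous per-thread layout is also an assumption chosen for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a copy plan: it only records how a tile is split
// across threads. None of these names come from TiledCUDA or CuTe.
struct CopyPlan {
    std::size_t num_threads;       // threads cooperating on the tile
    std::size_t elems_per_thread;  // contiguous elements owned by each thread

    // Offset of the first element owned by thread `tid` in the shared tile.
    std::size_t offset(std::size_t tid) const {
        assert(tid < num_threads);
        return tid * elems_per_thread;
    }

    // Total number of elements in the shared tile.
    std::size_t tile_size() const { return num_threads * elems_per_thread; }
};

// Step 1: declare and instantiate the plan once, before any copy is issued.
inline CopyPlan make_copy_plan(std::size_t num_threads,
                               std::size_t elems_per_thread) {
    return CopyPlan{num_threads, elems_per_thread};
}

// Step 2a (r2s): write one thread's "register" fragment into the shared tile.
inline void copy_r2s(const CopyPlan& plan, std::size_t tid,
                     const std::vector<float>& frag,
                     std::vector<float>& smem) {
    assert(frag.size() == plan.elems_per_thread);
    assert(smem.size() == plan.tile_size());
    const std::size_t base = plan.offset(tid);
    for (std::size_t i = 0; i < plan.elems_per_thread; ++i) {
        smem[base + i] = frag[i];
    }
}

// Step 2b (s2r): read one thread's slice of the shared tile into its fragment.
inline void copy_s2r(const CopyPlan& plan, std::size_t tid,
                     const std::vector<float>& smem,
                     std::vector<float>& frag) {
    assert(frag.size() == plan.elems_per_thread);
    assert(smem.size() == plan.tile_size());
    const std::size_t base = plan.offset(tid);
    for (std::size_t i = 0; i < plan.elems_per_thread; ++i) {
        frag[i] = smem[base + i];
    }
}
```

In this sketch the plan is built once with `make_copy_plan(num_threads, elems_per_thread)` and then passed to `copy_r2s` or `copy_s2r` at each point where data has to move. A device-side version would replace the vectors with real register fragments and shared memory and would need synchronization between the two directions.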