TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

Add multi-staged pipelined GEMM.

haruhi55 opened this issue

Pipelined GEMM generally has a different code structure from naive GEMM, and that structure depends on one hyperparameter: the number of pipeline stages.

The structure consists of a prologue, a main loop, and an epilogue. Pipelined GEMM also uses more shared memory, because each stage needs its own tile buffers, which limits the choice of tiling policy.
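To make the prologue / main loop / epilogue split concrete, here is a minimal host-side C++ sketch of the loop structure for a single output tile. It is only an illustration under stated assumptions, not TiledCUDA's API: the names `smem_bytes`, `Stage`, `load_tile`, `compute_tile`, and `pipelined_gemm`, and the tile shape `kTM`/`kTN`/`kTK`, are all hypothetical. On a GPU, `load_tile` would be an asynchronous global-to-shared copy and each `Stage` would be a shared-memory slot; here both are plain host code, so the sketch shows the control flow and buffer rotation rather than real overlap.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tile shape: one kTM x kTN output tile, reduced in chunks of kTK.
constexpr int kTM = 4, kTN = 4, kTK = 2;

// Staging footprint in bytes: every stage holds one A tile and one B tile.
constexpr std::size_t smem_bytes(int tm, int tn, int tk, int stages, std::size_t elem) {
    return static_cast<std::size_t>(stages) *
           static_cast<std::size_t>(tm * tk + tk * tn) * elem;
}

// Stand-in for one shared-memory slot of the ring buffer.
struct Stage {
    std::vector<float> a = std::vector<float>(kTM * kTK);
    std::vector<float> b = std::vector<float>(kTK * kTN);
};

// Copy the kt-th K-chunk of A (kTM x K, row-major) and B (K x kTN, row-major).
void load_tile(Stage& s, const std::vector<float>& A, const std::vector<float>& B,
               int K, int kt) {
    for (int i = 0; i < kTM; ++i)
        for (int kk = 0; kk < kTK; ++kk)
            s.a[i * kTK + kk] = A[i * K + kt * kTK + kk];
    for (int kk = 0; kk < kTK; ++kk)
        for (int j = 0; j < kTN; ++j)
            s.b[kk * kTN + j] = B[(kt * kTK + kk) * kTN + j];
}

// Accumulate the product of the staged tiles into C (kTM x kTN).
void compute_tile(const Stage& s, std::vector<float>& C) {
    for (int i = 0; i < kTM; ++i)
        for (int j = 0; j < kTN; ++j)
            for (int kk = 0; kk < kTK; ++kk)
                C[i * kTN + j] += s.a[i * kTK + kk] * s.b[kk * kTN + j];
}

// K must be a multiple of kTK and stages must be at least 1.
std::vector<float> pipelined_gemm(const std::vector<float>& A,
                                  const std::vector<float>& B, int K, int stages) {
    const int num_k = K / kTK;
    const int prefetch = std::min(stages - 1, num_k);  // tiles kept in flight
    std::vector<Stage> ring(stages);
    std::vector<float> C(kTM * kTN, 0.0f);

    // Prologue: fill the first (stages - 1) slots before any compute.
    for (int t = 0; t < prefetch; ++t) load_tile(ring[t % stages], A, B, K, t);

    // Main loop: issue the load for a future tile, then compute on the oldest one.
    // The slot being refilled is the one consumed in the previous iteration.
    for (int t = 0; t + prefetch < num_k; ++t) {
        load_tile(ring[(t + prefetch) % stages], A, B, K, t + prefetch);
        compute_tile(ring[t % stages], C);
    }

    // Epilogue: no loads remain, so drain the tiles that are still staged.
    for (int t = num_k - prefetch; t < num_k; ++t) compute_tile(ring[t % stages], C);
    return C;
}
```

With `stages = 1` the prologue and epilogue are empty and the loop degenerates to the naive load-then-compute form. The footprint helper also shows why more stages constrain tiling: for an assumed 128 x 128 x 32 tile with 2-byte elements, `smem_bytes(128, 128, 32, 1, 2)` is 16384 bytes, while three stages need 49152 bytes, so a larger stage count may force smaller tiles to stay within the per-block shared-memory limit.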