Add multi-staged pipelined GEMM.
haruhi55 opened this issue · comments
Pipelined GEMM generally has a different code structure from naive GEMM, and that structure depends on a hyperparameter: the number of pipeline stages.
This structure includes a prologue, main loop, and epilogue. Pipelined GEMM also uses more shared memory, since each stage keeps its own tiles buffered, which limits the tile sizes the tiling policy can choose.
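
To make the structure concrete, here is a minimal host-side sketch in plain Python, not a GPU kernel and not this project's implementation. It assumes the pipeline walks the K dimension in tiles through a ring buffer with one slot per stage, and it treats the epilogue as draining the tiles that are still buffered (a real kernel would also write the accumulators back there). The names `pipelined_gemm`, `smem_bytes`, `tile_k`, and `stages` are hypothetical, and the footprint formula assumes one A tile and one B tile per stage with 2-byte elements.

```python
def smem_bytes(tile_m, tile_n, tile_k, stages, elem_bytes=2):
    """Assumed footprint: one A tile and one B tile buffered per stage."""
    return stages * (tile_m * tile_k + tile_k * tile_n) * elem_bytes


def pipelined_gemm(a, b, tile_k, stages):
    """Compute a @ b on nested lists, tiling K through a `stages`-slot ring buffer.

    Returns (c, trace); trace records ("load", t) and ("compute", t) events
    so the prologue / main loop / epilogue ordering can be inspected.
    """
    m, k_total, n = len(a), len(b), len(b[0])
    assert stages >= 1 and k_total % tile_k == 0
    num_tiles = k_total // tile_k
    buffers = [None] * stages  # stands in for the shared-memory slots
    c = [[0.0] * n for _ in range(m)]
    trace = []

    def load(t):
        # Copy K-tile t of A and B into ring-buffer slot t % stages.
        k0 = t * tile_k
        a_tile = [row[k0:k0 + tile_k] for row in a]
        b_tile = b[k0:k0 + tile_k]
        buffers[t % stages] = (a_tile, b_tile)
        trace.append(("load", t))

    def compute(t):
        # Accumulate the partial product of K-tile t into c.
        a_tile, b_tile = buffers[t % stages]
        for i in range(m):
            for j in range(n):
                c[i][j] += sum(a_tile[i][kk] * b_tile[kk][j]
                               for kk in range(tile_k))
        trace.append(("compute", t))

    # Prologue: prefetch the first stages - 1 tiles before any compute.
    prefetch = min(stages - 1, num_tiles)
    for t in range(prefetch):
        load(t)

    # Main loop: issue the load for a later tile, then compute tile t.
    # The load reuses the slot of tile t - 1, which is already consumed.
    for t in range(num_tiles - prefetch):
        load(t + prefetch)
        compute(t)

    # Epilogue: no loads remain, so drain the tiles still in the buffer.
    for t in range(num_tiles - prefetch, num_tiles):
        compute(t)

    return c, trace
```

With `stages=1` the prologue and epilogue are empty and the loop degenerates to the naive load-then-compute order. Under the stated assumptions the footprint grows linearly with the stage count, so a 128x128x32 tile needs 16384 bytes with one stage and 49152 bytes with three, which is why a deeper pipeline can force smaller tiles.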