TiledTensor/TiledCUDA Issues
Provide a complete GEMM example.
Updated 2
Add test for Lstm Cell.
Closed
Add BatchedGEMM kernel.
Closed
Add LstmCell kernel.
Closed
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.