Optimised GPU kernels
bclarkson-code opened this issue
bclarkson-code commented
Andrej Karpathy has just upstaged me by releasing llm.c, which contains some highly optimised CUDA kernels. If we include these in tricycle, we can probably get a significant performance boost for operations like attention.