bclarkson-code / Tricycle

Deep learning framework completely from scratch in python + numpy

Optimised GPU kernels

bclarkson-code opened this issue

Andrej Karpathy has just upstaged me and released llm.c, which contains some highly optimised CUDA kernels. If we incorporate these into tricycle, we can probably get a significant performance boost for operations like attention.
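For context, here is a rough sketch of the kind of unfused attention computation a dedicated kernel would replace. It is not tricycle's actual implementation and not llm.c's kernel: the name `reference_attention` is hypothetical, and it assumes a single head, no batch dimension, and a causal mask. Something like this could also act as a correctness baseline when checking the output of a faster kernel.

```python
import numpy as np


def reference_attention(query, key, value):
    """Single-head causal scaled dot-product attention (hypothetical baseline).

    All inputs are arrays of shape (seq_len, head_dim).
    Returns the attention output and the attention weights.
    """
    seq_len, head_dim = query.shape

    # Similarity between every query position and every key position.
    scores = query @ key.T / np.sqrt(head_dim)

    # Causal mask: position i must not attend to positions j > i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over the key dimension.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ value, weights


rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 4, 8))
out, attn = reference_attention(q, k, v)
print(out.shape, attn.shape)
```

Each step here (matmul, mask, softmax, matmul) materialises a full intermediate array, which is the overhead a fused GPU kernel would be expected to avoid.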