karpathy / llm.c

LLM training in simple, raw C/CUDA

Repository from GitHub: https://github.com/karpathy/llm.c

ThunderKittens Backend

AndreSlavescu opened this issue · comments

Would it be worth defining the same kernels that exist in the CUDA backend with ThunderKittens as well? They have cool examples with FlashAttention-2, and I think it would make an interesting educational resource too. Thoughts?

Yes! I'm planning to look into using ThunderKittens once I've got more time (probably 2nd week of June). I'm not sure there's much point using it for kernels that don't use the tensor cores though? But it might allow fusing even more things together (e.g. the matmul and the fused classifier, maybe).

My plan was to mostly focus on making a hyper-optimised path for H100 using TMA though... But we'll see what happens :)