Use a MatMul implementation over MatVec for Prefill Computations
austinvhuang opened this issue · comments
Austin Huang commented
Call for contributions for anyone interested in taking this on (@jan-wassenberg feel free to tag anyone who might be interested). The Prefill() computation is set up to allow batched computation (currently statically sized as `kPrefillBatchSize`).
Some pointers:
- The Activations type is templated by batch size with this in mind, so to a first approximation, this can be done by replacing MatVec operations with a MatMul over the Activations data that is batched when `kPrefillBatchSize > 1`.
- Prefill calls FFW() and Attention(), so the implementation changes probably happen there. Since `kBatchSize` is known at compile time, this could probably even be done with `if constexpr`.
- As a first step, one might start by trying just FFW() and assessing the performance difference, since there's less implementation complexity to deal with.
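The pointers above could be sketched roughly as follows. This is a minimal illustration, not gemma.cpp's actual API: the function names, signatures, and the naive loop kernels are all hypothetical stand-ins. The point is the shape of the change, i.e. dispatching on a compile-time batch size so that a batch of prefill activations goes through one MatMul instead of `kBatchSize` separate MatVec calls.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in kernels; gemma.cpp's real ones are SIMD-optimized.
// y = W * x for a single token's activation vector (W is rows x cols).
void MatVec(const std::vector<float>& W, const float* x,
            size_t rows, size_t cols, float* y) {
  for (size_t r = 0; r < rows; ++r) {
    float sum = 0.0f;
    for (size_t c = 0; c < cols; ++c) sum += W[r * cols + c] * x[c];
    y[r] = sum;
  }
}

// Y = X * W^T for a batch of token activations (rows of X). A single MatMul
// amortizes the traversal of W across the whole batch, which is the win
// over calling MatVec once per token.
void MatMul(const std::vector<float>& W, const float* X,
            size_t batch, size_t rows, size_t cols, float* Y) {
  for (size_t b = 0; b < batch; ++b) {
    MatVec(W, X + b * cols, rows, cols, Y + b * rows);
  }
}

// kBatchSize is known at compile time, so the dispatch can use if constexpr:
// batched prefill takes the MatMul path, single-token decode keeps MatVec.
template <size_t kBatchSize>
void FFWLayer(const std::vector<float>& W, const float* X,
              size_t rows, size_t cols, float* Y) {
  if constexpr (kBatchSize > 1) {
    MatMul(W, X, kBatchSize, rows, cols, Y);  // prefill: whole batch at once
  } else {
    MatVec(W, X, rows, cols, Y);              // decode: one token
  }
}
```

Here the batched kernel is written as a loop over MatVec only for clarity; a real MatMul would instead tile over rows/columns of W to reuse loaded weights across the batch (see the reading links below).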
Jan Wassenberg commented
Thanks! @pculliton @samkaufman FYI.
We'll soon have a basic MatMul to test with.
Jan Wassenberg commented
Related reading: https://siboehm.com/articles/22/Fast-MMM-on-CPU
Which links to https://marek.ai/matrix-multiplication-on-cpu.html and https://github.com/flame/how-to-optimize-gemm/ (from the BLIS group).
Jan Wassenberg commented
This is now done :D