google / gemma.cpp

A lightweight, standalone C++ inference engine for Google's Gemma models.


Use a MatMul implementation over MatVec for Prefill Computations

austinvhuang opened this issue · comments

Call for contributions for anyone interested in taking this on (@jan-wassenberg feel free to tag anyone who might be interested). The Prefill() computation is set up to allow batched computation (currently statically sized as kPrefillBatchSize).

Some pointers:

  • The Activations type is templated by batch size with this in mind, so to a first approximation, this can be done by replacing MatVec operations with a MatMul over the activation data that is batched when kPrefillBatchSize > 1
  • Prefill() calls FFW() and Attention(), so the implementation changes probably happen there. Since kBatchSize is known at compile time, this could probably even be done with if constexpr
  • As a first step, it might be easiest to try just FFW() and assess the performance difference, since there's less implementation complexity to deal with

Thanks! @pculliton @samkaufman FYI.
We'll soon have a basic MatMul to test with.

This is now done :D