punica-ai / punica

Serving multiple LoRA finetuned LLM as one

Home Page: https://arxiv.org/abs/2310.18547


BGMV performs better than SGMV?


I benchmarked various kernels on the A100 using the benchmark script, and it seems that the BGMV kernel outperforms the SGMV kernels for individual requests (the BGMV scenario). Is this expected?

Screenshot 2024-01-31 at 4 28 27 PM

Hi @jsheng-jian, thanks for running the benchmark. Yes, this is somewhat expected, since the current SGMV implementation is not optimized for individual requests. A better SGMV implementation (we are integrating one into flashinfer) may perform similarly to BGMV, but I don't expect SGMV to be faster in this case.
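
For readers unfamiliar with the two operators, here is a minimal pure-Python sketch of their reference semantics. It is not Punica's CUDA code or its Python API: the names `matvec`, `bgmv_ref`, `sgmv_ref`, `seg_starts`, and `seg_adapters` are hypothetical, and the tiny 2x2 "adapters" are made up for illustration. It only shows that when every request uses its own adapter (all segments have length 1), the segmented form computes the same result as the batched form, so segmentation has no shared-adapter rows to group.

```python
# Reference semantics only (pure Python, no GPU); not Punica's kernels.

def matvec(x, w):
    """Compute x @ w for a vector x (length h_in) and a matrix w (h_in x h_out)."""
    h_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(h_out)]

def bgmv_ref(xs, weights, indices):
    """Batched gather: row b is multiplied by the adapter weights[indices[b]]."""
    return [matvec(x, weights[idx]) for x, idx in zip(xs, indices)]

def sgmv_ref(xs, weights, seg_starts, seg_adapters):
    """Segmented gather: rows seg_starts[k]:seg_starts[k+1] share adapter seg_adapters[k]."""
    out = []
    for k, adapter in enumerate(seg_adapters):
        w = weights[adapter]  # looked up once per segment
        for x in xs[seg_starts[k]:seg_starts[k + 1]]:
            out.append(matvec(x, w))
    return out

# Two hypothetical 2x2 adapters and three requests.
weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # adapter 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # adapter 1: scale by 2
]
xs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
indices = [0, 1, 0]

# "Individual requests" case: every segment has length 1.
seg_starts = [0, 1, 2, 3]
seg_adapters = indices

bgmv_out = bgmv_ref(xs, weights, indices)
sgmv_out = sgmv_ref(xs, weights, seg_starts, seg_adapters)
```

With longer segments (for example `seg_starts = [0, 2, 3]` when the first two requests share an adapter), the segmented form can reuse one adapter lookup across several rows, which is the case SGMV is designed around.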