google / gemma.cpp

lightweight, standalone C++ inference engine for Google's Gemma models.

TODO (**Optimize, potentially using new VQSort PartialSort**)

enum-class opened this issue

I have a question about this TODO (Optimize, potentially using new VQSort PartialSort).
I'd like to work on it, but I'm struggling to find a clean solution. Can you help me out?

At first glance, VQSelect alone seems sufficient, since 'create_distribution' doesn't need the probabilities to be sorted.

One idea is to build an array of key-value pairs (something like K32V32) from the probabilities and their indices, apply VQSelect to it, and pass the first 'k' elements to 'create_distribution'. The drawbacks are that this allocates and copies a potentially large probabilities array, and that it needs a dedicated comparison structure, something like OrderDescendingKV64.
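To make the key-value idea concrete, here is a minimal sketch using only the standard library. It is not the gemma.cpp code: `std::nth_element` stands in for VQSelect (both partition around the k-th element without fully sorting), and `ProbIndex`, `SelectTopK`, and `SampleFromTopK` are hypothetical names introduced for illustration. It also shows the allocation and copy of the full probabilities array mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical key-value pair: probability is the key, token index the value.
struct ProbIndex {
  float prob;
  uint32_t index;
};

// Returns the k most probable entries, in unspecified order.
// Assumption: std::nth_element is used here as a stand-in for VQSelect.
std::vector<ProbIndex> SelectTopK(const float* probs, size_t size, size_t k) {
  // This is the extra allocation and copy of the whole probabilities array.
  std::vector<ProbIndex> pairs(size);
  for (size_t i = 0; i < size; ++i) {
    pairs[i] = {probs[i], static_cast<uint32_t>(i)};
  }
  k = std::min(k, size);
  if (k > 0 && k < size) {
    // Partition so that the first k entries are the k largest probabilities.
    std::nth_element(pairs.begin(), pairs.begin() + (k - 1), pairs.end(),
                     [](const ProbIndex& a, const ProbIndex& b) {
                       return a.prob > b.prob;  // descending by probability
                     });
  }
  pairs.resize(k);
  return pairs;
}

// Samples a token index from the selected entries. The weights do not need
// to be sorted: std::discrete_distribution only normalizes them.
template <class Rng>
uint32_t SampleFromTopK(const std::vector<ProbIndex>& top_k, Rng& rng) {
  assert(!top_k.empty());  // precondition: at least one candidate
  std::vector<float> weights;
  weights.reserve(top_k.size());
  for (const ProbIndex& p : top_k) weights.push_back(p.prob);
  std::discrete_distribution<int> dist(weights.begin(), weights.end());
  return top_k[static_cast<size_t>(dist(rng))].index;
}
```

Selection runs in expected linear time in the vocabulary size, whereas a full sort is O(n log n), which is why a select is enough when the consumer ignores order.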

Another idea is to create a special version of VQSelect just for this case.

Or simply leave the code as it is. What do you think?

Thanks for considering this! I think it's fairly low on the profile, so let's focus on other things first, in particular the prefill batching and matmul. I'm working on a plan for those and will post an issue soon with a proposed roadmap :)