Evaluate Speculative Decoding, which promises 2-3x speedups of LLM inference by running two models in parallel: a small draft model proposes tokens and the large target model verifies them.
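The draft-and-verify loop can be sketched as follows. This is a minimal greedy-decoding sketch with toy stand-in "models"; the functions `draft_model` / `target_model`, their token rules, and the parameter `gamma` are illustrative assumptions, not part of any real library.

```python
def draft_model(ctx):
    # Cheap draft model (toy rule): usually right, but errs after token 4.
    return (ctx[-1] + 1) % 10 if ctx[-1] != 4 else 0

def target_model(ctx):
    # Expensive target model (toy rule): always returns last token + 1 mod 10.
    return (ctx[-1] + 1) % 10

def speculative_decode(ctx, n_new, gamma=4):
    """Generate n_new tokens after ctx via greedy speculative decoding."""
    out = list(ctx)
    while len(out) < len(ctx) + n_new:
        # 1) Draft gamma tokens cheaply with the small model.
        draft, cur = [], list(out)
        for _ in range(gamma):
            t = draft_model(cur)
            draft.append(t)
            cur.append(t)
        # 2) Verify with the target model. In a real system these gamma
        #    checks happen in ONE batched forward pass, which is where
        #    the speedup comes from.
        accepted, cur = [], list(out)
        for t in draft:
            v = target_model(cur)
            if v == t:
                accepted.append(t)   # draft token accepted
                cur.append(t)
            else:
                accepted.append(v)   # first mismatch: keep target's token
                break
        else:
            # Every draft token accepted: target yields one bonus token.
            accepted.append(target_model(cur))
        out.extend(accepted)
    return out[:len(ctx) + n_new]
```

Because every emitted token is either verified or produced by the target model, the output is identical to decoding with the target model alone; the draft model only changes how many target-model calls are needed.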