minyang-chen / llm_fast_inference_from_HF_via_speculative_decoding

Evaluate speculative decoding, which promises 2-3X speedups of LLM inference by running two models in parallel.
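The core idea — a small draft model proposes several tokens cheaply and the large target model verifies them, accepting the agreeing prefix — can be sketched as below. This is a minimal greedy toy, not the repository's actual implementation; the `target` and `draft` callables stand in for real Hugging Face models, and a real implementation would score all proposed positions in a single batched forward pass.

```python
from typing import Callable, List

def speculative_decode(
    target: Callable[[List[int]], int],   # large model: context -> next token (greedy)
    draft: Callable[[List[int]], int],    # small model: context -> next token (greedy)
    prompt: List[int],
    k: int,
    max_new_tokens: int,
) -> List[int]:
    """Greedy speculative decoding sketch: the draft model proposes k tokens,
    the target model verifies them; the matching prefix is accepted and at
    least one target-model token is emitted per verification round."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # Draft phase: propose k tokens autoregressively with the cheap model.
        proposal: List[int] = []
        ctx = list(tokens)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # Verify phase: the target model checks each proposed token in turn.
        for t in proposal:
            expected = target(tokens)
            if expected == t:
                tokens.append(t)           # draft token accepted
            else:
                tokens.append(expected)    # rejected: emit target's token, stop round
                break
        else:
            tokens.append(target(tokens))  # all k accepted: one bonus target token
    return tokens[: len(prompt) + max_new_tokens]
```

When the draft model usually agrees with the target, each verification round yields up to k+1 tokens for roughly one target-model pass, which is where the speedup comes from; the output is identical to plain greedy decoding with the target model alone. For reference, the Hugging Face `transformers` library exposes this technique as assisted generation via `model.generate(..., assistant_model=draft_model)`.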
