Evaluate Speculative Decoding, which promises 2-3x speedups of LLM inference by running two models in parallel: a small draft model proposes tokens and the large target model verifies them.
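The draft-and-verify loop can be sketched as follows. This is a minimal greedy-decoding sketch with toy stand-in "models"; the functions `draft_model` / `target_model`, their token rules, and the parameter `gamma` are illustrative assumptions, not part of any real library.

```python
def draft_model(ctx):
    # Cheap draft model (toy rule): usually right, but errs after token 4.
    return (ctx[-1] + 1) % 10 if ctx[-1] != 4 else 0

def target_model(ctx):
    # Expensive target model (toy rule): always returns last token + 1 mod 10.
    return (ctx[-1] + 1) % 10

def speculative_decode(ctx, n_new, gamma=4):
    """Generate n_new tokens after ctx via greedy speculative decoding."""
    out = list(ctx)
    while len(out) < len(ctx) + n_new:
        # 1) Draft gamma tokens cheaply with the small model.
        draft, cur = [], list(out)
        for _ in range(gamma):
            t = draft_model(cur)
            draft.append(t)
            cur.append(t)
        # 2) Verify with the target model. In a real system these gamma
        #    checks happen in ONE batched forward pass, which is where
        #    the speedup comes from.
        accepted, cur = [], list(out)
        for t in draft:
            v = target_model(cur)
            if v == t:
                accepted.append(t)   # draft token accepted
                cur.append(t)
            else:
                accepted.append(v)   # first mismatch: keep target's token
                break
        else:
            # Every draft token accepted: target yields one bonus token.
            accepted.append(target_model(cur))
        out.extend(accepted)
    return out[:len(ctx) + n_new]
```

Because every emitted token is either verified or produced by the target model, the output is identical to decoding with the target model alone; the draft model only changes how many target-model calls are needed.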