Repositories under the speculative-decoding topic:
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Scalable and robust tree-based speculative decoding algorithm
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Minimal C implementation of speculative decoding based on llama2.c
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
Verification of the effect of speculative decoding on Japanese text.