w32zhong / MCSD

Multi-Candidate Speculative Decoding


Code Release

See here.

Data Release

For the Alpaca dataset, we use exactly the same source as SpecInfer.

For the WMT dataset, we follow SpecInfer's process: randomly sample 1000 examples from the test set. Each source sentence is wrapped in the following template:

```
Translate the input English sentence into German.
Input: {source sentence}
Output: 
```
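The sampling and wrapping step above can be sketched as follows (a minimal illustration, not the released preprocessing script; the function name and seed are our own choices):

```python
import random

# Prompt template quoted from the Data Release section above.
TEMPLATE = (
    "Translate the input English sentence into German.\n"
    "Input: {source}\n"
    "Output: "
)

def build_prompts(test_set, n=1000, seed=0):
    """Randomly sample up to n source sentences and wrap each in the template."""
    rng = random.Random(seed)
    sampled = rng.sample(test_set, min(n, len(test_set)))
    return [TEMPLATE.format(source=s) for s in sampled]
```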

Model Release

We release our fine-tuned draft models on Hugging Face; see Vicuna-68M and Vicuna-160M. They are fine-tuned from LLaMA-68M and LLaMA-160M, respectively, on ShareGPT data. The training setup follows FastChat.
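One way to try a released draft model is with the `transformers` library's built-in assisted generation (its `assistant_model` argument to `generate`). This is a hedged sketch, not this repo's own decoding code: the repo ids (`double7/vicuna-68m`, `lmsys/vicuna-7b-v1.3`) and the pairing of draft and target models are assumptions for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_with_draft(prompt,
                        target_id="lmsys/vicuna-7b-v1.3",   # assumed target model
                        draft_id="double7/vicuna-68m",      # assumed draft repo id
                        max_new_tokens=64):
    """Greedy generation with a small draft model proposing tokens
    for the larger target model to verify (assisted generation)."""
    tokenizer = AutoTokenizer.from_pretrained(target_id)
    target = AutoModelForCausalLM.from_pretrained(target_id)
    draft = AutoModelForCausalLM.from_pretrained(draft_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    out = target.generate(**inputs,
                          assistant_model=draft,
                          max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Note that `assistant_model` implements single-candidate speculative decoding; the multi-candidate algorithm described in this repo is in the released code linked above.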

About

License: MIT License


Languages

Python 99.8%, Dockerfile 0.2%