Question on draft process

cyLi-Tiger opened this issue 3 months ago · comments

I have a question about the sampling strategy used while building the token tree. I noticed that here you sample the next layer using torch.multinomial. Why not use torch.topk instead? Choosing the k most likely tokens seems more reasonable, and SpecInfer also uses the top-k candidates to expand its token tree.

yuhuili · Answer 1 · Tue Apr 02 2024 00:30:35 GMT+0800 (China Standard Time)

As long as the sampling probabilities are recorded, both methods are correct, because speculative sampling places no requirements on the draft distribution q. Top-k selection is equivalent to repeating argmax multiple times. When the draft distribution q and the target distribution p are highly consistent, sampling gives a higher acceptance rate than argmax. A drafted token x is accepted with probability min(1, p(x)/q(x)). For example, if p=(0.6,0.4) and q=(0.6,0.4), sampling from q has an acceptance rate of 1. Using argmax is equivalent to sampling from the distribution (1.0,0.0), and the acceptance rate drops to 0.6.
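
The numbers in this example can be checked with a small sketch. It assumes the standard speculative sampling rule for a single draft token, where a token x drawn from q is accepted with probability min(1, p(x)/q(x)), so the expected acceptance rate is the sum over x of min(p(x), q(x)). The helper names `acceptance_rate` and `argmax_distribution` are made up for illustration and are not taken from the repository, and the sketch does not model a token tree with several candidates per node.

```python
def acceptance_rate(p, q):
    """Expected acceptance probability for one draft token sampled from q.

    Assumes acceptance with probability min(1, p(x) / q(x)), which gives
    sum_x q(x) * min(1, p(x) / q(x)) = sum_x min(p(x), q(x)).
    """
    return sum(min(p_x, q_x) for p_x, q_x in zip(p, q))


def argmax_distribution(q):
    """Return the one-hot distribution that argmax over q effectively samples from."""
    best = max(range(len(q)), key=lambda i: q[i])
    return [1.0 if i == best else 0.0 for i in range(len(q))]


p = [0.6, 0.4]  # target distribution from the example
q = [0.6, 0.4]  # draft distribution from the example

rate_sampling = acceptance_rate(p, q)
rate_argmax = acceptance_rate(p, argmax_distribution(q))

print(f"sampling from q: {rate_sampling:.2f}")
print(f"argmax over q:   {rate_argmax:.2f}")
```

Under these assumptions the two rates come out to 1 and 0.6, matching the figures above. The gap appears because argmax puts all draft mass on the first token, while the target assigns it only 0.6.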