Is the rejection and adjusting probability implementation different from normal speculative sampling?

Question

AlvL1225 opened this issue 6 months ago

However, in other implementations, such as GPT-fast or lucidrains' implementation, the difference (GTP - Q) is subtracted element-wise over the whole distribution, not only at the rejected element. Shouldn't it be done that way here as well?

yuhuili · Answer 1 · Fri Dec 29 2023 02:43:50 GMT+0800 (China Standard Time)

Thanks! The correct approach is to subtract the two distributions element-wise rather than to adjust only the value of the rejected element. We have already updated the non-greedy code accordingly. Because sampling without replacement is performed here, a mask is used to adjust the draft distribution. The rest is consistent with the code in the screenshot you provided.

All the experimental results we provided were under the greedy decoding setting and are not affected.
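
For readers following along, here is a minimal sketch of the two adjustments discussed above, written in plain Python rather than taken from the repository. It assumes `p` is the target distribution (what the question calls GTP) and `q` is the draft distribution. The function names `residual_distribution` and `mask_rejected` are hypothetical, and the fallback used when `p` and `q` coincide is an assumption of this sketch.

```python
def residual_distribution(p, q):
    """Return norm(max(0, p - q)), computed element-wise over the vocabulary."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        # p and q coincide, so there is no residual mass to sample from.
        # Falling back to p is an assumption made for this sketch.
        return list(p)
    return [d / total for d in diff]


def mask_rejected(q, rejected):
    """Zero out an already-drawn token in the draft distribution and renormalize.

    This mimics sampling without replacement: a rejected token cannot be
    proposed again by the draft.
    """
    masked = [0.0 if i == rejected else qi for i, qi in enumerate(q)]
    total = sum(masked)
    return [m / total for m in masked] if total > 0.0 else masked


p = [0.5, 0.3, 0.2]  # target distribution (illustrative values)
q = [0.2, 0.6, 0.2]  # draft distribution (illustrative values)
rejected = 1         # index of the token that was rejected

adjusted_p = residual_distribution(p, q)  # every entry changes, not just index 1
adjusted_q = mask_rejected(q, rejected)   # index 1 can no longer be drawn
```

Note that the residual touches every entry of `p`: in this example the mass at index 2 also drops to zero because `p[2] - q[2]` is zero, which would not happen if only the rejected element were adjusted.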