Is the rejection and adjusting probability implementation different from normal speculative sampling?
AlvL1225 opened this issue
Thanks! You are right: after a rejection, the correct approach is to subtract the draft distribution from the target distribution (clip at zero, then renormalize) rather than to adjust only the probability of the rejected token. We have already fixed the non-greedy code accordingly. Because candidates are sampled without replacement here, a mask is additionally used to adjust the draft distribution. The rest is consistent with the code in the screenshot you provided.
All the experimental results we reported were obtained under greedy decoding, so they are not affected.
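
For readers following the thread, here is a minimal sketch of the rejection step described above, written with NumPy. It is not the repository's code: the function names (`residual_distribution`, `speculative_accept`, `mask_draft`) are hypothetical, `p` stands for the target distribution and `q` for the draft distribution, and `mask_draft` reflects only an assumption about what "a mask is used to adjust the draft distribution" means when sampling without replacement (zero out the rejected token and renormalize).

```python
import numpy as np


def residual_distribution(p, q):
    """Return norm(max(0, p - q)), the distribution to resample from on rejection."""
    diff = np.maximum(p - q, 0.0)
    total = diff.sum()
    if total <= 0.0:
        # p and q are identical, so a rejection cannot actually occur;
        # fall back to the target distribution to stay well defined.
        return p.copy()
    return diff / total


def speculative_accept(token, p, q, rng):
    """Standard speculative sampling step for one drafted token.

    Accept `token` with probability min(1, p[token] / q[token]);
    otherwise resample from the residual distribution.
    Assumes q[token] > 0, which holds when `token` was sampled from q.
    Returns (accepted, output_token).
    """
    if rng.random() < min(1.0, p[token] / q[token]):
        return True, token
    residual = residual_distribution(p, q)
    return False, int(rng.choice(len(p), p=residual))


def mask_draft(q, rejected_token):
    """Assumed masking step for sampling without replacement:
    remove the rejected token from the draft distribution and renormalize."""
    masked = q.copy()
    masked[rejected_token] = 0.0
    total = masked.sum()
    return masked / total if total > 0.0 else masked


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = np.array([0.5, 0.3, 0.2])  # target distribution (illustrative values)
    q = np.array([0.2, 0.6, 0.2])  # draft distribution (illustrative values)
    draft_token = int(rng.choice(len(q), p=q))
    print(speculative_accept(draft_token, p, q, rng))
```

The point of subtracting the distributions is that the accepted-or-resampled token is then distributed exactly according to `p`. Changing only the rejected token's probability does not give that guarantee in general.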