[Bug] Empty token can appear at the beginning of a generated sequence
masahi opened this issue
It seems that, as of #107, which introduced detokenize_incrementally
from vllm, we very often (or always?) get a blank token at the beginning of each generation, like this:
Generated 0-th sample = ' The House of the Seven Hawks has the director earlier than The Secret Invasion?
Explanation: As we know that Abhishek'
Generated 1-th sample = ' The Nevidito.
Question: Which film has the director who received BAFTA Orange Rising Star Award in 2019'
Generated 2-th sample = ' The Secret Invasion
Here is the answer for the above question. A vector is a directed line segment or a directed ray that has a defin'
Apparently, vllm has the same problem. Although this is a minor issue, such a token still counts as one token in the output, so we should fix this behavior.
Looks like this token is actually a "prefix_space" (SPIECE_UNDERLINE) with index 29871 in the llama tokenizer vocabulary. There was some discussion in the transformers repository about the tokenizer's behavior with this token (link), but it seems that the model itself can generate it.
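For reference, a quick way to confirm what index 29871 maps to (the checkpoint name below is just an example; any llama-family tokenizer should behave the same):

```python
from transformers import AutoTokenizer

# Example checkpoint; any llama tokenizer with the same vocabulary works.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

print(tok.convert_ids_to_tokens(29871))  # expected: '▁' (SPIECE_UNDERLINE)
print(repr(tok.decode([29871])))         # decodes to (roughly) an empty/whitespace string
```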
I have an idea for a workaround (a sketch is shown below):
1. Greedy case: for the prefill output, if the top-1 token is 29871, replace it with the top-2 token; we observed that the top-2 token is the expected next token (but this should be double-checked).
2. Random case: for the prefill output, if token 29871 appears among the top tokens, do not use it and take the next token after the top-token set instead.
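A rough sketch of that workaround (not the engine's actual sampling code; names like SPIECE_UNDERLINE_ID and sample_first_token are hypothetical, and the random case is simplified to masking the token before sampling):

```python
import torch

SPIECE_UNDERLINE_ID = 29871  # the llama prefix-space token observed above

def sample_first_token(logits: torch.Tensor, greedy: bool = True) -> int:
    """Sketch of the proposed workaround for the prefill (first) output token."""
    if greedy:
        # Greedy case: if the top-1 token is 29871, fall back to the top-2 token.
        top2 = torch.topk(logits, k=2).indices.tolist()
        return top2[1] if top2[0] == SPIECE_UNDERLINE_ID else top2[0]

    # Random case (simplified): mask out 29871 so it can never be sampled.
    masked = logits.clone()
    masked[SPIECE_UNDERLINE_ID] = float("-inf")
    probs = torch.softmax(masked, dim=-1)
    return int(torch.multinomial(probs, num_samples=1).item())
```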
Oh, could this simply be a matter of setting skip_special_tokens=True here?
https://github.com/octoml/mlc-llm/blob/batch-serving/serve/mlc_serve/engine/engine_common.py#L79
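One quick sanity check worth doing (a sketch, not verified): skip_special_tokens only drops ids that the tokenizer registers as special, so it would only help if 29871 is actually in that set.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# skip_special_tokens only affects ids in all_special_ids (e.g. <s>, </s>, <unk>);
# if 29871 is not listed there, flipping the flag would not remove the blank token.
print(tok.all_special_ids)
print(29871 in tok.all_special_ids)
```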
@sunggg Any reason we are using skip_special_tokens=False in detokenize_incrementally?
I thought about it briefly and decided to follow the default setting in vllm, since I was not sure about its other impacts. https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/tokenizer.py#L191