octoml / mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

Home Page: https://mlc.ai/mlc-llm


[Bug] Empty token can appear at the beginning of a generated sequence

masahi opened this issue

It seems that, as of #107, which introduced detokenize_incrementally from vllm, we very often (or always?) get a blank token at the beginning of each generation, like this:

Generated 0-th sample = ' The House of the Seven Hawks has the director earlier than The Secret Invasion?
Explanation: As we know that Abhishek'

Generated 1-th sample = ' The Nevidito.
Question: Which film has the director who received BAFTA  Orange Rising Star Award in 2019'

Generated 2-th sample = ' The Secret Invasion

Here is the answer for the above question. A vector is a directed line segment or a directed ray that has a defin'

Apparently, vllm has the same problem. Although this is a minor issue, such a token still counts as one token in the output, so we should fix this behavior.

It looks like this token is actually a "prefix space" (SPIECE_UNDERLINE) with index 29871 in the llama tokenizer vocabulary. There was some discussion in the transformers repository about the tokenizer's behavior around this token (link), but it seems that the model itself can generate it.
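
For reference, this is easy to inspect with a plain HF tokenizer. A minimal sketch (the checkpoint name is an assumption; any llama tokenizer should behave the same):

```python
from transformers import AutoTokenizer

# Checkpoint name is illustrative, not from this repo.
tok = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

# Token id 29871 is the sentencepiece prefix space ("▁", SPIECE_UNDERLINE).
print(tok.convert_ids_to_tokens([29871]))  # expected: ['▁']

# On its own it decodes to no visible text, which is why the first
# "token" of a generation can look blank.
print(repr(tok.decode([29871])))
```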

I have an idea for a workaround (see the sketch below):

1. Greedy case: for the prefill output, if the top-1 token is 29871, replace it with the top-2 token; we observed that the top-2 token is the next expected token (but this should be double-checked).
2. Random sampling case: for the prefill output, if token 29871 appears among the top tokens, exclude it from the candidate set and extend the set with the next-best token after it.
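
A minimal sketch of that idea, assuming logits-level access in the sampler (the function and parameter names here are hypothetical, not the engine's actual API):

```python
import torch

PREFIX_SPACE_ID = 29871  # SPIECE_UNDERLINE in the llama vocabulary

def mask_prefix_space(logits: torch.Tensor, is_first_decode_step: bool) -> torch.Tensor:
    """Hypothetical helper: suppress token 29871 on the step after prefill.

    Setting its logit to -inf covers both cases at once: greedy decoding
    falls through to what was the top-2 token, and random sampling can
    never draw it, so no explicit top-k bookkeeping is needed.
    """
    if is_first_decode_step:
        logits = logits.clone()
        logits[..., PREFIX_SPACE_ID] = float("-inf")
    return logits
```

Masking the logit rather than post-hoc replacing the sampled token keeps the greedy and random branches consistent and avoids re-ranking the top tokens by hand.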

Oh, could this simply be a matter of setting skip_special_tokens=True here?

https://github.com/octoml/mlc-llm/blob/batch-serving/serve/mlc_serve/engine/engine_common.py#L79

@sunggg Any reason we are using skip_special_tokens=False in detokenize_incrementally?

I thought about it briefly and decided to follow the default setting in vllm, since I do not know its other impacts: https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/tokenizer.py#L191
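
For context, here is what the flag changes at the plain-tokenizer level (a sketch with HF transformers; the checkpoint name and token ids are illustrative, and whether this helps depends on whether 29871 is registered as a special token in the tokenizer config):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

ids = [1, 29871, 450]  # illustrative: <s> (bos), prefix space, a word piece

# skip_special_tokens drops ids registered as special tokens (e.g. <s>, </s>);
# it would remove 29871 only if the config marks that id as special.
print(repr(tok.decode(ids, skip_special_tokens=False)))
print(repr(tok.decode(ids, skip_special_tokens=True)))
```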