octoml / mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

Home Page: https://mlc.ai/mlc-llm


[Bug] Empty token can appear at the beginning of a generated sequence

masahi opened this issue

It seems that, as of #107, which introduced detokenize_incrementally from vllm, we very often (or always?) get a blank token at the beginning of each generation, like this:

Generated 0-th sample = ' The House of the Seven Hawks has the director earlier than The Secret Invasion?
Explanation: As we know that Abhishek'

Generated 1-th sample = ' The Nevidito.
Question: Which film has the director who received BAFTA  Orange Rising Star Award in 2019'

Generated 2-th sample = ' The Secret Invasion

Here is the answer for the above question. A vector is a directed line segment or a directed ray that has a defin'

Apparently, vllm has the same problem. Although this is a minor issue, such a token still counts as one token in the output, so we should fix this behavior.

It looks like this token is actually a "prefix space" (SPIECE_UNDERLINE) with index 29871 in the llama tokenizer vocabulary. There was some discussion in the transformers repository about the tokenizer's behavior around this token (link), but it seems that the model itself can generate it.
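
For reference, this is easy to inspect with a plain HF tokenizer. A minimal sketch (the checkpoint name is an assumption; any llama tokenizer should behave the same):

```python
from transformers import AutoTokenizer

# Checkpoint name is illustrative, not from this repo.
tok = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

# Token id 29871 is the sentencepiece prefix space ("▁", SPIECE_UNDERLINE).
print(tok.convert_ids_to_tokens([29871]))  # expected: ['▁']

# On its own it decodes to no visible text, which is why the first
# "token" of a generation can look blank.
print(repr(tok.decode([29871])))
```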

I have an idea for a workaround (see the sketch below):

1. Greedy case: for the prefill output, if the top-1 token is 29871, replace it with the top-2 token; we observed that the top-2 token is the next expected token (but this should be double-checked).
2. Random sampling case: for the prefill output, if token 29871 appears among the top tokens, exclude it from the candidate set and extend the set with the next-best token after it.
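
A minimal sketch of that idea, assuming logits-level access in the sampler (the function and parameter names here are hypothetical, not the engine's actual API):

```python
import torch

PREFIX_SPACE_ID = 29871  # SPIECE_UNDERLINE in the llama vocabulary

def mask_prefix_space(logits: torch.Tensor, is_first_decode_step: bool) -> torch.Tensor:
    """Hypothetical helper: suppress token 29871 on the step after prefill.

    Setting its logit to -inf covers both cases at once: greedy decoding
    falls through to what was the top-2 token, and random sampling can
    never draw it, so no explicit top-k bookkeeping is needed.
    """
    if is_first_decode_step:
        logits = logits.clone()
        logits[..., PREFIX_SPACE_ID] = float("-inf")
    return logits
```

Masking the logit rather than post-hoc replacing the sampled token keeps the greedy and random branches consistent and avoids re-ranking the top tokens by hand.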

Oh, could this simply be a matter of setting skip_special_tokens=True here?

https://github.com/octoml/mlc-llm/blob/batch-serving/serve/mlc_serve/engine/engine_common.py#L79

@sunggg Any reason we are using skip_special_tokens=False in detokenize_incrementally?

I thought about it briefly and decided to follow the default setting in vllm, since I do not know its other impacts: https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/tokenizer.py#L191
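
For context, here is what the flag changes at the plain-tokenizer level (a sketch with HF transformers; the checkpoint name and token ids are illustrative, and whether this helps depends on whether 29871 is registered as a special token in the tokenizer config):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

ids = [1, 29871, 450]  # illustrative: <s> (bos), prefix space, a word piece

# skip_special_tokens drops ids registered as special tokens (e.g. <s>, </s>);
# it would remove 29871 only if the config marks that id as special.
print(repr(tok.decode(ids, skip_special_tokens=False)))
print(repr(tok.decode(ids, skip_special_tokens=True)))
```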