bentoml / OpenLLM

Run any open-source LLMs, such as Llama 2 and Mistral, as OpenAI-compatible API endpoints in the cloud.

Home Page: https://bentoml.com


Why does prompt_token_ids change depending on encoder_decoder?

meanwo opened this issue

https://github.com/bentoml/OpenLLM/blob/6eb2ed5028dcaa7e6c7ba60e2ec8dc3377c353be/openllm-python/src/openllm/_runners.py#L181C1-L185

    if self.model.config.is_encoder_decoder:
      max_src_len = context_length
    else:
      max_src_len = context_length - max_new_tokens - 1
    prompt_token_ids = prompt_token_ids[-max_src_len:]
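For reference, here is a small standalone sketch (not OpenLLM's actual code path; the context window size, max_new_tokens value, and dummy prompt below are made up) showing what this truncation does for a decoder-only model versus an encoder-decoder model:

    # Standalone illustration of the truncation above (not OpenLLM code).
    # context_length and max_new_tokens are made-up example values.
    context_length = 4096        # model context window
    max_new_tokens = 256         # tokens reserved for generation
    prompt_token_ids = list(range(5000))   # a prompt longer than the window

    is_encoder_decoder = False   # decoder-only model such as Llama 2
    if is_encoder_decoder:
      # encoder-decoder: the whole window is available for the source prompt
      max_src_len = context_length
    else:
      # decoder-only: reserve room for the tokens that will be generated
      max_src_len = context_length - max_new_tokens - 1

    # keep only the last max_src_len prompt tokens
    prompt_token_ids = prompt_token_ids[-max_src_len:]
    print(len(prompt_token_ids))   # 3839 here; 4096 for the encoder-decoder branch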

When using a decoder-only model (e.g. Llama 2) directly with Hugging Face, the length of prompt_token_ids (the input token IDs) is never changed based on max_new_tokens.

Is there a reason why prompt_token_ids is truncated based on max_new_tokens in the else branch?