bentoml / OpenLLM

Run any open-source LLMs, such as Llama 2 and Mistral, as OpenAI-compatible API endpoints in the cloud.

Home Page: https://bentoml.com


Why does prompt_token_ids change depending on encoder_decoder?

meanwo opened this issue

https://github.com/bentoml/OpenLLM/blob/6eb2ed5028dcaa7e6c7ba60e2ec8dc3377c353be/openllm-python/src/openllm/_runners.py#L181C1-L185

    if self.model.config.is_encoder_decoder:
      max_src_len = context_length
    else:
      max_src_len = context_length - max_new_tokens - 1
    prompt_token_ids = prompt_token_ids[-max_src_len:]
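For reference, here is a small standalone sketch (not OpenLLM's actual code path; the context window size, max_new_tokens value, and dummy prompt below are made up) showing what this truncation does for a decoder-only model versus an encoder-decoder model:

    # Standalone illustration of the truncation above (not OpenLLM code).
    # context_length and max_new_tokens are made-up example values.
    context_length = 4096        # model context window
    max_new_tokens = 256         # tokens reserved for generation
    prompt_token_ids = list(range(5000))   # a prompt longer than the window

    is_encoder_decoder = False   # decoder-only model such as Llama 2
    if is_encoder_decoder:
      # encoder-decoder: the whole window is available for the source prompt
      max_src_len = context_length
    else:
      # decoder-only: reserve room for the tokens that will be generated
      max_src_len = context_length - max_new_tokens - 1

    # keep only the last max_src_len prompt tokens
    prompt_token_ids = prompt_token_ids[-max_src_len:]
    print(len(prompt_token_ids))   # 3839 here; 4096 for the encoder-decoder branch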

When using a decoder-only model (e.g. Llama 2) directly with Hugging Face, the length of prompt_token_ids (the input token IDs) is never changed based on max_new_tokens.

Is there a reason why prompt_token_ids is truncated based on max_new_tokens in the else branch?