nlpxucan / WizardLM

LLMs built upon Evol Instruct: WizardLM, WizardCoder, WizardMath



WizardCoder 15B: Max input token size and output token size

karrtikiyer opened this issue

Hi,
For WizardCoder 15B, I would like to understand:

  1. What is the maximum input token size for WizardCoder 15B?
  2. Similarly, what is the maximum output token size?
  3. If I want to use this model to, say, review code spread across multiple interdependent files (one file calling a function from another), how should I break up and tokenize such code before asking WizardCoder to summarize it, given that it may exceed the token limit?

WizardCoder is based on StarCoder. The maximum length is 8192 tokens, shared between input and output (input + output).
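Because the 8192-token window is shared, the room left for generation is whatever the prompt does not use. The small sketch below just does that subtraction. The helper name `max_new_tokens` is hypothetical, and the prompt length is assumed to come from the model's own tokenizer (for example, with Hugging Face transformers, `len(tokenizer(prompt)["input_ids"])`).

```python
# Assumption: an 8192-token context window shared by prompt and output,
# as stated in this thread for the StarCoder-based WizardCoder.
CONTEXT_LENGTH = 8192


def max_new_tokens(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Hypothetical helper: tokens still available for generation after a
    prompt of `prompt_tokens` tokens in a shared input + output window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"a {prompt_tokens}-token prompt fills the {context_length}-token window"
        )
    return remaining


print(max_new_tokens(6000))  # 2192
```

So a 6000-token prompt leaves at most 2192 tokens for the answer, and a long multi-file prompt can leave almost nothing.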

Thanks @ChiYeungLaw, do you have any views or advice on point 3 in my list of questions above?

Maybe you can try some hierarchical methods.

  1. Review each file -> summary of each file
  2. Combine the per-file summaries to get the final review

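The two-step hierarchical idea can be sketched as a map-then-combine loop. This is only an illustration under assumptions: `generate` is a hypothetical stand-in for however WizardCoder is actually called (local pipeline, HTTP endpoint, etc.), and the prompt wording is made up. Each file is summarized in its own prompt, and only the short summaries go into the final review prompt.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a WizardCoder call: prompt in, generated text out.
Generate = Callable[[str], str]


def summarize_files(files: Dict[str, str], generate: Generate) -> Dict[str, str]:
    """Step 1: summarize each file on its own so each prompt stays small."""
    summaries: Dict[str, str] = {}
    for path, source in files.items():
        prompt = f"Summarize what this file does.\n# File: {path}\n{source}"
        summaries[path] = generate(prompt)
    return summaries


def review_project(files: Dict[str, str], generate: Generate) -> str:
    """Step 2: feed only the per-file summaries into one final review prompt."""
    summaries = summarize_files(files, generate)
    combined = "\n\n".join(f"# {path}\n{text}" for path, text in summaries.items())
    prompt = (
        "Below are summaries of files from one project. Some files call "
        "functions defined in others. Write an overall review.\n\n" + combined
    )
    return generate(prompt)


# Usage with a fake model that just numbers its calls.
calls: List[str] = []


def fake_generate(prompt: str) -> str:
    calls.append(prompt)
    return f"summary #{len(calls)}"


files = {
    "utils.py": "def add(a, b):\n    return a + b\n",
    "main.py": "from utils import add\nprint(add(1, 2))\n",
}
print(review_project(files, fake_generate))  # summary #3
```

If a single file is itself too long, it would need to be split before step 1; if the combined summaries grow too long, the same summarize-then-combine step can be applied again to groups of summaries.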
Or you can try some retrieval methods.
But I cannot guarantee these suggestions work.
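A retrieval approach instead picks only the code chunks most relevant to a question and packs them into the prompt up to a size budget. The sketch below is a simplified stand-in: it swaps in plain identifier overlap where a real system would more likely use embeddings, and it uses word count as a rough proxy for the real token count. The chunk size, the `max_words` budget, and all function names are assumptions for illustration.

```python
import re
from typing import Dict, List, Set, Tuple


def chunk_file(path: str, source: str, lines_per_chunk: int = 40) -> List[Tuple[str, str]]:
    """Split a file into fixed-size line windows labelled "path:first_line"."""
    lines = source.splitlines()
    return [
        (f"{path}:{start + 1}", "\n".join(lines[start:start + lines_per_chunk]))
        for start in range(0, len(lines), lines_per_chunk)
    ]


def identifiers(text: str) -> Set[str]:
    """Lowercased identifier-like words; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def build_prompt(files: Dict[str, str], question: str, max_words: int = 3000) -> str:
    """Pack the chunks most relevant to `question` into one prompt.

    Relevance = number of identifiers a chunk (label included) shares with
    the question. Word count is only a rough proxy for real token count.
    """
    query = identifiers(question)
    chunks = [c for path, src in files.items() for c in chunk_file(path, src)]
    # list.sort is stable, so chunks with equal scores keep their file order.
    chunks.sort(key=lambda c: len(query & identifiers(c[0] + "\n" + c[1])), reverse=True)
    parts: List[str] = []
    used = 0
    for label, body in chunks:
        size = len(body.split())
        if used + size > max_words:
            continue  # skip chunks that would overflow; a smaller one may still fit
        parts.append(f"# {label}\n{body}")
        used += size
    return "\n\n".join(parts) + f"\n\nQuestion: {question}"


files = {
    "utils.py": "def add(a, b):\n    return a + b\n",
    "main.py": "from utils import add\nprint(add(1, 2))\n",
    "readme_gen.py": "def render_docs():\n    pass\n",
}
print(build_prompt(files, "How does main.py use add?", max_words=13))
```

Here `main.py` and `utils.py` fit the small budget and `readme_gen.py` is left out, so the prompt carries both the caller and the function it depends on.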