EricLBuehler / candle-vllm

Efficient platform for inference and serving of local LLMs, including an OpenAI compatible API server.


--repeat-last-n option not mentioned in the usage help

ivanbaldo opened this issue · comments

The mandatory --repeat-last-n option isn't documented in the CLI usage message, for example when running without parameters or with the --help option.

Can you paste your CLI usage message? This is likely because the ModelSelected subcommand is parsed separately from the outer args, so cargo run -- --port 2000 llama7b --help should help.
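To make that layout concrete, here is a minimal clap sketch of the idea; the struct and variant names below are illustrative rather than the project's actual code, but they show why a bare --help only lists the outer options:

```rust
use clap::{Parser, Subcommand};

/// Top-level arguments: these are the only ones shown by a bare `--help`.
#[derive(Parser)]
struct Args {
    /// Port to serve on (localhost:port)
    #[arg(long)]
    port: u16,

    /// Model-specific arguments live on the subcommand, not here.
    #[command(subcommand)]
    command: ModelSelected,
}

#[derive(Subcommand)]
enum ModelSelected {
    /// Select the llama7b model
    Llama7b {
        /// Window of recent tokens considered by the repetition penalty
        #[arg(long)]
        repeat_last_n: usize,
    },
}

fn main() {
    // `candle-vllm --help` prints only `Args`; `candle-vllm llama7b --help`
    // prints the variant's own options, including `--repeat-last-n`.
    let Args { port, command } = Args::parse();
    match command {
        ModelSelected::Llama7b { repeat_last_n } => {
            println!("port={port}, repeat_last_n={repeat_last_n}");
        }
    }
}
```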

root@4be76ca848a1:/candle-vllm# /root/.cargo/bin/candle-vllm
Usage: candle-vllm [OPTIONS] --port <PORT> <COMMAND>

Commands:
  llama7b   Select the llama7b model
  llama13b  Select the llama13b model
  llama70b  Select the llama70b model
  help      Print this message or the help of the given subcommand(s)

Options:
      --hf-token <HF_TOKEN>            Huggingface token environment variable (optional). If not specified, load using hf_token_path
      --hf-token-path <HF_TOKEN_PATH>  Huggingface token file (optional). If neither `hf_token` or `hf_token_path` are specified this is used with the value of `~/.cache/huggingface/token`
      --port <PORT>                    Port to serve on (localhost:port)
      --verbose                        Set verbose mode (print all requests)
      --max-num-seqs <MAX_NUM_SEQS>    Maximum number of sequences to allow [default: 256]
      --block-size <BLOCK_SIZE>        Size of a block [default: 16]
  -h, --help                           Print help
  -V, --version                        Print version
root@4be76ca848a1:/candle-vllm#
root@4be76ca848a1:/candle-vllm# /root/.cargo/bin/candle-vllm llama7b
error: the following required arguments were not provided:
  --repeat-last-n <REPEAT_LAST_N>

Usage: candle-vllm --port <PORT> llama7b --repeat-last-n <REPEAT_LAST_N>

For more information, try '--help'.
root@4be76ca848a1:/candle-vllm#

Now I understand: the options change depending on the model selected.

But then maybe the main usage help could guide the user to also consult the model-specific options with candle-vllm <modelName> --help.
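One way something like that could be done, assuming the project uses clap's derive API (which the help output suggests), is an after_help note on the top-level parser; a sketch, not necessarily what the project does:

```rust
use clap::Parser;

/// Illustrative only: append a hint to the top-level help text so users
/// know where the model-specific options live.
#[derive(Parser)]
#[command(after_help = "Run `candle-vllm <MODEL> --help` to see model-specific options such as --repeat-last-n.")]
struct Args {
    /// Port to serve on (localhost:port)
    #[arg(long)]
    port: u16,
}

fn main() {
    let _args = Args::parse();
}
```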

And the llama7b usage help could maybe explain what the --repeat-last-n option means.

(btw I am just reporting this in case it's useful; it may very well be low priority for the project at this time)
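For context on what the option controls: in candle-style generation loops, repeat_last_n is typically the number of most recent tokens over which the repetition penalty is applied. A rough, self-contained sketch of that idea (illustrative code, not the project's implementation):

```rust
/// Penalise tokens that appeared within the last `repeat_last_n` generated
/// tokens by shrinking positive logits and growing negative ones.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, tokens: &[u32], repeat_last_n: usize) {
    let start = tokens.len().saturating_sub(repeat_last_n);
    for &tok in &tokens[start..] {
        let logit = &mut logits[tok as usize];
        *logit = if *logit >= 0.0 { *logit / penalty } else { *logit * penalty };
    }
}

fn main() {
    let mut logits = vec![2.0, -1.0, 0.5, 3.0];
    let generated = vec![0u32, 3, 3, 1];
    // With repeat_last_n = 2, only the two most recent tokens (3 and 1) are penalised.
    apply_repeat_penalty(&mut logits, 1.1, &generated, 2);
    println!("{logits:?}");
}
```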

I just pushed a commit that adds:

  • Information to the README to point the user towards model-specific information
  • Explanation of --repeat-last-n

Is this sufficient to close the issue?

Yeah of course!!! Thanks!!!
P.S.: isn't there a mostly suitable default value for that option, like 64 for example?

Yes, there should be and I will likely add it later.
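For what it's worth, clap's derive API supports declaring such a default directly; a minimal sketch with illustrative names, using the 64 floated above purely as an example value:

```rust
use clap::Parser;

/// Hypothetical per-model arguments; names are illustrative, not the project's.
#[derive(Parser, Debug)]
struct Llama7bArgs {
    /// Window of recent tokens considered by the repetition penalty
    #[arg(long, default_value_t = 64)]
    repeat_last_n: usize,
}

fn main() {
    // `--repeat-last-n` becomes optional and falls back to 64 when omitted.
    let args = Llama7bArgs::parse();
    println!("{args:?}");
}
```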