Port of OpenAI's Whisper model in C/C++ with xtts and wav2lip

error: unknown argument: --vad-start-thold

alrostami opened this issue

  • branch: master
  • OS: Ubuntu Server 22.04
  • Compilers: gcc 11.4.0 and g++ 11.4.0

I tried compiling your code. I was able to build talk-llama and downloaded your script talk-llama.bat, but when I run it I get the following error, followed by the usage instructions for talk-llama:

error: unknown argument: --vad-start-thold

usage: ./talk-llama [options]

  -h,       --help           [default] show this help message and exit
  -t N,     --threads N      [4      ] number of threads to use during computation
  -vms N,   --voice-ms N     [10000  ] voice duration in milliseconds
  -c ID,    --capture ID     [-1     ] capture device ID
  -mt N,    --max-tokens N   [32     ] maximum number of tokens per audio chunk
  -ac N,    --audio-ctx N    [0      ] audio context size (0 - all)
  -ngl N,   --n-gpu-layers N [999    ] number of layers to store in VRAM
  -vth N,   --vad-thold N    [0.60   ] voice activity detection threshold
  -vlm N,   --vad-last-ms N  [500    ] vad min silence after speech, ms
  -fth N,   --freq-thold N   [100.00 ] high-pass frequency cutoff
  -su,      --speed-up       [false  ] speed up audio by x2 (reduced accuracy)
  -tr,      --translate      [false  ] translate from source language to english
  -ps,      --print-special  [false  ] print special tokens
  -pe,      --print-energy   [false  ] print sound energy (for debugging)
  -vp,      --verbose-prompt [false  ] print prompt at start
  -ng,      --no-gpu         [false  ] disable GPU
  -p NAME,  --person NAME    [Alex   ] person name (for prompt selection)
  -bn NAME, --bot-name NAME  [LLaMA  ] bot name (to display)
  -w TEXT,  --wake-command T [       ] wake-up command to listen for
  -ho TEXT, --heard-ok TEXT  [       ] said by TTS before generating reply
  -l LANG,  --language LANG  [en     ] spoken language
  -mw FILE, --model-whisper  [./ggml-medium.en-q5_0.bin] whisper model file
  -ml FILE, --model-llama    [./mistral-7b-instruct-v0.2.Q6_K.gguf] llama model file
  -s FILE,  --speak TEXT     [speak  ] command for TTS
  --prompt-file FNAME        [       ] file with custom prompt to start dialog
  --session FNAME                   file to cache model state in (may be large!) (default: none)
  -f FNAME, --file FNAME     [       ] text output file name
   --ctx_size N              [2048   ] Size of the prompt context
  -n N, --n_predict N        [64     ] Number of tokens to predict
  --temp N                   [0.90   ] Temperature 
  --top_k N                  [40.00  ] top_k 
  --top_p N                  [1.00   ] top_p 
  --repeat_penalty N         [1.10   ] repeat_penalty 
  --xtts-voice NAME          [emma_1 ] xtts voice without .wav
  --xtts-url TEXT            [http://localhost:8020/] xtts/silero server URL, with trailing slash
  --xtts-control-path FNAME  [./talk-llama-fast/xtts/xtts_play_allowed.txt] path to xtts_play_allowed.txt
  --google-url TEXT          [http://localhost:8003/] langchain google-serper server URL, with /

Did you build the demo version from a branch that you haven't pushed yet? If not, can you tell me what I am missing?

Oops, I forgot to upload the recent code changes. Now fixed. Just git pull or download manually.
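After pulling, one way to confirm that the local source actually contains the new option before rebuilding is to search the checkout for the flag name. This is a minimal sketch, not part of the project: the helper name `source_has_flag` and the checkout path are hypothetical.

```shell
# Report (via exit status) whether any file under a source tree mentions the flag.
# Assumption: "$1" is the path to a local checkout of the repository.
source_has_flag() {
    # -r: recurse, -q: quiet, -F: fixed string, -e: pattern starts with dashes
    grep -rqF -e '--vad-start-thold' "$1"
}

# Hypothetical usage, after updating the checkout:
#   git -C path/to/checkout pull
#   source_has_flag path/to/checkout && echo "flag present" || echo "flag missing"
```

If the flag is reported missing after a pull, the checkout is probably not on the updated commit yet.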

Thanks for the quick reply, but both of the commits you have pushed show zero changed lines since 5db57b9.

There is clearly --vad-start-thold now in
Before that it wasn't there. Maybe you have some weird cache. Anyway, you can just remove that param from the bat/shell file.
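The workaround of removing the parameter can be sketched as a small filter. It assumes the flag is followed by exactly one whitespace-separated value; the helper name and the sample command line are illustrative only, not taken from the actual launcher script.

```shell
# Strip "--vad-start-thold <value>" from launcher text read on stdin.
# Assumption: the flag takes exactly one whitespace-separated value.
strip_vad_start_thold() {
    sed -E 's/[[:space:]]+--vad-start-thold[[:space:]]+[^[:space:]]+//g'
}

# Hypothetical launcher line, for illustration only.
echo './talk-llama --vad-thold 0.60 --vad-start-thold 0.000270 --vad-last-ms 500' \
    | strip_vad_start_thold
# prints: ./talk-llama --vad-thold 0.60 --vad-last-ms 500
```

To apply it to a real launcher, redirect the script through the function into a new file and run that copy, so the original stays intact.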

I can see them all now. Thanks!