Mozer / talk-llama-fast

Port of OpenAI's Whisper model in C/C++ with xtts and wav2lip

Where should the video output be?

sss1337xyz opened this issue

I can't figure out where the video output is supposed to appear. I launched all parts of the application, but I still don't understand where to look.

I opened the localhost addresses shown in the console, but the pages are empty.


Edit xtts_wav2lip.bat and change --output c:\\DATA\\LLM\\SillyTavern-Extras\\tts_out\\ to the real path of your tts_out folder. Don't forget the \\ at the end.

UPD: Yes, I see the missing \\ in your screenshot. A lot of people miss it, so I think I need to handle it better in the code.
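
One way the missing trailing separator could be tolerated in code is to normalize the path before it is used. This is a minimal Python sketch, not the project's actual fix: `normalize_output_dir` is a hypothetical helper name, and it assumes the `--output` value arrives as a plain string.

```python
import ntpath


def normalize_output_dir(path: str) -> str:
    """Return `path` with exactly one trailing backslash (Windows style).

    Hypothetical helper for illustration; not part of xtts_api_server.
    """
    # ntpath applies Windows path rules on any host OS: it collapses
    # doubled separators, converts "/" to "\" and drops a trailing "\".
    cleaned = ntpath.normpath(path.strip().strip('"'))
    # Re-add the single trailing separator that the tts_out path needs.
    return cleaned if cleaned.endswith("\\") else cleaned + "\\"
```

With this, a path typed with or without the final separator would end up in the same form, so forgetting it would no longer matter.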

Yes, thank you, voice files have started to appear in this folder. But I still can't see the video window, and Anna's voice is not playing


Restart SillyTavern-Extras, maybe it will help. Paste the errors or a screenshot here and let's see what's going on.

I restarted everything and it started working a little differently. After the first request, everything now freezes (as I understand it, this is related to the video cache described in the readme).
Should I just wait?

rtx3070 8gb
32gb ram

(extras) C:\Users\New\Desktop\videoAssistant\xtts\SillyTavern-Extras>python  --enable-modules wav2lip
Using torch device: cpu
Initializing wav2lip module
wav2lip: running init generation with default and silence.wav
in wav2lip_server_generate: is busy: 0, face_detect_running: 0, chunk: 0, chunk_needed: 0, reply: 0
speech detected, wav2lip_server won't generate
Deleting old temporary wavs and mp4s.
No API key given because you are running locally.
del tts_out/out_1.wav
del tts_out/out_10.wav
del tts_out/out_11.wav
del tts_out/out_12.wav
del tts_out/out_13.wav
del tts_out/out_2.wav
del tts_out/out_3.wav
del tts_out/out_4.wav
del tts_out/out_5.wav
del tts_out/out_6.wav
 * Serving Flask app 'server'
 * Debug mode: off
del tts_out/out_7.wav
del tts_out/out_8.wav
del tts_out/out_9.wav
del modules/wav2lip/temp/result_1.mp4

Wav2lip videos can be played now.

WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://localhost:5100
Press CTRL+C to quit - - [12/Apr/2024 20:38:26] "GET / HTTP/1.1" 200 -
1712943520.917091 in wav2lip gen server chunk:1_0
in wav2lip_server_generate: is busy: 0, face_detect_running: 0, chunk: 1, chunk_needed: 1, reply: 0
  0%|                                                                          | 0/1 [00:00<?, ?it/s]Loading cached faces from file emma_304x300.mp4_2455192.pkl
1712943522.0245404 after face detection, mels: 11, frames: 704, faces: 704, start_frame: 0
Load checkpoint to cuda from: modules/wav2lip/checkpoints/wav2lip.pth
1712943524.816111 in wav2lip gen server chunk:2_1
in wav2lip_server_generate: is busy: 1, face_detect_running: 0, chunk: 2, chunk_needed: 1, reply: 1
1712943524.869257 in wav2lip gen server chunk:3_1
in wav2lip_server_generate: is busy: 1, face_detect_running: 0, chunk: 3, chunk_needed: 1, reply: 1
1712943525.0023263 in wav2lip gen server chunk:4_2
in wav2lip_server_generate: is busy: 1, face_detect_running: 0, chunk: 4, chunk_needed: 1, reply: 2
1712943526.6742308 in wav2lip gen server chunk:5_3
in wav2lip_server_generate: is busy: 1, face_detect_running: 0, chunk: 5, chunk_needed: 1, reply: 3

        OpenH264 Video Codec provided by Cisco Systems, Inc.

Error: skipping some mp4 and setting busy to 0 after timeout. wav2lip was busy: 1, chunk: 2, chunk_needed: 1, step: 300 - - [12/Apr/2024 20:39:15] "GET /api/wav2lip/generate/Anna/cuda/out_2/latest/2/1 HTTP/1.1" 200 -
                                                                                                     Loading cached faces from file emma_304x300.mp4_2455192.pkl                     | 0/1 [00:00<?, ?it/s]
1712943555.1464937 after face detection, mels: 11, frames: 704, faces: 704, start_frame: 11
Error: skipping some mp4 and setting busy to 0 after timeout. wav2lip was busy: 1, chunk: 4, chunk_needed: 1, step: 300 - - [12/Apr/2024 20:39:15] "GET /api/wav2lip/generate/Anna/cuda/out_4/latest/4/2 HTTP/1.1" 200 -
                                                                                                     Loading cached faces from file emma_304x300.mp4_2455192.pkl
1712943556.8251662 after face detection, mels: 13, frames: 704, faces: 704, start_frame: 22<?, ?it/s]

I restarted it again and it started responding, but after one request everything hangs the same way as above.


1712944352.3349257 calling play_video_with_audio: modules/wav2lip/temp/result_6.mp4, next_video_chunk_global: 6
1712944352.3349257 in play_video_with_audio, video: modules/wav2lip/temp/result_6.mp4, current_video_chunk: 6
  0%|                                                                          | 0/1 [00:00<?, ?it/s]Loading cached faces from file emma_304x300.mp4_2455192.pkl
1712944352.3842294 after face detection, mels: 21, frames: 704, faces: 704, start_frame: 11
cv2: missing video frame
Notice: modules/wav2lip/temp/result_7.mp4 is not found, nothing to play as a video
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)
Notice: modules/wav2lip/temp/result_8.mp4 is not found, nothing to play as a video
Notice: cv2 is not opened for modules/wav2lip/temp/result_8.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_8.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_8.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_8.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_8.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_8.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_8.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_8.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_8.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_8.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_8.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_8.mp4 (doesn't exist yet)
Notice: modules/wav2lip/temp/result_9.mp4 is not found, nothing to play as a video
Notice: cv2 is not opened for modules/wav2lip/temp/result_9.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_9.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_9.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_9.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_9.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_9.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_9.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_9.mp4 (doesn't exist yet)
Notice: cv2 is not opened for modules/wav2lip/temp/result_9.mp4 (doesn't exist yet)
done with play_video_with_audio, latest next_video_chunk_global to play: 10
Error: skipping some mp4 and setting busy to 0 after timeout. wav2lip was busy: 1, chunk: 8, chunk_needed: 7, step: 300 - - [12/Apr/2024 20:53:01] "GET /api/wav2lip/generate/Anna/cuda/out_8/latest/8/1 HTTP/1.1" 200 -
                                                                                                     Loading cached faces from file emma_304x300.mp4_2455192.pkl                     | 0/1 [00:00<?, ?it/s]
1712944382.002395 after face detection, mels: 8, frames: 704, faces: 704, start_frame: 32

I think you are out of VRAM and it is trying to use system RAM. Check in Windows Task Manager how much VRAM is in use. Try a smaller Mistral quant; if that works, you can then try a slightly bigger one. Also, don't use large Whisper (use medium Whisper).
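
As an alternative to Task Manager, VRAM usage can also be read from the command line. The sketch below is an illustration under assumptions: it relies on `nvidia-smi` being on PATH (it ships with the NVIDIA driver), and the function names are made up for this example.

```python
import subprocess


def parse_vram_mib(smi_output: str) -> list[tuple[int, int]]:
    """Parse 'used, total' lines (in MiB) printed by nvidia-smi in CSV mode."""
    result = []
    for line in smi_output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        result.append((used, total))
    return result


def query_vram_mib() -> list[tuple[int, int]]:
    """Return (used, total) MiB for each GPU; needs nvidia-smi on PATH."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_vram_mib(out)
```

Calling `query_vram_mib()` while the LLM, Whisper, xtts and wav2lip are all loaded shows how close the card is to its limit. If the used value sits near the total, spilling into system RAM is a plausible cause of the timeouts.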

It's fine now, but it only works acceptably with ggml-small-q5_1.bin, and the small model is too stupid. Eh, okay.
Also, a lot of frames are skipped during video processing.

Thank you very much for your help

Try some other LLM, for example Gemma 2B in English.

UPDATE: Instead of using a smaller LLM, you can just run Mistral on CPU+RAM instead of the GPU. Set -ngl 0 in talk_llama_wav2lip_ru.bat. Latency will be 3-4 seconds, but that is still fine. If that works, you can try different -ngl values (from 0 to 33; there are 33 layers in Mistral). I have tried talking with Mixtral-8x7B running on CPU only and it was almost OK.
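
Trying -ngl values one by one can be shortened with a binary search. This is only a sketch under assumptions: `works` is a hypothetical callback (for example, you edit the .bat, run one request and report whether it stayed responsive), -ngl 0 is assumed to work, and it is assumed that if n layers fit on the GPU then any smaller n fits too.

```python
from typing import Callable


def largest_working_ngl(works: Callable[[int], bool], max_layers: int = 33) -> int:
    """Find the largest -ngl in [0, max_layers] for which works(ngl) is True.

    Assumes -ngl 0 works and that fewer offloaded layers never make it worse.
    """
    low, high = 0, max_layers
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if works(mid):
            low = mid       # mid layers fit, so try offloading more
        else:
            high = mid - 1  # mid layers are too many, so try fewer
    return low
```

With 33 layers this needs at most six trial runs instead of up to 33.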