Mozer / talk-llama-fast

Port of OpenAI's Whisper model in C/C++ with xtts and wav2lip

Where should the video output be?

sss1337xyz opened this issue

I can't figure out where the video output is supposed to appear. I launched all parts of the application but still don't understand.

I looked at the localhost pages that open in the console, but they are empty.

Edit xtts_wav2lip.bat: change --output c:\\DATA\\LLM\\SillyTavern-Extras\\tts_out\\ to the real path of your tts_out folder. Don't forget the \\ at the end.

UPD: yes, I see the missing \\ in your screenshot. A lot of people miss it; I think I need to handle it better in the code.
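One way to make the trailing backslash optional is to normalize the folder path, or join paths with a helper that adds the separator only when it is missing. The Python sketch below assumes the server builds wav file names by plain string concatenation of the --output value and a name such as out_1.wav; that is an assumption about the internals, not confirmed by the thread. The helpers normalize_output_dir and wav_path are hypothetical names.

```python
import ntpath


def normalize_output_dir(path: str) -> str:
    """Hypothetical helper: return the folder path with exactly one trailing backslash."""
    # Strip any trailing separators (\ or /), then add a single backslash back.
    return path.rstrip("\\/") + "\\"


def wav_path(output_dir: str, name: str) -> str:
    """Hypothetical helper: build a Windows path to a wav file inside output_dir."""
    # ntpath.join inserts a backslash only when one is missing,
    # so it works whether or not the user typed the trailing \.
    return ntpath.join(output_dir, name)


if __name__ == "__main__":
    broken = r"c:\DATA\LLM\SillyTavern-Extras\tts_out"
    # Naive concatenation without the trailing \ glues the names together:
    print(broken + "out_1.wav")  # c:\DATA\LLM\SillyTavern-Extras\tts_outout_1.wav
    print(normalize_output_dir(broken) + "out_1.wav")  # c:\DATA\LLM\SillyTavern-Extras\tts_out\out_1.wav
    print(wav_path(broken, "out_1.wav"))  # c:\DATA\LLM\SillyTavern-Extras\tts_out\out_1.wav
```

Of the two, joining with ntpath.join is the less fragile choice, since it never produces a doubled or missing separator regardless of what the user typed.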

Yes, thank you: voice files have started appearing in this folder. But I still can't see the video window, and Anna's voice is not playing.

Restart SillyTavern-Extras, maybe it will help. Paste errors or a screenshot here and let's see what's going on.

I restarted everything and it now behaves a little differently: after the first request, everything freezes (as I understand it, this is related to the video cache described in the README). Should I just wait?

RTX 3070 8 GB
32 GB RAM

(extras) C:\Users\New\Desktop\videoAssistant\xtts\SillyTavern-Extras>python server.py  --enable-modules wav2lip
Using torch device: cpu
Initializing wav2lip module
wav2lip: running init generation with default and silence.wav
in wav2lip_server_generate: is busy: 0, face_detect_running: 0, chunk: 0, chunk_needed: 0, reply: 0
speech detected, wav2lip_server won't generate
Deleting old temporary wavs and mp4s.
No API key given because you are running locally.
del tts_out/out_1.wav
del tts_out/out_10.wav
del tts_out/out_11.wav
del tts_out/out_12.wav
del tts_out/out_13.wav
del tts_out/out_2.wav
del tts_out/out_3.wav
del tts_out/out_4.wav
del tts_out/out_5.wav
del tts_out/out_6.wav
 * Serving Flask app 'server'
 * Debug mode: off
del tts_out/out_7.wav
del tts_out/out_8.wav
del tts_out/out_9.wav
del modules/wav2lip/temp/result_1.mp4

Wav2lip videos can be played now.


WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://localhost:5100
Press CTRL+C to quit
127.0.0.1 - - [12/Apr/2024 20:38:26] "GET / HTTP/1.1" 200 -
1712943520.917091 in wav2lip gen server chunk:1_0
in wav2lip_server_generate: is busy: 0, face_detect_running: 0, chunk: 1, chunk_needed: 1, reply: 0
  0%|          | 0/1 [00:00<?, ?it/s]
Loading cached faces from file emma_304x300.mp4_2455192.pkl
1712943522.0245404 after face detection, mels: 11, frames: 704, faces: 704, start_frame: 0
Load checkpoint to cuda from: modules/wav2lip/checkpoints/wav2lip.pth
1712943524.816111 in wav2lip gen server chunk:2_1
in wav2lip_server_generate: is busy: 1, face_detect_running: 0, chunk: 2, chunk_needed: 1, reply: 1
1712943524.869257 in wav2lip gen server chunk:3_1
in wav2lip_server_generate: is busy: 1, face_detect_running: 0, chunk: 3, chunk_needed: 1, reply: 1
1712943525.0023263 in wav2lip gen server chunk:4_2
in wav2lip_server_generate: is busy: 1, face_detect_running: 0, chunk: 4, chunk_needed: 1, reply: 2
1712943526.6742308 in wav2lip gen server chunk:5_3
in wav2lip_server_generate: is busy: 1, face_detect_running: 0, chunk: 5, chunk_needed: 1, reply: 3

        OpenH264 Video Codec provided by Cisco Systems, Inc.

Error: skipping some mp4 and setting busy to 0 after timeout. wav2lip was busy: 1, chunk: 2, chunk_needed: 1, step: 300
127.0.0.1 - - [12/Apr/2024 20:39:15] "GET /api/wav2lip/generate/Anna/cuda/out_2/latest/2/1 HTTP/1.1" 200 -
Loading cached faces from file emma_304x300.mp4_2455192.pkl
1712943555.1464937 after face detection, mels: 11, frames: 704, faces: 704, start_frame: 11
Error: skipping some mp4 and setting busy to 0 after timeout. wav2lip was busy: 1, chunk: 4, chunk_needed: 1, step: 300
127.0.0.1 - - [12/Apr/2024 20:39:15] "GET /api/wav2lip/generate/Anna/cuda/out_4/latest/4/2 HTTP/1.1" 200 -
Loading cached faces from file emma_304x300.mp4_2455192.pkl
1712943556.8251662 after face detection, mels: 13, frames: 704, faces: 704, start_frame: 22

I restarted it again and it started responding, but after one request everything hangs the same way as above.

log:

1712944352.3349257 calling play_video_with_audio: modules/wav2lip/temp/result_6.mp4, next_video_chunk_global: 6
1712944352.3349257 in play_video_with_audio, video: modules/wav2lip/temp/result_6.mp4, current_video_chunk: 6
  0%|          | 0/1 [00:00<?, ?it/s]
Loading cached faces from file emma_304x300.mp4_2455192.pkl
1712944352.3842294 after face detection, mels: 21, frames: 704, faces: 704, start_frame: 11
cv2: missing video frame
Notice: modules/wav2lip/temp/result_7.mp4 is not found, nothing to play as a video
Notice: cv2 is not opened for modules/wav2lip/temp/result_7.mp4 (doesn't exist yet)   [repeated 22 times]
Notice: modules/wav2lip/temp/result_8.mp4 is not found, nothing to play as a video
Notice: cv2 is not opened for modules/wav2lip/temp/result_8.mp4 (doesn't exist yet)   [repeated 12 times]
Notice: modules/wav2lip/temp/result_9.mp4 is not found, nothing to play as a video
Notice: cv2 is not opened for modules/wav2lip/temp/result_9.mp4 (doesn't exist yet)   [repeated 9 times]
done with play_video_with_audio, latest next_video_chunk_global to play: 10
Error: skipping some mp4 and setting busy to 0 after timeout. wav2lip was busy: 1, chunk: 8, chunk_needed: 7, step: 300
127.0.0.1 - - [12/Apr/2024 20:53:01] "GET /api/wav2lip/generate/Anna/cuda/out_8/latest/8/1 HTTP/1.1" 200 -
Loading cached faces from file emma_304x300.mp4_2455192.pkl
1712944382.002395 after face detection, mels: 8, frames: 704, faces: 704, start_frame: 32

I think you are out of VRAM and it is falling back to system RAM. Check in Windows Task Manager how much VRAM is in use. Try a smaller Mistral quant, for example https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q2_K.gguf. If that works, you can then try a slightly bigger quant. Also, don't use the large Whisper model (use medium).
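On NVIDIA cards the VRAM numbers shown in Task Manager can also be read from the command line with nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits, which prints one "used, total" line per GPU in MiB. The Python sketch below wraps that query; it assumes nvidia-smi is on PATH, and the function names and the 90% "nearly full" threshold are illustrative choices, not part of talk-llama-fast.

```python
import subprocess

# Real nvidia-smi query: used and total memory per GPU, in MiB, no header/units.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_vram(csv_text: str) -> list[tuple[int, int]]:
    """Parse nvidia-smi CSV output into (used_mib, total_mib) tuples, one per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        used, total = (int(x) for x in line.split(","))
        gpus.append((used, total))
    return gpus


def nearly_full(used_mib: int, total_mib: int, threshold: float = 0.9) -> bool:
    """True when VRAM use is at or above the (illustrative) threshold."""
    return used_mib >= threshold * total_mib


def read_vram() -> list[tuple[int, int]]:
    """Run nvidia-smi once and parse its output; raises if the tool is missing or fails."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_vram(out)


if __name__ == "__main__":
    try:
        gpus = read_vram()
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not run nvidia-smi: {exc}")
    else:
        for i, (used, total) in enumerate(gpus):
            status = "nearly full" if nearly_full(used, total) else "ok"
            print(f"GPU {i}: {used} / {total} MiB used ({status})")
```

Running it while Whisper, the LLM, XTTS and wav2lip are all loaded shows whether an 8 GB card is close to its limit.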

It's OK now, but it only works acceptably with ggml-small-q5_1.bin and mistral-7b-instruct-v0.2.Q2_K.gguf, and the small model is too dumb. Eh, okay.
Also, a lot of frames are skipped during video processing.

Thank you very much for your help

Try some other LLM, for example Gemma 2B in English.

UPDATE: instead of using a smaller LLM, you can run Mistral on CPU+RAM instead of the GPU: set -ngl 0 in talk_llama_wav2lip_ru.bat. Latency will be 3-4 seconds, but that is still fine. If that works, you can try other -ngl values (from 0 to 33; there are 33 layers in Mistral). I have tried talking to Mixtral-8x7B running on CPU only and it was almost OK.
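A rough starting value for -ngl can be estimated by assuming every layer takes an equal share of the model file, so the number of layers that fits is proportional to the free VRAM. The Python sketch below is only that back-of-the-envelope estimate: equal-sized layers are an assumption, it ignores the KV cache and other overhead, and the free VRAM and model size (for example the .gguf file size) are numbers you measure yourself. The function names are hypothetical.

```python
def layers_that_fit(free_vram_mib: int, model_file_mib: int, n_layers: int = 33) -> int:
    """Estimate how many layers can be offloaded to the GPU with -ngl.

    Assumes every layer takes an equal share of the model file, which is
    only approximately true; treat the result as a starting point.
    """
    if model_file_mib <= 0 or n_layers <= 0:
        raise ValueError("model size and layer count must be positive")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    fit = free_vram_mib * n_layers // model_file_mib
    # Clamp to the valid -ngl range 0..n_layers.
    return max(0, min(n_layers, fit))


def ngl_flag(free_vram_mib: int, model_file_mib: int, n_layers: int = 33) -> str:
    """Format the estimate as a command-line flag."""
    return f"-ngl {layers_that_fit(free_vram_mib, model_file_mib, n_layers)}"


if __name__ == "__main__":
    # Example inputs only: 2000 MiB free, 3300 MiB model file, 33 layers.
    print(ngl_flag(2000, 3300))  # -ngl 20
```

Start from the estimate, then lower it if replies stall or frames get skipped, or raise it if there is VRAM to spare.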