Mozer / talk-llama-fast

Port of OpenAI's Whisper model in C/C++ with xtts and wav2lip

No audio or video.

Gnoomer opened this issue · comments

I launched talk-llama-wav2lip-ru.bat, and only the text output worked. I tried restarting SillyTavern and xtts, but that didn't help; it says there are no speakers.
Help, please.
I'm using Windows 10 on a PC with a 4070 Ti and 16 GB of RAM.
Here is the output of xtts:
(xtts) C:\Windows\system32>python -m xtts_api_server --bat-dir %~dp0 -d=cuda --deepspeed --stream-to-wavs --call-wav2lip --output C:\Windows\System32\SillyTavern-Extras\tts_out\ --extras-url http://127.0.0.1:5100/ --wav-chunk-sizes=10,20,40,100,200,300,400,9999
2024-04-15 13:57:48.282 | INFO | xtts_api_server.modeldownloader:upgrade_tts_package:80 - TTS will be using 0.22.0 by Mozer
2024-04-15 13:57:48.283 | INFO | xtts_api_server.server::76 - Model: 'v2.0.2' starts to load,wait until it loads
[2024-04-15 13:58:01,165] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-15 13:58:01,457] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[2024-04-15 13:58:01,647] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+unknown, git-hash=unknown, git-branch=unknown
[2024-04-15 13:58:01,648] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2024-04-15 13:58:01,648] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2024-04-15 13:58:01,649] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2024-04-15 13:58:01,855] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1, 'rope_theta': 10000}
2024-04-15 13:58:02.325 | INFO | xtts_api_server.tts_funcs:load_model:190 - Pre-create latents for all current speakers
2024-04-15 13:58:02.326 | INFO | xtts_api_server.tts_funcs:create_latents_for_all:270 - Latents created for all 0 speakers.
2024-04-15 13:58:02.326 | INFO | xtts_api_server.tts_funcs:load_model:193 - Model successfully loaded
C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\pydantic\_internal\_fields.py:160: UserWarning: Field "model_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
  warnings.warn(
INFO: Started server process [2164]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8020 (Press CTRL+C to quit)
voice Anna(speakers/Anna.wav) is not found, switching to 'default'
1713178894.8860521 in server request
2024-04-15 14:01:34.886 | INFO | xtts_api_server.server:tts_to_audio:337 - Processing TTS to audio with request: text='Что ты говоришь' speaker_wav='default' language='ru' reply_part=0
INFO: ::1:58595 - "POST /tts_to_audio/ HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\fastapi\applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 186, in __call__
    raise exc
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\fastapi\routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\server.py", line 347, in tts_to_audio
    output_file_path = XTTS.process_tts_to_file(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 609, in process_tts_to_file
    raise e  # Propagate exceptions for endpoint handling.
    ^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 548, in process_tts_to_file
    speaker_wav = self.get_speaker_wav(speaker_name_or_path)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 540, in get_speaker_wav
    raise ValueError(f"Speaker {speaker_name_or_path} not found.")
ValueError: Speaker default not found.
voice Anna(speakers/Anna.wav) is not found, switching to 'default'
1713178895.9273908 in server request

It can't find the dir with the speaker wavs.
I think you are running it from cmd. Instead, simply double-click xtts_wav2lip.bat, or open a cmd in the directory where the .bat is. XTTS finds the \speakers\ dir based on the current directory.
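The lookup described above can be sketched in a few lines (a simplified illustration of the cwd-relative behavior; `find_speakers` is a hypothetical helper, not the actual xtts_api_server code):

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

def find_speakers(base_dir: Optional[str] = None) -> list[str]:
    """Collect speaker names from a 'speakers' dir resolved relative to
    base_dir (defaulting to the current working directory). This mirrors
    why launching from the wrong directory reports zero speakers."""
    base = Path(base_dir) if base_dir else Path(os.getcwd())
    speakers_dir = base / "speakers"
    if not speakers_dir.is_dir():
        return []  # wrong working directory -> "Latents created for all 0 speakers"
    return sorted(p.stem for p in speakers_dir.glob("*.wav"))

# Launching "from the .bat's directory" finds the wavs; launching from an
# unrelated directory (e.g. C:\Windows\system32) finds nothing.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "speakers").mkdir()
    (Path(tmp) / "speakers" / "Anna.wav").touch()
    print(find_speakers(tmp))  # -> ['Anna']
    print(find_speakers(tmp + "_elsewhere"))  # no speakers dir here -> []
```

This is why opening the .bat by double-click works: the shell's current directory is then the install folder, so the relative `speakers` path resolves correctly.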

[screenshots attached]
Here it is. I had been running the commands by copying them from the .bat files into the Conda environment. Now I've added Conda to PATH and launched everything from the .bat files; xtts downloaded some files and found 4 speakers, but I keep getting the same issue.
Here's the output:
W:\GOVORILKA\xtts>call conda activate xtts
2024-04-15 16:18:02.713 | INFO | xtts_api_server.tts_funcs:create_directories:283 - Folder in the path W:\GOVORILKA\xtts\xtts_models has been created
2024-04-15 16:18:02.715 | INFO | xtts_api_server.modeldownloader:upgrade_tts_package:80 - TTS will be using 0.22.0 by Mozer
2024-04-15 16:18:02.716 | INFO | xtts_api_server.server::76 - Model: 'v2.0.2' starts to load,wait until it loads
[XTTS] Downloading config.json...
100%|████████████████████████████████████████████████████████████████████████████| 4.36k/4.36k [00:00<00:00, 4.36MiB/s]
[XTTS] Downloading model.pth...
100%|████████████████████████████████████████████████████████████████████████████| 1.86G/1.86G [00:56<00:00, 32.7MiB/s]
[XTTS] Downloading vocab.json...
100%|██████████████████████████████████████████████████████████████████████████████| 335k/335k [00:00<00:00, 1.01MiB/s]
[XTTS] Downloading speakers_xtts.pth...
100%|████████████████████████████████████████████████████████████████████████████| 7.75M/7.75M [00:00<00:00, 22.7MiB/s]
[2024-04-15 16:19:16,118] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-15 16:19:16,489] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[2024-04-15 16:19:16,689] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+unknown, git-hash=unknown, git-branch=unknown
[2024-04-15 16:19:16,689] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2024-04-15 16:19:16,690] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2024-04-15 16:19:16,690] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2024-04-15 16:19:16,915] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1, 'rope_theta': 10000}
2024-04-15 16:19:17.455 | INFO | xtts_api_server.tts_funcs:load_model:190 - Pre-create latents for all current speakers
2024-04-15 16:19:17.456 | INFO | xtts_api_server.tts_funcs:get_or_create_latents:259 - creating latents for Anna: speakers/Anna.wav
2024-04-15 16:19:19.815 | INFO | xtts_api_server.tts_funcs:get_or_create_latents:259 - creating latents for default: speakers/default.wav
2024-04-15 16:19:19.852 | INFO | xtts_api_server.tts_funcs:get_or_create_latents:259 - creating latents for Google: speakers/Google.wav
2024-04-15 16:19:19.921 | INFO | xtts_api_server.tts_funcs:get_or_create_latents:259 - creating latents for Kurt Cobain: speakers/Kurt Cobain.wav
2024-04-15 16:19:19.987 | INFO | xtts_api_server.tts_funcs:create_latents_for_all:270 - Latents created for all 4 speakers.
2024-04-15 16:19:19.987 | INFO | xtts_api_server.tts_funcs:load_model:193 - Model successfully loaded
C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\pydantic\_internal\_fields.py:160: UserWarning: Field "model_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
  warnings.warn(
INFO: Started server process [16180]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8020 (Press CTRL+C to quit)
1713187201.829053 in server request
2024-04-15 16:20:01.829 | INFO | xtts_api_server.server:tts_to_audio:337 - Processing TTS to audio with request: text='А ты что' speaker_wav='Anna' language='ru' reply_part=0

Free memory : 6.095509 (GigaBytes)
Total memory: 11.993530 (GigaBytes)
Requested memory: 0.335938 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0000000790000000

INFO: ::1:63734 - "POST /tts_to_audio/ HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\fastapi\applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 186, in __call__
    raise exc
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\fastapi\routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\server.py", line 347, in tts_to_audio
    output_file_path = XTTS.process_tts_to_file(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 609, in process_tts_to_file
    raise e  # Propagate exceptions for endpoint handling.
    ^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 598, in process_tts_to_file
    self.local_generation(clear_text,speaker_name_or_path,speaker_wav,language,output_file)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 495, in local_generation
    out = self.model.inference(
          ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\TTS\tts\models\xtts.py", line 699, in inference
    torchaudio.save(output_file, torch.tensor(wav_tensor).unsqueeze(0), 24000, encoding="PCM_U")
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\torchaudio\_backend\utils.py", line 312, in save
    return backend.save(
           ^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\torchaudio\_backend\soundfile.py", line 44, in save
    soundfile_backend.save(
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\torchaudio\_backend\soundfile_backend.py", line 457, in save
    soundfile.write(file=filepath, data=src, samplerate=sample_rate, subtype=subtype, format=format)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\soundfile.py", line 343, in write
    with SoundFile(file, 'w', samplerate, channels,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\soundfile.py", line 658, in __init__
    self._file = self._open(file, mode_int, closefd)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\soundfile.py", line 1216, in _open
    raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening 'C:\\Windows\\System32\\SillyTavern-Extras\\tts_out\\out_1.wav': System error.
1713187202.9234974 in server request

I see C:\Windows\System32\SillyTavern-Extras\tts_out\out_1.wav

I don't think you installed Extras to /system32.
Please edit xtts_wav2lip.bat and change the --output param to the full path of the tts_out dir inside Extras. Don't forget the trailing slashes.
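The requirements above (full path, trailing slash, writable existing directory) can be sketched as a quick pre-flight check before launching. This is a hypothetical helper, not part of the project; only the requirements themselves come from the thread:

```python
from pathlib import Path

def check_output_param(output: str) -> list[str]:
    """Sanity-check a --output value: it should be a full (absolute) path
    to the tts_out dir inside Extras, end with a trailing slash, and point
    at a directory that actually exists. Returns a list of problems."""
    problems = []
    if not output.endswith(("\\", "/")):
        problems.append("missing trailing slash")
    if not Path(output).is_absolute():
        problems.append("not a full (absolute) path")
    elif not Path(output).is_dir():
        problems.append("directory does not exist")
    return problems

# Example: checking the value from the .bat before starting the server.
for problem in check_output_param(r"C:\Windows\System32\SillyTavern-Extras\tts_out"):
    print("--output problem:", problem)
```

An empty list means the path looks usable; note that even a well-formed path can still fail at write time if the directory is protected (as System32 is), which is what the LibsndfileError "System error" above indicates.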

I actually did install it there. I've now moved it to the same folder as xtts, and everything works! Thanks a lot for your help!