fudan-generative-vision / hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Home page: https://fudan-generative-vision.github.io/hallo/


I ran the inference code from the home page, and after an hour it still hasn't finished... it's absurdly slow.

FONCHIEH opened this issue · comments

RTX 3080, 32 GB RAM

How long is your audio?

It still hasn't generated a single video after all this time?

Yes, it took about two hours. I checked the console, and onnx doesn't seem to be using CUDA; it's running on the CPU.

How long is your audio?

python scripts/inference.py --source_image examples/reference_images/1.jpg --driving_audio examples/driving_audios/1.wav
I'm using the official example; the audio is about 7 seconds.

How long is your audio?

The time is mostly spent after "loaded weight from /xxx/hallo/net.pth"; that first processing step took about an hour.

Could you share your GPU load at the time?
In my tests on a 3080 Ti it takes about 10 seconds per frame, so a 7-second video finishes in roughly half an hour.
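As a rough sanity check of that estimate (a sketch only; the 25 fps output rate is an assumption, not a value from this thread, and may not match the project's actual default):

```python
# Back-of-the-envelope check of "7-second video in about half an hour".
audio_seconds = 7
fps = 25                        # assumed output frame rate
seconds_per_frame = 10          # observed on the 3080 Ti
frames = audio_seconds * fps    # 175 frames
print(frames * seconds_per_frame / 60, "minutes")  # ~29 minutes
```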

It still hasn't generated a single video after all this time?

Yes, it took about two hours. I checked the console, and onnx doesn't seem to be using CUDA; it's running on the CPU.

The onnx models are only used to extract embeddings, so they account for a small share of the runtime. We'll look into this issue.

Could you share your GPU load at the time? In my tests on a 3080 Ti it takes about 10 seconds per frame, so a 7-second video finishes in roughly half an hour.

Below is the rerun I started at noon; it still hasn't finished, and at least an hour has passed already.

```
python scripts/inference.py --source_image examples/reference_images/1.jpg --driving_audio examples/driving_audios/1.wav
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.2.2+cu121 with CUDA 1201 (you have 2.2.2+cu118)
Python 3.10.11 (you have 3.10.14)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
A matching Triton is not available, some optimizations will not be enabled
Traceback (most recent call last):
File "C:\Users\FONCHIEH.conda\envs\hallo\lib\site-packages\xformers_init_.py", line 55, in _is_triton_available
from xformers.triton.softmax import softmax as triton_softmax # noqa
File "C:\Users\FONCHIEH.conda\envs\hallo\lib\site-packages\xformers\triton\softmax.py", line 11, in
import triton
ModuleNotFoundError: No module named 'triton'
INFO:albumentations.check_version:A new version of Albumentations is available: 1.4.10 (you have 1.4.9). Upgrade using: pip install --upgrade albumentations
WARNING:py.warnings:C:\Users\FONCHIEH.conda\envs\hallo\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:69: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
warnings.warn(

Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\glintr100.onnx recognition ['None', 3, 112, 112] 127.5 127.5
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\scrfd_10g_bnkps.onnx detection [1, 3, '?', '?'] 127.5 128.0
set det-size: (640, 640)
WARNING:py.warnings:C:\Users\FONCHIEH.conda\envs\hallo\lib\site-packages\insightface\utils\transform.py:68: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1718861662.292773 73080 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1718861662.299956 69484 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1718861662.307401 75796 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
WARNING:py.warnings:C:\Users\FONCHIEH.conda\envs\hallo\lib\site-packages\google\protobuf\symbol_database.py:55: UserWarning: SymbolDatabase.GetPrototype() is deprecated. Please use message_factory.GetMessageClass() instead. SymbolDatabase.GetPrototype() will be removed soon.
warnings.warn('SymbolDatabase.GetPrototype() is deprecated. Please '

Processed and saved: ./.cache\1_sep_background.png
Processed and saved: ./.cache\1_sep_face.png
Some weights of Wav2VecModel were not initialized from the model checkpoint at ./pretrained_models/wav2vec/wav2vec2-base-960h and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:audio_separator.separator.separator:Separator version 0.17.2 instantiating with output_dir: ./.cache\audio_preprocess, output_format: WAV
INFO:audio_separator.separator.separator:Operating System: Windows 10.0.22631
INFO:audio_separator.separator.separator:System: Windows Node: DESKTOP-3RJLQ2D Release: 10 Machine: AMD64 Proc: Intel64 Family 6 Model 165 Stepping 5, GenuineIntel
INFO:audio_separator.separator.separator:Python Version: 3.10.14
INFO:audio_separator.separator.separator:PyTorch Version: 2.2.2+cu118
INFO:audio_separator.separator.separator:FFmpeg installed: ffmpeg version 2024-02-26-git-a3ca4beeaa-full_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers
INFO:audio_separator.separator.separator:ONNX Runtime CPU package installed with version: 1.18.0
INFO:audio_separator.separator.separator:CUDA is available in Torch, setting Torch device to CUDA
WARNING:audio_separator.separator.separator:CUDAExecutionProvider not available in ONNXruntime, so acceleration will NOT be enabled
INFO:audio_separator.separator.separator:Loading model Kim_Vocal_2.onnx...
INFO:audio_separator.separator.separator:Load model duration: 00:00:01
INFO:audio_separator.separator.separator:Starting separation process for audio_file_path: examples/driving_audios/1.wav
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:08<00:00, 2.70s/it]
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 7.85it/s]
INFO:audio_separator.separator.separator:Saving Vocals stem to 1_(Vocals)_Kim_Vocal_2.wav...
INFO:audio_separator.separator.separator:Clearing input audio file paths, sources and stems...
INFO:audio_separator.separator.separator:Separation duration: 00:00:09
The config attributes {'center_input_sample': False, 'out_channels': 4} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight']
INFO:hallo.models.unet_3d:loaded temporal unet's pretrained weights from pretrained_models\stable-diffusion-v1-5\unet ...
The config attributes {'center_input_sample': False} were passed to UNet3DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Load motion module params from pretrained_models\motion_module\mm_sd_v15_v2.ckpt
INFO:hallo.models.unet_3d:Loaded 453.20928M-parameter motion module
loaded weight from ./pretrained_models/hallo\net.pth
100%|██████████████████████████████████████████████████████████████████████████████████| 40/40 [22:34<00:00, 33.87s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 16/16 [00:01<00:00, 9.54it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 40/40 [37:52<00:00, 56.82s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 16/16 [00:01<00:00, 9.97it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 40/40 [28:10<00:00, 42.26s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 16/16 [00:01<00:00, 12.19it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 40/40 [14:22<00:00, 21.55s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 16/16 [00:01<00:00, 11.94it/s]
90%|█████████████████████████████████████████████████████████████████████████▊        | 36/40 [17:08<03:12, 48.16s/it]
```

It still hasn't generated a single video after all this time?

Yes, it took about two hours. I checked the console, and onnx doesn't seem to be using CUDA; it's running on the CPU.

The onnx models are only used to extract embeddings, so they account for a small share of the runtime. We'll look into this issue.

As I recall, it was caused by an onnxruntime-gpu version mismatch.


Right, that ONNX warning can be fixed by installing onnxruntime-gpu.
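A minimal sketch of that fix (the package swap below is the usual onnxruntime pattern, not a command taken from this thread; pick an onnxruntime-gpu build that matches your CUDA version):

```python
# Typical fix: replace the CPU-only package with the GPU build, e.g.
#   pip uninstall onnxruntime
#   pip install onnxruntime-gpu   # choose a build matching your CUDA version
import onnxruntime as ort

# 'CUDAExecutionProvider' should now be listed; if only
# 'CPUExecutionProvider' shows up, the GPU build is still not active.
print(ort.get_available_providers())
```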

Judging from the log, the inference speed is actually fine. How long is your audio? Keeping the audio between 5 and 15 seconds gives the best results.
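If you want to check a clip's length before running inference, a minimal sketch using only Python's standard-library wave module (PCM .wav files only) might look like this:

```python
# Print the driving audio's duration so it can be kept in the
# recommended 5-15 second range.
import wave

with wave.open("examples/driving_audios/1.wav") as w:
    duration = w.getnframes() / w.getframerate()

print(f"audio duration: {duration:.1f} s")
```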

python scripts/inference.py --source_image examples/reference_images/1.jpg --driving_audio examples/driving_audios/1.wav

examples/reference_images/1.jpg
examples/driving_audios/1.wav

These are the demo files from the repo. The audio is 7 seconds, but it still hasn't finished after 5 hours; judging from the log, it should take about another half hour to complete.

However, this case:
examples/reference_images/7.jpg
examples/driving_audios/2.wav
finished in only about 1 hour total. Baffling...

This case is abnormal; we'll take a look.

This case is abnormal; we'll take a look.

Thanks for your hard work.

I ran a 1-minute audio at 60 FPS, and it took more than 3 hours...

Not sure why, but it needs at least 12 hours for the sample audio (7 s). I closed it after waiting for 7 hours.

This seriously needs heavy optimization. I tried several other tools like SadTalker, AniPortrait, V-Express, etc., but no other tool takes this much time.

I hope the developers take a look at this and optimize it. Thanks for their hard work.

[Screenshot 2024-06-19 193030]

We are currently optimizing our inference performance, which has seen significant improvements, and we plan to release it soon.

We are currently optimizing our inference performance, which has seen significant improvements, and we plan to release it soon.

Thank you for looking into improving the performance. The output created by other users looks promising, and we are looking forward to using this.

Appreciate your hard work.

I am closing this issue now. It can be reopened at any time if needed.

@subazinga Have you merged the optimization changes? Can I test, please?

@subazinga Have you merged the optimization changes? Can I test, please?
Please release it soon; it's too slow right now.