Audio file recognition failed

Question

Audio file recognition failed

tailangjun opened this issue a year ago · comments

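2023-09-09 00:44:33 [Done] Audio files renamed
2023-09-09 00:44:40 Audio file recognition failed
2023-09-09 00:44:40 [Done] Generated txt transcripts for the audio
2023-09-09 00:44:40 [Done] Data preprocessing, training/validation split
2023-09-09 00:44:40 [In progress] Training started; see the backend console for training progress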

Does this recognition failure mean the audio has to meet certain requirements?

太郎君 · Answer 1 · Sat Sep 09 2023 12:27:19 GMT+0800 (China Standard Time)

I checked the log, and it reports RuntimeError: Failed to load audio from separated/htdemucs/my-voice_1/vocals.wav
This is the line that fails:
wav, sr = torchaudio.load(os.path.join("separated", "htdemucs", fname, "vocals.wav"))
What is this file for, and where can I download it?

太郎君 · Answer 2 · Sat Sep 09 2023 12:46:45 GMT+0800 (China Standard Time)

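Solved. The statement that actually failed was most likely this one: os.system(f"demucs --two-stems=vocals {file}")
Without demucs installed, that command cannot produce the vocals.wav file, which is why the later torchaudio.load call failed.
demucs is a library from Facebook (facebookresearch). Install it with pip install -U demucs, or from source with pip install -U git+https://github.com/facebookresearch/demucs#egg=demucs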
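One reason this was hard to spot is that os.system does not raise when the command fails, so the error only surfaces later as a missing vocals.wav. Below is a minimal sketch of a stricter wrapper. The helper names vocals_path and separate_vocals are hypothetical (they are not from the project), and the output layout separated/htdemucs/<track>/vocals.wav is assumed from the path in the error message above.

```python
import os
import shutil
import subprocess


def vocals_path(audio_file, out_root="separated", model="htdemucs"):
    """Return where the vocals stem is expected to be written.

    Assumption: the layout is <out_root>/<model>/<track name>/vocals.wav,
    matching the path shown in the RuntimeError above.
    """
    track = os.path.splitext(os.path.basename(audio_file))[0]
    return os.path.join(out_root, model, track, "vocals.wav")


def separate_vocals(audio_file):
    """Run demucs on one file and fail loudly if anything goes wrong."""
    # Fail early with a clear message if the demucs CLI is not installed.
    if shutil.which("demucs") is None:
        raise RuntimeError("demucs not found on PATH; try: pip install -U demucs")

    # check=True raises CalledProcessError on a non-zero exit status,
    # unlike os.system, which only returns the status code.
    subprocess.run(["demucs", "--two-stems=vocals", audio_file], check=True)

    out = vocals_path(audio_file)
    if not os.path.isfile(out):
        raise RuntimeError(f"demucs finished but {out} is missing")
    return out
```

With a wrapper like this, a missing demucs install would likely show up as an explicit "demucs not found" error at the separation step rather than as a torchaudio load failure afterwards.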

太郎君 · Answer 3 · Sat Sep 09 2023 12:51:00 GMT+0800 (China Standard Time)

Finally got it running on Ubuntu:
INFO:OUTPUT_MODEL:====> Epoch: 81
0%| | 0/4 [00:00<?, ?it/s]/opt/anaconda3/envs/vits/lib/python3.9/site-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
/opt/anaconda3/envs/vits/lib/python3.9/site-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
INFO:OUTPUT_MODEL:Saving model and optimizer state at iteration 82 to OUTPUT_MODEL/G_324.pth
INFO:OUTPUT_MODEL:Saving model and optimizer state at iteration 82 to OUTPUT_MODEL/D_324.pth
oldest_checkpoint_path:OUTPUT_MODEL/G_320.pth
oldest_checkpoint_path:OUTPUT_MODEL/D_320.pth
remove OUTPUT_MODEL/G_320.pth
remove OUTPUT_MODEL/D_320.pth
25%|█████████████████████████████████▌ | 1/4 [00:02<00:07, 2.59s/it]INFO:OUTPUT_MODEL:Saving model and optimizer state at iteration 82 to OUTPUT_MODEL/G_325.pth
INFO:OUTPUT_MODEL:Saving model and optimizer state at iteration 82 to OUTPUT_MODEL/D_325.pth
oldest_checkpoint_path:OUTPUT_MODEL/G_321.pth
oldest_checkpoint_path:OUTPUT_MODEL/D_321.pth
remove OUTPUT_MODEL/G_321.pth
remove OUTPUT_MODEL/D_321.pth
50%|███████████████████████████████████████████████████████████████████ | 2/4 [00:03<00:02, 1.32s/it]INFO:OUTPUT_MODEL:Saving model and optimizer state at iteration 82 to OUTPUT_MODEL/G_326.pth
INFO:OUTPUT_MODEL:Saving model and optimizer state at iteration 82 to OUTPUT_MODEL/D_326.pth
oldest_checkpoint_path:OUTPUT_MODEL/G_322.pth
oldest_checkpoint_path:OUTPUT_MODEL/D_322.pth
remove OUTPUT_MODEL/G_322.pth
remove OUTPUT_MODEL/D_322.pth
75%|████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 3/4 [00:03<00:00, 1.11it/s]INFO:OUTPUT_MODEL:Saving model and optimizer state at iteration 82 to OUTPUT_MODEL/G_327.pth
INFO:OUTPUT_MODEL:Saving model and optimizer state at iteration 82 to OUTPUT_MODEL/D_327.pth
oldest_checkpoint_path:OUTPUT_MODEL/G_323.pth
oldest_checkpoint_path:OUTPUT_MODEL/D_323.pth
remove OUTPUT_MODEL/G_323.pth
remove OUTPUT_MODEL/D_323.pth
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00, 1.08s/it]