Plachtaa / FAcodec

Training code for FAcodec presented in NaturalSpeech 3


What do the loss curves look like during your successful training?

YuXiangLin1234 opened this issue

Hello,

I've attempted to train FAcodec on my own dataset. However, whether I start from scratch or fine-tune your provided checkpoint, the reconstructed audio clips are just noise. For fine-tuning I used around 128 hours of Common Voice 18 zh-TW data. After approximately 20k steps training appeared to plateau: some losses, such as the feature loss, decreased steadily, while others, such as the mel loss and waveform loss, kept oscillating.
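For reference, this is roughly how I pull the loss curves out of the TensorBoard event files to compare runs. The log directory and scalar tag names in the sketch are placeholders, not necessarily the tags FAcodec's training script actually writes:

```python
# Rough sketch: extract scalar loss curves from TensorBoard event files so the
# runs can be compared side by side. The log directory and tag names below are
# guesses; replace them with whatever your run actually logs.
import matplotlib.pyplot as plt
from tensorboard.backend.event_processing import event_accumulator

LOG_DIR = "./logs/facodec_run"  # hypothetical log directory
TAGS = ["train/mel_loss", "train/feature_loss", "train/waveform_loss"]  # assumed tag names

acc = event_accumulator.EventAccumulator(
    LOG_DIR, size_guidance={event_accumulator.SCALARS: 0})  # 0 = keep all scalar events
acc.Reload()

for tag in TAGS:
    if tag not in acc.Tags().get("scalars", []):
        print(f"tag not found: {tag}")
        continue
    events = acc.Scalars(tag)
    plt.plot([e.step for e in events], [e.value for e in events], label=tag)

plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.savefig("loss_curves.png")
```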

Do all losses decrease during your training process?

Could you please share your voice examples and loss curves? I believe they would help in analyzing the issue you encountered.

According to the mel_loss in the loss curve you shared, the model seems to have converged well.
However, the reconstructed audio samples sound as if they were generated by a randomly initialized model.
May I know whether the reconstructed sample is retrieved from TensorBoard or through a separate reconstruction script?
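If it is the TensorBoard one, a sketch like the following can dump the logged audio summaries to .wav files for closer listening; the log directory and audio tag here are assumptions and should be adjusted to your run:

```python
# Sketch: export the audio summaries logged to TensorBoard as .wav files so the
# TensorBoard samples can be compared with the output of a standalone
# reconstruction script. Log directory and tag name are assumptions.
import os

from tensorboard.backend.event_processing import event_accumulator

LOG_DIR = "./logs/facodec_run"  # hypothetical log directory
TAG = "eval/reconstructed"      # assumed audio summary tag
OUT_DIR = "tb_audio"

acc = event_accumulator.EventAccumulator(
    LOG_DIR, size_guidance={event_accumulator.AUDIO: 0})  # 0 = keep all audio events
acc.Reload()

os.makedirs(OUT_DIR, exist_ok=True)
for event in acc.Audio(TAG):
    # encoded_audio_string already contains the encoded audio bytes (typically WAV)
    out_path = os.path.join(OUT_DIR, f"step_{event.step}.wav")
    with open(out_path, "wb") as f:
        f.write(event.encoded_audio_string)
```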