declare-lab / tango

A family of diffusion models for text-to-audio generation.

Home Page: https://tango2-web.github.io/

NaN loss in training

tranquangchung opened this issue

Hi,
Thanks for sharing your project.
When I trained your model using your config, both the validation and training loss became NaN.
I tried many times, but the results were the same.
Can you tell me the reason and how to solve it?

The NaN was caused by the language model. I solved the problem by modifying your code, and it worked very well.

Hi @tranquangchung, how did you solve the NaN problem? Thank you!

Hi, could you please explain how you solved this problem? Thanks!

It turns out the problem is with google/flan-t5-large: this model does not support fp16 training. Training in fp32 instead works fine.
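To illustrate how fp16 can turn a loss into NaN where fp32 does not, here is a small stdlib-only simulation. It assumes (this is not stated in the thread) that the NaN comes from activations exceeding the largest finite fp16 value, 65504. The RMS-style `normalize` function and the activation values are hypothetical stand-ins, not code from Tango or flan-t5.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def cast(x: float, fmt: str) -> float:
    """Round x to an IEEE format: 'e' = fp16, 'f' = fp32.

    struct raises OverflowError for finite values too large for the format;
    like a hardware cast, we map those to +/-inf instead.
    """
    try:
        return struct.unpack(fmt, struct.pack(fmt, x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def rms(values, fmt: str) -> float:
    """Root-mean-square with every intermediate rounded to `fmt`."""
    acc = cast(0.0, fmt)
    for v in values:
        v = cast(v, fmt)
        acc = cast(acc + cast(v * v, fmt), fmt)  # squares can overflow to inf
    mean = cast(acc / len(values), fmt)
    return cast(math.sqrt(cast(mean + 1e-6, fmt)), fmt)  # small eps avoids /0


def normalize(values, fmt: str):
    """Hypothetical RMS-style normalization: divide each value by the RMS."""
    r = rms(values, fmt)
    # If both a value and the RMS overflowed to inf, inf / inf gives NaN.
    return [cast(cast(v, fmt) / r, fmt) for v in values]
```

With the hypothetical activations `[1.0, 2.0, 70000.0]`, `normalize(..., "e")` returns `[0.0, 0.0, nan]` because 70000 overflows fp16 to inf, while `normalize(..., "f")` stays finite. Once a NaN appears in an activation, it propagates to the loss, which matches the symptom described in this issue.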

Glad to know that it was solved. FYI we have released Tango 2: https://arxiv.org/abs/2404.09956