hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Home Page: https://hpcaitech.github.io/Open-Sora/


The gradient of all parameters is None

nankepan opened this issue · comments

[screenshot: the training loop where param.grad is printed]
Hi,
I print param.grad here and find that the gradient of every parameter is None. Is this caused by using ColossalAI? How can I obtain the gradients of the parameters? Thank you.
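For reference, in plain PyTorch (without ColossalAI) a loop like the following is the usual way to inspect gradients. This is a minimal standalone sketch, not the Open-Sora training code:

```python
import torch
import torch.nn as nn

# Plain PyTorch: after loss.backward(), each parameter that received a
# gradient exposes it via its .grad attribute.
model = nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).sum()
loss.backward()

for name, param in model.named_parameters():
    # In vanilla PyTorch this prints real tensors; under a ColossalAI-wrapped
    # model the same loop can print None, as discussed below.
    print(name, param.grad)
```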

It should only be None after optimizer.zero_grad(); booster.backward ultimately calls the wrapped optimizer's backward(loss). Would you mind printing the contents of loss to see whether it is NaN?
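A quick way to run that check is a guard like the one below. This is a minimal sketch; `check_loss` is a hypothetical helper, not part of Open-Sora or ColossalAI:

```python
import torch

def check_loss(loss: torch.Tensor) -> None:
    # Verify the loss is finite before calling backward; a NaN/Inf loss
    # would propagate non-finite values into every gradient.
    if not torch.isfinite(loss).all():
        raise RuntimeError(f"non-finite loss detected: {loss}")
```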

Thanks for the reply. The loss is normal, but the gradients are None even before optimizer.zero_grad(), which is strange.
I trained the model; the loss decreased steadily and the model's performance kept improving. But the None gradients still confuse me.

This is because ColossalAI manages the gradients itself, so you cannot access them directly through param.grad. @ver217, could you please help with this?
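To illustrate why param.grad stays None: ZeRO-style plugins flatten and shard gradients into internal buffers owned by the wrapped optimizer, so the per-parameter .grad fields are never populated. The sketch below shows the general shape of a training step with the Booster API (as used in Open-Sora's training script); the model, optimizer, and batch are assumed to have been prepared by booster.boost(...) beforehand:

```python
import torch
from colossalai.booster import Booster

def train_step(booster: Booster, model, optimizer, batch: torch.Tensor) -> torch.Tensor:
    # Assumes model/optimizer were already wrapped via booster.boost(...).
    loss = model(batch).mean()
    # booster.backward routes the backward pass through the wrapped
    # optimizer; with a ZeRO-style plugin the gradients land in sharded
    # internal buffers rather than in each param.grad, which is why the
    # loop below prints None.
    booster.backward(loss, optimizer)
    for name, p in model.named_parameters():
        print(name, p.grad)  # None under ColossalAI's ZeRO plugins
    optimizer.step()
    optimizer.zero_grad()
    return loss
```

Retrieving the actual gradient values would require a plugin-specific accessor on the wrapped optimizer; the exact API depends on the plugin and ColossalAI version, so consult the ColossalAI documentation rather than relying on param.grad.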