the gradient of all parameters is None
nankepan opened this issue:
It should only be `None` after `optimizer.zero_grad()`; `booster.backward` essentially does `torch.optim.Optimizer.backward(loss)`. Would you mind printing the contents of `loss` to see if it is `NaN`?
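A minimal sketch of the suggested check, assuming a typical ColossalAI training loop (the `model`, `criterion`, `dataloader`, `booster`, and `optimizer` names are illustrative, not from this issue):

```python
import torch

for batch, labels in dataloader:
    outputs = model(batch)
    loss = criterion(outputs, labels)

    # Print the loss to verify it is a finite value rather than NaN,
    # as suggested above.
    if torch.isnan(loss):
        print("loss is NaN")
    else:
        print("loss:", loss.item())

    # ColossalAI's backward entry point; it forwards to the wrapped
    # optimizer's backward(loss) internally.
    booster.backward(loss, optimizer)
    optimizer.step()
    optimizer.zero_grad()
```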
Thanks for the reply. `loss` is normal, but the gradient is `None` before `optimizer.zero_grad()`, which is strange.

I trained the model; the loss was steadily decreasing and model performance was also improving. But the gradients being `None` confuses me.
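A hypothetical sketch of the check being described, reusing the names assumed in the loop above: inspect `param.grad` right after the backward pass and before `zero_grad`.

```python
booster.backward(loss, optimizer)

for name, param in model.named_parameters():
    # Under ColossalAI this reportedly prints None for every parameter,
    # even though training itself progresses normally.
    print(name, param.grad)

optimizer.step()
optimizer.zero_grad()
```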
This is because ColossalAI manages the gradients itself, so you cannot access them directly through `param.grad`. @ver217 Could you please help with this?
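For contrast, a self-contained plain PyTorch baseline in which `param.grad` is populated after `backward()`. Under ColossalAI's ZeRO/Gemini-style plugins, the optimizer wrapper keeps gradients in its own buffers instead, which is why `param.grad` reads as `None` there (this baseline is illustrative, not from the issue):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
loss = model(torch.randn(1, 4)).sum()
loss.backward()

for name, param in model.named_parameters():
    # Prints False for every parameter: vanilla PyTorch fills .grad,
    # whereas a gradient-managing wrapper would leave it as None.
    print(name, param.grad is None)
```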