the gradient of all parameters is None
nankepan opened this issue:
It should only be `None` after `optimizer.zero_grad()`; `booster.backward` essentially does `torch.optim.Optimizer.backward(loss)`. Would you mind printing the contents of `loss` to see if it is `NaN`?
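A minimal sketch of the suggested check, assuming a typical ColossalAI training loop (the `model`, `criterion`, `dataloader`, `booster`, and `optimizer` names are illustrative, not from this issue):

```python
import torch

for batch, labels in dataloader:
    outputs = model(batch)
    loss = criterion(outputs, labels)

    # Print the loss to verify it is a finite value rather than NaN,
    # as suggested above.
    if torch.isnan(loss):
        print("loss is NaN")
    else:
        print("loss:", loss.item())

    # ColossalAI's backward entry point; it forwards to the wrapped
    # optimizer's backward(loss) internally.
    booster.backward(loss, optimizer)
    optimizer.step()
    optimizer.zero_grad()
```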
Thanks for the reply. `loss` is normal, but the gradient is `None` before `optimizer.zero_grad()`, which is strange.

I trained the model; the loss was steadily decreasing and model performance was also improving. But the gradients being `None` confuses me.
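A hypothetical sketch of the check being described, reusing the names assumed in the loop above: inspect `param.grad` right after the backward pass and before `zero_grad`.

```python
booster.backward(loss, optimizer)

for name, param in model.named_parameters():
    # Under ColossalAI this reportedly prints None for every parameter,
    # even though training itself progresses normally.
    print(name, param.grad)

optimizer.step()
optimizer.zero_grad()
```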
This is because ColossalAI manages the gradients itself, so you cannot access them directly through `param.grad`. @ver217 Could you please help with this?
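For contrast, a self-contained plain PyTorch baseline in which `param.grad` is populated after `backward()`. Under ColossalAI's ZeRO/Gemini-style plugins, the optimizer wrapper keeps gradients in its own buffers instead, which is why `param.grad` reads as `None` there (this baseline is illustrative, not from the issue):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
loss = model(torch.randn(1, 4)).sum()
loss.backward()

for name, param in model.named_parameters():
    # Prints False for every parameter: vanilla PyTorch fills .grad,
    # whereas a gradient-managing wrapper would leave it as None.
    print(name, param.grad is None)
```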