karpathy / micrograd

A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

Issue with zero_grad?

sky87 opened this issue · comments

Hi, unless I'm misunderstanding something, zero_grad in nn.py only zeroes out the gradients on the parameter nodes, but shouldn't it do that for all the nodes in the graph?
Otherwise the inner nodes will keep accumulating gradients.
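For reference, this is roughly the zero_grad being discussed (a paraphrase of Module in nn.py, not the exact source):

```python
# Rough sketch of nn.Module.zero_grad (paraphrased, may not match the
# repo exactly): it only resets the grads of the parameters, i.e. the
# Values returned by parameters(), not every node in the graph.
class Module:

    def zero_grad(self):
        for p in self.parameters():
            p.grad = 0

    def parameters(self):
        return []
```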

My bad, I didn't read the whole file carefully enough :)

@sky87 what did you realize? I have the same question but haven't figured it out.

Judging from #8, this is still a known issue?

@ben-z It's been a few months, but if I remember correctly the inner nodes are new Value instances that get recreated on every forward pass (see, for example, all the products and the sum in Neuron#__call__), so you don't need to zero them out. The parameters are the only things that survive between runs.
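A minimal sketch of that point, paraphrasing Neuron from nn.py (assuming micrograd is installed so Value comes from micrograd.engine; names and details may differ slightly from the repo):

```python
import random
from micrograd.engine import Value

class Neuron:
    def __init__(self, nin):
        # The parameters (weights and bias) are created once and live
        # across training steps, so zero_grad has to reset their .grad.
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]
        self.b = Value(0)

    def __call__(self, x):
        # Every product wi * xi and the running sum build brand-new
        # Value nodes on each forward pass, each starting with grad = 0,
        # so stale gradients never accumulate on these inner nodes.
        act = sum((wi * xi for wi, xi in zip(self.w, x)), self.b)
        return act.relu()
```

Calling the neuron twice builds two separate graphs of intermediate Values; only self.w and self.b (and their .grad) are shared between calls, which is why zero_grad only needs to touch the parameters.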


That makes a lot of sense!! Thanks for the explanation.