awaelchli / pytorch-lightning-snippets

A collection of code snippets for my PyTorch Lightning projects

Some thoughts + questions

TylerYep opened this issue · comments

Hey, thank you so much for writing this implementation up! It is a feature I've wanted to see in pytorch-lightning for a long time but never got the chance to work on.

In the BatchGradient verifier, we pop the index of the batch we are testing. However, I think it would be preferable to also verify that the gradient of that popped batch is in fact non-zero, since a gradient of all zeros would pass our test but would not train the network at all. My example code is here: verify.py
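Roughly, a minimal sketch of what I have in mind (`check_batch_gradient` and its arguments are placeholder names from my own code, not the verifier's API):

```python
import torch

def check_batch_gradient(model, input_shape=(8, 16), test_idx=0):
    # Forward a random batch and backprop only through the output
    # of the sample under test.
    inputs = torch.randn(*input_shape, requires_grad=True)
    outputs = model(inputs)
    outputs[test_idx].sum().backward()

    grads = inputs.grad
    # All other samples must receive exactly zero gradient ...
    others = torch.cat([grads[:test_idx], grads[test_idx + 1:]])
    if not torch.all(others == 0):
        raise RuntimeError("gradient leaked across batch samples")
    # ... and the tested sample itself must get a non-zero gradient,
    # otherwise an all-zero gradient passes the test while the network
    # would not train at all.
    if not torch.any(grads[test_idx] != 0):
        raise RuntimeError("gradient of the tested sample is all zero")
```

For example, `check_batch_gradient(torch.nn.Linear(16, 4))` should pass, while any module that mixes samples across the batch dimension should fail the first check.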

Finally, in my own projects I wrote a few other verification functions, but I believe they are already handled by Lightning. Could you confirm this? I am referring to:

  • Issuing a warning if train() is turned on but all layers are frozen
  • Checking whether any NaN or Inf value is present in the gradients or weights (see the sketch after this list)
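For concreteness, here are simplified sketches of the two checks from my own project; the function names are mine, not Lightning APIs:

```python
import warnings

import torch
import torch.nn as nn

def warn_if_all_frozen(model: nn.Module):
    # train() is set, but no parameter requires grad: nothing can learn.
    if model.training and not any(p.requires_grad for p in model.parameters()):
        warnings.warn("model is in train() mode, but all layers are frozen")

def check_finite(model: nn.Module):
    # Any NaN/Inf in the weights or gradients indicates a broken run.
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            raise ValueError(f"non-finite value in weights of {name}")
        if param.grad is not None and not torch.isfinite(param.grad).all():
            raise ValueError(f"non-finite value in gradient of {name}")
```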

If I have some spare time I will try testing this code myself, but overall it looks really great! 👍

@TylerYep Thank you very much for the feedback. I only saw this message just now, sorry!

However, I think it would be preferable to also verify that the gradient of that popped batch is in fact non-zero, since a gradient of all zeros would pass our test but would not train the network at all.

Very good observation, I will include that!
EDIT: done here: 9527fba

Issuing a warning if train() is turned on but all layers are frozen

I am not aware of such a feature in Lightning :)

Checking whether any NaN or Inf value is present in the gradients or weights

Yes, but one needs to turn it on with a Trainer flag. Searching for these values on every iteration can impact performance, so it is not enabled by default.
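For reference, a minimal sketch of enabling it; the exact flag depends on the Lightning version (`terminate_on_nan` was later deprecated in favor of `detect_anomaly`):

```python
from pytorch_lightning import Trainer

# Older releases: abort training as soon as the loss or any weight
# becomes NaN/Inf (checked at the end of each training step).
trainer = Trainer(terminate_on_nan=True)

# Newer releases deprecate that flag in favor of torch.autograd's
# anomaly detection.
trainer = Trainer(detect_anomaly=True)
```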

I also see that your verify.py has a function that runs all tests at once; that seems very convenient. 👍