gradient problem
denizyuret opened this issue
1. I implemented a new utility function `gcheck` (you need to check out master AutoGrad and explicitly write `using AutoGrad: gcheck` to use it). This works with regular models and loss functions, e.g. `gcheck(nll, model, x, y)`, where any of the inputs may have `Param` components.
2. I checked this on new versions of models like `lenet` (from the tutorial) and it works.
3. As expected, it failed on your draw: `include("debug.jl"); gcheck(loss, model, x)`.
4. Then I fixed the unboxing problem by replacing `value.(hidden)` with `hidden` at Knet/src/rnn.jl:130.
5. I tried gcheck again but it still fails.
I was expecting (5) to pass. Did you try (4) and confirm the gradients work?
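For reference, a `gcheck` call on a regular model looks like this (a minimal sketch: `Linear` and `loss` are made-up illustrations, not code from this issue; `gcheck` compares AutoGrad gradients against numeric finite differences):

```julia
using AutoGrad: Param, gcheck

# A toy model whose trainable arrays are wrapped in Param,
# so AutoGrad tracks gradients with respect to them.
struct Linear; w; b; end
Linear(nin, nout) = Linear(Param(randn(nout, nin)), Param(zeros(nout)))
(m::Linear)(x) = m.w * x .+ m.b

# A simple squared-error loss over a batch.
loss(m, x, y) = sum(abs2, m(x) .- y)

m = Linear(3, 2)
x, y = randn(3, 4), randn(2, 4)
gcheck(loss, m, x, y)   # checks AutoGrad gradients against numeric estimates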
@denizyuret I've performed the following on CPU:
- gcheck with the default setting defined in debug.jl: failed
- gcheck with only one timestep (different from the default setting): failed
- Check of a modular RNN: I converted my own ilkarman benchmark example (Knet_RNN.ipynb) to Julia code, and gcheck passed.
So, currently there is no problem with modular RNNs. However, I use the reparametrization trick (as in VAEs); maybe this is the reason. Right now, I'm going to try Carlo's VAE to see what happens.
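For context, the reparametrization trick mentioned above can be sketched as follows (`reparametrize` is a hypothetical name for illustration, not a function from the model):

```julia
# Reparametrization trick (as used in VAEs): sampling z ~ N(mu, sigma^2)
# directly is not differentiable in mu and sigma, so instead sample
# eps ~ N(0, 1) and compute z = mu + sigma .* eps, which is a
# deterministic, differentiable function of mu and sigma given eps.
reparametrize(mu, sigma, eps = randn(size(mu)...)) = mu .+ sigma .* eps
```

Because `eps` is an argument here, the same noise can be reused across calls; this matters for a finite-difference check like gcheck, since a loss that resamples noise internally computes a different function on every evaluation.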
The VAE also passes. I think I need to re-digest the model. Several issues might be causing harm, but yes, I've tried (4) and the gradients are the same. What did I do?
- Transferred PyTorch model weights to my implementation.
- Used just one array of randomly sampled noise in all the timesteps, to see whether I get the same gradients.
- Checked gradients by eye and by computing norms.
@denizyuret here's what I've done to make gradcheck pass on this network: use just a single noise sample for all timesteps. Then it passes. However, I don't know why it passes on Carlo's VAE implementation. Anyway, here's what I'm going to do:
- Train a PyTorch network, transfer its weights to my implementation, then try to generate something meaningful.
- Build a mechanism for noise sampling that is handled outside of the loss function (maybe sampling inside the loss is what causes the problem for the network).
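A minimal sketch of that second step, assuming per-timestep noise arrays (hypothetical names; the real model code is not shown here):

```julia
# Presample one noise array per timestep outside the loss, so that
# repeated evaluations of the loss (as gcheck performs) see identical noise.
make_noise(T, dims, nsteps) = [randn(T, dims...) for _ in 1:nsteps]

# The loss then takes the presampled noise as an explicit argument
# instead of calling randn internally, e.g.:
# loss(model, x, noise) = ...   # use noise[t] at timestep t
```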
I confirm that gcheck passes when we remove that `value.()` call and fails otherwise. You can use the latest master to test gcheck on my model.