The code is not consistent with the paper
auroua opened this issue · comments
Hi Tim,
In your paper, you said:
We propose to reparameterize each weight vector w in terms of a parameter vector v and a scalar parameter g and to perform stochastic gradient descent with respect to those parameters instead.
But in the example code:
incoming.W = incoming.W_param * (self.g/T.sqrt(T.sum(T.square(incoming.W_param),axis=W_axes_to_sum))).dimshuffle(*W_dimshuffle_args)
In this code, incoming.W_param is the vector v, but it does not appear to be registered as a trainable parameter. It looks like you are still training the weight vector w directly.
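Just to make sure I understand the reparameterization from the paper, here is a minimal NumPy sketch of what I expect it to mean (variable names are illustrative, not from the repo): v and g are the trained parameters, and w is only ever derived from them.

```python
import numpy as np

# Sketch of weight normalization: w = g * v / ||v||.
# Training should update v and g; w is recomputed from them each step.
rng = np.random.default_rng(0)
v = rng.normal(size=5)   # direction parameter vector v (trained directly)
g = 2.0                  # scalar scale parameter g (trained directly)

w = g * v / np.linalg.norm(v)  # derived weight vector, never updated on its own

# By construction, ||w|| equals g regardless of the scale of v.
assert np.isclose(np.linalg.norm(w), g)
```

My question is whether the Theano code above actually does this, i.e. whether gradients flow into W_param (v) and g, or into W itself.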
Which one is correct?
Thanks!