The code is not consistent with the paper
auroua opened this issue · comments
Hi Tim,
In your paper, you said:
We propose to reparameterize each weight vector w in terms of a parameter vector v and a scalar parameter g and to perform stochastic gradient descent with respect to those parameters instead.
But in the example code:
incoming.W = incoming.W_param * (self.g/T.sqrt(T.sum(T.square(incoming.W_param),axis=W_axes_to_sum))).dimshuffle(*W_dimshuffle_args)
In this code, incoming.W_param is the vector v, but it does not appear to be registered as a trainable parameter. It looks like you are still training the weight vector w directly.
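Just to make sure I understand the reparameterization from the paper, here is a minimal NumPy sketch of what I expect it to mean (variable names are illustrative, not from the repo): v and g are the trained parameters, and w is only ever derived from them.

```python
import numpy as np

# Sketch of weight normalization: w = g * v / ||v||.
# Training should update v and g; w is recomputed from them each step.
rng = np.random.default_rng(0)
v = rng.normal(size=5)   # direction parameter vector v (trained directly)
g = 2.0                  # scalar scale parameter g (trained directly)

w = g * v / np.linalg.norm(v)  # derived weight vector, never updated on its own

# By construction, ||w|| equals g regardless of the scale of v.
assert np.isclose(np.linalg.norm(w), g)
```

My question is whether the Theano code above actually does this, i.e. whether gradients flow into W_param (v) and g, or into W itself.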
Which one is correct?
Thanks!