zoli333 / Weight-Normalization

Complete implementation of the article "Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks"


Initialization of g and b is not done properly

SDNAFIO opened this issue · comments

Looking at the code for the initialization of the g and b parameters,
it seems like the actual assignment operations are never called.

The assignment operations are created, but never executed:

Weight-Normalization/nn.py

Lines 143 to 144 in f493976

g = g.assign(scale_init)
b = b.assign(-m_init * scale_init)

(see also: https://stackoverflow.com/a/34220750/9562563 if the behaviour of tf.assign is unclear)

This seems to be the case for both the dense and conv2d layers.
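Here is a minimal sketch of what I mean (TF1 graph mode; the variable names are made up for illustration):

  import tensorflow as tf

  x = tf.Variable(0)
  assign_op = x.assign(1)  # the op is created here...
  y = 10 + x               # ...but y depends on x, not on assign_op

  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      print(sess.run(y))   # prints 10: the assignment never executed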

Thank you!
I think it is running correctly, like in the following example:

  import tensorflow as tf

  x = tf.Variable(0)
  x = x.assign(1)   # x is rebound to the output tensor of the assign op
  z = 10 + x        # z therefore depends on the assign op
  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      print(sess.run(z))

The result is 11, which is correct. The initialization part in the init_forward step in train.py is called inside the session, so I think this will also be run.

  if epoch == 0:
      sess.run(init_forward, feed_dict={x_init: trainx_white[:init_batch_size]})

I assume this forces the graph to add the assign nodes and run them, because I run this operation in the session by calling the init_forward template in train.py (see the separate templates for initialization and for training in train.py).

I think explicitly defining the tf.assign operations is not strictly necessary; adding tf.control_dependencies() gives the same behavior. However, it would be much more elegant to define the assign operations with the tf.assign function, return these operations from the conv2d and dense layers as well, and call them directly in the session, as sketched below.
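A rough sketch of that pattern, where the layer returns its data-dependent init ops so the caller can run them once (the names, shapes, and moments-based init below are my paraphrase of the paper's scheme, not the repo's exact code):

  import tensorflow as tf

  def dense(x, num_units, scope='dense'):
      with tf.variable_scope(scope):
          V = tf.get_variable('V', [int(x.get_shape()[1]), num_units],
                              initializer=tf.random_normal_initializer(0, 0.05))
          g = tf.get_variable('g', [num_units], initializer=tf.ones_initializer())
          b = tf.get_variable('b', [num_units], initializer=tf.zeros_initializer())
          # weight-normalized forward pass: w = g * V / ||V||
          t = tf.matmul(x, V) / tf.sqrt(tf.reduce_sum(tf.square(V), axis=0))
          out = g * t + b
          # data-dependent init ops, returned instead of silently dropped
          m_init, v_init = tf.nn.moments(t, axes=[0])
          scale_init = 1.0 / tf.sqrt(v_init + 1e-8)
          init_ops = [g.assign(scale_init), b.assign(-m_init * scale_init)]
          return out, init_ops

The caller would then collect the init_ops of all layers and sess.run them once on the first batch, before training starts.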

In your example z has a dependency on x, which you rebound to the output of the assign op, so of course the assignment will be run. His point is that nothing has a dependency on the assign ops in the initialization code, so of course they will never be executed.

So that's why tf.assign alone is not enough, and tf.control_dependencies should be used as well?

You can have a look at the implementation of weight normalization here: https://github.com/CompVis/vunet/blob/master/nn.py#L36

Since x has a dependency on the g and b assign ops during initialization, they are forced to run.
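In sketch form the trick looks like this during the initialization pass, reusing the names from the dense sketch above (a paraphrase of that style, not a copy of the vunet code):

  # init path only
  t = tf.matmul(x, V) / tf.sqrt(tf.reduce_sum(tf.square(V), axis=0))
  m_init, v_init = tf.nn.moments(t, axes=[0])
  scale_init = 1.0 / tf.sqrt(v_init + 1e-8)
  with tf.control_dependencies([g.assign(scale_init),
                                b.assign(-m_init * scale_init)]):
      # the output now depends on the assign ops, so fetching it
      # in session.run forces the assignments to execute
      out = (t - m_init) * scale_init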

I got it, thank you for the explanation and for the link as well. They use tf.assign there.
Very nice repository. I will correct this...

Another question: how can I debug these operations?

I usually just wrap the op in a tf.Print. So you can try:

g = tf.assign(...)
g = tf.Print(g, [g], message='test ')

If nothing gets printed then your op is not being executed.
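For example, a self-contained check (the variable and values are made up, just to show the mechanics):

  import tensorflow as tf

  g = tf.Variable(tf.ones([4]))
  assign_op = g.assign(tf.fill([4], 2.0))
  assign_op = tf.Print(assign_op, [assign_op], message='g assigned: ')

  y = tf.reduce_sum(g)

  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      print(sess.run(y))   # 4.0, and no 'g assigned' message: the op never ran
      sess.run(assign_op)  # now 'g assigned: [2 2 2 2]' is printed
      print(sess.run(y))   # 8.0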