zoli333 / Weight-Normalization

Complete implementation of the article "Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks"

What role does param init play?

MachineJeff opened this issue

Hi
I want to know what the param init does in your code.

And why do you create 3 models in the same train script?
I'm confused.

Hello
In the model template, the weights and all other parameters are defined with "tf.get_variable(...)", so the initialization, training, and testing phases share the same weights. Before the training phase, the parameters are initialized by calling the model template on the initialization batch, i.e. with a single data-dependent feed-forward step. After the parameters have been initialized, the training phase runs with those initialized parameters.
Templates handle being called multiple times: if the parameters were already initialized, then the next time the model template is called it reuses the initialized variables.
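
For illustration, a minimal sketch of that template pattern in TF 1.x (the function build_model, the dense/dropout layers, and the placeholder names are assumptions for this example, not the code of this repository):

import tensorflow as tf

def build_model(x, deterministic=False):
    # the layers create their parameters with tf.get_variable internally,
    # so a template wrapping this function can reuse the same variables on later calls
    h = tf.layers.dense(x, 128, activation=tf.nn.relu)
    if not deterministic:
        h = tf.nn.dropout(h, keep_prob=0.5)
    return tf.layers.dense(h, 10)

model = tf.make_template('model', build_model)

x_init  = tf.placeholder(tf.float32, [None, 784])
x_train = tf.placeholder(tf.float32, [None, 784])
x_test  = tf.placeholder(tf.float32, [None, 784])

init_out  = model(x_init)                      # first call: creates and initializes the variables
train_out = model(x_train)                     # second call: reuses the same variables
test_out  = model(x_test, deterministic=True)  # third call: reuses them again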

Right, I get your point.

But why not remove the param init and just create one model, like this:

init_forward = model(x_init,keep_prob=0.5,deterministic=is_training,
                        use_weight_normalization=use_weight_normalization,
                        use_batch_normalization=use_batch_normalization, 
                        use_mean_only_batch_normalization=use_mean_only_batch_normalization)

Use the variable is_training to distinguish train and test, and then just run both training and testing through the same init_forward model?
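
(Something like the following sketch, where the placeholder name and the feed pattern are only my assumptions, reading deterministic as "not training":)

import tensorflow as tf

# one graph, one model, switched by a boolean placeholder (illustrative sketch)
is_training = tf.placeholder(tf.bool, shape=[], name='is_training')
deterministic = tf.logical_not(is_training)   # no dropout / fixed statistics at test time

# init_forward = model(x_init, keep_prob=0.5, deterministic=deterministic, ...)

# training step:   sess.run(train_op,     feed_dict={..., is_training: True})
# evaluation step: sess.run(init_forward, feed_dict={..., is_training: False})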

Besides, I really appreciate your weight_norm code.
I do not like the param init, so I rewrote the weight_norm code like this:

import tensorflow as tf


def int_shape(x):
    # static shape of x as a list of Python ints (helper assumed to exist in the repo's utility code)
    return list(map(int, x.get_shape()))


def wn_conv1d(x, kernel_size, channels, scope, stride=1, pad='SAME', dilation=1, nonlinearity=None, init_scale=1.):
    # 1-D weight-normalized convolution, implemented as a height-1 2-D convolution
    xs = int_shape(x)
    filter_size = [1, kernel_size]
    dila = [1, dilation]
    strs = [1, stride]
    with tf.variable_scope(scope):
        # data based initialization of parameters
        V = tf.get_variable('V', filter_size + [xs[-1], channels], tf.float32,
                            tf.random_normal_initializer(0, 0.05), trainable=True)
        V_norm = tf.nn.l2_normalize(V.initialized_value(), [0, 1, 2])
        # strides/dilations in NHWC layout, given as length-4 lists
        x_init = tf.nn.conv2d(x, V_norm, [1] + strs + [1], pad, dilations=[1] + dila + [1])
        # data-dependent scale and shift computed from the statistics of this first forward pass
        m_init, v_init = tf.nn.moments(x_init, [0, 1, 2])
        scale_init = init_scale / tf.sqrt(v_init + 1e-8)
        x_init = tf.reshape(scale_init, [1, 1, 1, channels]) * (x_init - tf.reshape(m_init, [1, 1, 1, channels]))
        if nonlinearity is not None:
            x_init = nonlinearity(x_init)
        return x_init
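
For reference, this is how I call it, assuming the 1-D input is carried as a 4-D tensor with a dummy height dimension of 1 (the shapes below are made up):

x = tf.placeholder(tf.float32, [16, 1, 100, 64])   # [batch, 1, time, in_channels]
y = wn_conv1d(x, kernel_size=3, channels=128, scope='wn_conv1', stride=1,
              pad='SAME', dilation=1, nonlinearity=tf.nn.relu)   # -> [16, 1, 100, 128]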

It's a conv1d rather than a conv2d (but that does not matter here).

Do you see any problem in my code?