zoli333 / Weight-Normalization

Complete implementation of the article "Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks"

What role does param init play?

MachineJeff opened this issue

Hi
I want to know what the param init does in your code.

And why do you create 3 models in the same train script?
I'm confused.

Hello
In the model template, the weights and all other parameters are defined with "tf.get_variable(...)", so the initialization, training, and testing phases share the same weights. Before the training phase, the parameters are initialized by calling the model template on the initialization batch, i.e. with a single data-dependent feed-forward step. After the parameters have been initialized, the training phase runs with those initialized parameters.
Templates handle being called multiple times: if the parameters were already initialized, then the next time the model template is called it reuses the initialized variables.
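
For illustration, a minimal sketch of that template pattern in TF 1.x (the function build_model, the dense/dropout layers, and the placeholder names are assumptions for this example, not the code of this repository):

import tensorflow as tf

def build_model(x, deterministic=False):
    # the layers create their parameters with tf.get_variable internally,
    # so a template wrapping this function can reuse the same variables on later calls
    h = tf.layers.dense(x, 128, activation=tf.nn.relu)
    if not deterministic:
        h = tf.nn.dropout(h, keep_prob=0.5)
    return tf.layers.dense(h, 10)

model = tf.make_template('model', build_model)

x_init  = tf.placeholder(tf.float32, [None, 784])
x_train = tf.placeholder(tf.float32, [None, 784])
x_test  = tf.placeholder(tf.float32, [None, 784])

init_out  = model(x_init)                      # first call: creates and initializes the variables
train_out = model(x_train)                     # second call: reuses the same variables
test_out  = model(x_test, deterministic=True)  # third call: reuses them again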

Right, I get your point.

But why not remove the param init and just create one model, like this:

init_forward = model(x_init,keep_prob=0.5,deterministic=is_training,
                        use_weight_normalization=use_weight_normalization,
                        use_batch_normalization=use_batch_normalization, 
                        use_mean_only_batch_normalization=use_mean_only_batch_normalization)

Use the variable is_training to distinguish train and test, and then just run both training and testing through the same init_forward model?
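
(Something like the following sketch, where the placeholder name and the feed pattern are only my assumptions, reading deterministic as "not training":)

import tensorflow as tf

# one graph, one model, switched by a boolean placeholder (illustrative sketch)
is_training = tf.placeholder(tf.bool, shape=[], name='is_training')
deterministic = tf.logical_not(is_training)   # no dropout / fixed statistics at test time

# init_forward = model(x_init, keep_prob=0.5, deterministic=deterministic, ...)

# training step:   sess.run(train_op,     feed_dict={..., is_training: True})
# evaluation step: sess.run(init_forward, feed_dict={..., is_training: False})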

Besides, I really appreciate your weight_norm code.
I do not like the param init, so I rewrote the weight_norm code like this:

import tensorflow as tf


def int_shape(x):
    # static shape of x as a list of Python ints (helper assumed to exist in the repo's utility code)
    return list(map(int, x.get_shape()))


def wn_conv1d(x, kernel_size, channels, scope, stride=1, pad='SAME', dilation=1, nonlinearity=None, init_scale=1.):
    # 1-D weight-normalized convolution, implemented as a height-1 2-D convolution
    xs = int_shape(x)
    filter_size = [1, kernel_size]
    dila = [1, dilation]
    strs = [1, stride]
    with tf.variable_scope(scope):
        # data based initialization of parameters
        V = tf.get_variable('V', filter_size + [xs[-1], channels], tf.float32,
                            tf.random_normal_initializer(0, 0.05), trainable=True)
        V_norm = tf.nn.l2_normalize(V.initialized_value(), [0, 1, 2])
        # strides/dilations in NHWC layout, given as length-4 lists
        x_init = tf.nn.conv2d(x, V_norm, [1] + strs + [1], pad, dilations=[1] + dila + [1])
        # data-dependent scale and shift computed from the statistics of this first forward pass
        m_init, v_init = tf.nn.moments(x_init, [0, 1, 2])
        scale_init = init_scale / tf.sqrt(v_init + 1e-8)
        x_init = tf.reshape(scale_init, [1, 1, 1, channels]) * (x_init - tf.reshape(m_init, [1, 1, 1, channels]))
        if nonlinearity is not None:
            x_init = nonlinearity(x_init)
        return x_init
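
For reference, this is how I call it, assuming the 1-D input is carried as a 4-D tensor with a dummy height dimension of 1 (the shapes below are made up):

x = tf.placeholder(tf.float32, [16, 1, 100, 64])   # [batch, 1, time, in_channels]
y = wn_conv1d(x, kernel_size=3, channels=128, scope='wn_conv1', stride=1,
              pad='SAME', dilation=1, nonlinearity=tf.nn.relu)   # -> [16, 1, 100, 128]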

It's a conv1d rather than a conv2d (but that does not matter here).

Do you see any problem in my code?