ibab / tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper

Why convolve the input_batch twice with different activation functions, rather than convolving once and activating with different functions?

OneDirection9 opened this issue

Why does the implementation convolve the input_batch twice, once for the tanh path and once for the sigmoid path, as follows:

conv_filter = causal_conv(input_batch, weights_filter, dilation)
conv_gate = causal_conv(input_batch, weights_gate, dilation)
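For context, these two outputs are then combined with the gated activation from the paper, roughly:

out = tf.tanh(conv_filter) * tf.sigmoid(conv_gate)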

I want to know whether we can do the convolution just once and activate it with two different functions. I think this could reduce some computation. Pseudo-code in Keras as follows:

from keras.layers import Activation

conv = causal_conv(input_batch, filter_size)
tanh_out = Activation('tanh')(conv)
sigmoid_out = Activation('sigmoid')(conv)

No, your pseudo-code is different from what is described in the paper, which uses separate weights for the two paths. But you can do the convolution once (with twice the output channels) and then split the result into two components, one for tanh and the other for sigmoid.
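A minimal sketch of that idea (assuming this repo's causal_conv helper; weights_both is an illustrative variable holding both weight sets stacked along the output-channel axis):

import tensorflow as tf

# Illustrative sketch, not the repo's actual code: one causal convolution
# producing twice the channels, then split into filter and gate halves.
conv_both = causal_conv(input_batch, weights_both, dilation)
conv_filter, conv_gate = tf.split(conv_both, num_or_size_splits=2, axis=2)
out = tf.tanh(conv_filter) * tf.sigmoid(conv_gate)  # gated activation of Eq. 2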

Thanks for your reply.
I think Eq. 2, $z = \tanh(W_{f,k} \ast x) \odot \sigma(W_{g,k} \ast x)$, supports what you said.

But Figure 4 in the paper does not convey the same meaning as the equation, since it draws only one dilated convolution and then splits it into two components.

I think the equation is more convincing.

If you use the same weights in these two components, the output will always be close to 1 when the shared pre-activation gets larger and close to 0 when it gets smaller, so the gate carries no independent information. I don't think that's what the "gate" is meant to do.
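A quick numerical illustration of that point (plain NumPy, purely illustrative):

import numpy as np

# If filter and gate share weights, both see the same pre-activation `a`.
a = np.linspace(-4, 4, 9)
z = np.tanh(a) * (1.0 / (1.0 + np.exp(-a)))
print(np.round(z, 3))
# z heads to 0 as `a` gets smaller and to 1 as `a` gets larger:
# the "gate" just tracks the filter instead of learning what to pass through.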

Got it. Thanks.