weight distribution for deconv layer
r0drigor opened this issue · comments
This might be a question that is more about caffe than FlowNet but I'll ask here anyway.
I'm doing some work with FlowNet2-s (the first iteration of the project), using MATLAB to read the weight values of each layer (via net.params('conv1').get_data()), and I can't understand why the deconvolution layers' weights use a different structure than the regular convolutions.
For regular convolutions the weight layout is (h, w, c, n), while for deconv layers it's (h, w, n, c). Why is this the case?
This is just a quick guess, but "deconvolution" is implemented as transposed convolution (both layers directly use the cublasSgemm matrix multiplication).
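To illustrate what the transpose does to the channel axes (this is my own numpy sketch, not the actual Caffe code): in the 1x1-kernel case a convolution is just a matrix multiply over channels, and the transposed operation applies the same weight matrix with the input/output channel axes swapped.

```python
import numpy as np

# Hypothetical sizes: C_in input channels, N output channels.
C_in, N = 3, 2
rng = np.random.default_rng(0)

W = rng.standard_normal((N, C_in))  # conv weight: one row per output channel
x = rng.standard_normal(C_in)       # one pixel's channel vector

y = W @ x        # 1x1 convolution: maps C_in channels -> N channels
x_back = W.T @ y # transposed ("deconv") op: maps N channels -> C_in channels
                 # W.T has shape (C_in, N), i.e. the channel axes swapped

assert y.shape == (N,)
assert x_back.shape == (C_in,)
```

So a layer that stores the transposed operation's weights naturally ends up with the input/output channel dimensions in the opposite order from a plain convolution.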
I can't find any documentation on any of this, but maybe I'm just not looking hard enough.
There's probably some kind of transposition on the height and width parameters, right?
I'm having some difficulty getting reasonable values, and that might be the explanation.
Thank you for the fast response.
(After your answer you can lock this issue)
I think it's common to implement these "deconvolutions" like that; see e.g. http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html#transposed-convolution-arithmetic
You can check the Forward_cpu implementations in the conv and deconv layers; they should match the method in the link.
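To make the "transpose" concrete, here is a small numpy sketch (mine, not taken from the Caffe source): if you write a 1-D valid cross-correlation as a matrix A, then multiplying by A.T is exactly a full convolution with the same kernel, which is the operation a transposed-convolution layer computes.

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])  # input signal
w = np.array([1., 0., -1.])         # kernel
n, k = len(x), len(w)

# Matrix form of a 'valid' cross-correlation (what a conv layer does):
# row i holds the kernel shifted to position i.
A = np.zeros((n - k + 1, n))
for i in range(n - k + 1):
    A[i, i:i + k] = w

y = A @ x
assert np.allclose(y, np.correlate(x, w, mode='valid'))

# "Deconvolution" = multiplying by the transpose of the same matrix,
# which works out to a full convolution with the kernel.
z = A.T @ y
assert np.allclose(z, np.convolve(y, w, mode='full'))
```

Since the forward pass of the deconv layer is the transpose of a convolution's forward pass, it makes sense that its weight blob is laid out the way the transposed operation consumes it.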
I can't tell you why exactly the parameters are structured differently; I can only assume that the transposed convolution plays a role.
(closed due to inactivity)