vdumoulin / conv_arithmetic

A technical report on convolution arithmetic in the context of deep learning

What is input and what is output?

apisarek opened this issue

Hi! Great report :)

There is something confusing in the GIFs.
I can't distinguish (and probably nobody can) https://github.com/vdumoulin/conv_arithmetic/blob/master/gif/full_padding_no_strides_transposed.gif from https://github.com/vdumoulin/conv_arithmetic/blob/master/gif/no_padding_no_strides.gif (I'm not talking about the width and height of the feature maps). You can't tell a priori which is up or down, input or output, before or after the operation. I would suggest changing the colors or (better) adding "input"/"output" labels.

commented

I second this. Unless the reader already knows that "transposed convolution" is another name for the deconvolution operation (https://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers), and therefore that its output is larger than its input (which is a convolved version of the original signal), the GIFs are hard to comprehend.
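
To make the size relationship concrete, here is a minimal sketch of the output-size arithmetic along one axis. It assumes the usual formulas for a convolution, `o = floor((i + 2p - k) / s) + 1`, and for the transpose of a unit-stride convolution, `o' = i' + (k - 1) - 2p`. The function names and the sizes used (kernel size 3, inputs of 4, 5, and 7) are illustrative assumptions, not values read off the GIFs.

```python
def conv_output_size(i, k, p=0, s=1):
    """Output size of a convolution along one axis (input i, kernel k, padding p, stride s)."""
    return (i + 2 * p - k) // s + 1


def transposed_conv_output_size(i, k, p=0):
    """Output size of the transpose of a unit-stride convolution that used padding p."""
    return i + (k - 1) - 2 * p


k = 3

# No padding: the convolution shrinks its input, and its transpose grows it back.
no_pad = conv_output_size(4, k)                       # 4 -> 2
no_pad_t = transposed_conv_output_size(no_pad, k)     # 2 -> 4

# Full padding (p = k - 1): the convolution grows its input, and its transpose shrinks it.
full = conv_output_size(5, k, p=k - 1)                # 5 -> 7
full_t = transposed_conv_output_size(full, k, p=k - 1)  # 7 -> 5

# The transpose of a full-padding convolution has the same shape arithmetic
# as a plain no-padding convolution applied to the same input size.
plain = conv_output_size(full, k)                     # 7 -> 5

print(no_pad, no_pad_t, full, full_t, plain)
```

Under these assumptions, the transpose of a no-padding convolution does produce a larger output, while the transpose of a full-padding convolution produces a smaller one, with the same shape arithmetic as an ordinary no-padding convolution. That may be why the two animations are hard to tell apart without labels.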

Thank you for bringing this to our attention. Here is what we propose to resolve the issue:

  1. Add "input" and "output" labels to the GIFs.
  2. In the guide itself, mention that transposed convolutions are sometimes called "deconvolutions" (although in our opinion this is a loaded term that should be avoided, as a deconvolution is formally defined as the inverse of a convolution, which the transposed convolution is not).
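
The distinction in point 2 can be illustrated with a small 1-D sketch in plain Python. It writes a no-padding, unit-stride convolution (cross-correlation, as is conventional in deep learning) as a matrix `C`, then applies the transpose `C^T` to the result. The helper names, the kernel, and the input values are assumptions chosen for illustration only.

```python
def conv_matrix(kernel, input_size):
    """Matrix C such that C @ x is the no-padding, unit-stride convolution of x with kernel."""
    k = len(kernel)
    output_size = input_size - k + 1
    return [
        [kernel[col - row] if 0 <= col - row < k else 0 for col in range(input_size)]
        for row in range(output_size)
    ]


def transpose(matrix):
    """Transpose a matrix stored as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def matvec(matrix, vector):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


kernel = [1, 2, 3]
x = [1, 0, 2, 1]

C = conv_matrix(kernel, len(x))      # 2 x 4 matrix
y = matvec(C, x)                     # convolution: size 4 -> size 2
x_back = matvec(transpose(C), y)     # transposed convolution: size 2 -> size 4

print(y)       # [7, 7]
print(x_back)  # [7, 21, 35, 21]
```

In this example the transposed convolution restores the shape of `x` but not its values, which is the sense in which it is not the inverse of the convolution.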