kazuto1011 / grad-cam-pytorch

PyTorch re-implementation of Grad-CAM (+ vanilla/guided backpropagation, deconvnet, and occlusion sensitivity maps)

Slight difference between Deconvnet and Guided BP

123dddd opened this issue · comments

commented

Thanks for the GREAT repo! I noticed that there is only one small difference between the algorithms of these 2 visualization methods:

For Guided BP we use: return (F.relu(grad_in[0]),)
For Deconvnet we use: return (F.relu(grad_out[0]),)

For me the Guided BP is understandable, but I am confused about the deconvnet. The deconvnet consists of unpooling, ReLU, and deconvolution layers (https://www.quora.com/How-does-a-deconvolutional-neural-network-work), but I only find the ReLU operation implemented, using F.relu. Maybe I have misunderstood the deconvnet visualization method, or I am missing something about the use of PyTorch. I hope you can point me in the right direction!

Thanks a lot.

The deconvnet consists of unpooling, ReLU, and deconvolution layers.

The unpooling and deconvolution are the backward routings of pooling and convolution, respectively. The relu clips the negative values of the gradient flow. Therefore you can say that guided BP consists of unpooling, relu, deconvolution, and the backward activation. The difference is just the backward activation, which routes gradients based on the forward pass, not on the gradients themselves. The relevant papers only consider the ReLU activation, i.e. the backward relu.

|            | deconvolution | backward relu | gradient relu | unpooling |
|------------|:-------------:|:-------------:|:-------------:|:---------:|
| vanilla bp | ✓             | ✓             |               | ✓         |
| deconvnet  | ✓             |               | ✓             | ✓         |
| guided bp  | ✓             | ✓             | ✓             | ✓         |
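To make the table concrete, here is a small numeric sketch (my own illustration, not code from this repo) of the three backward rules at a single relu, given its forward input `x` and the incoming gradient `g`:

```python
import torch
import torch.nn.functional as F

# Forward input to the relu and the gradient arriving from the layer above
# (arbitrary example values).
x = torch.tensor([1.0, -2.0, 3.0, -4.0])
g = torch.tensor([-0.5, 0.7, 0.2, -0.1])

fwd_mask = (x > 0).float()      # where the forward relu was active

vanilla = g * fwd_mask          # backward relu only
deconv = F.relu(g)              # gradient relu only
guided = F.relu(g * fwd_mask)   # backward relu + gradient relu

# vanilla: [-0.5, 0.0, 0.2, 0.0]  -> keeps negative gradients at active units
# deconv:  [ 0.0, 0.7, 0.2, 0.0]  -> keeps positive gradients everywhere
# guided:  [ 0.0, 0.0, 0.2, 0.0]  -> keeps positive gradients at active units only
```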

For Guided BP we use: return (F.relu(grad_in[0]),)
For Deconvnet we use: return (F.relu(grad_out[0]),)

grad_in is the gradient after the backward relu, while grad_out is the gradient before the backward relu. F.relu() is the gradient relu. In vanilla backpropagation, we just return the raw grad_in to the next layer.
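For completeness, here is a minimal sketch (my own, not the repo's actual implementation) of how these three rules could be attached to every relu of a model with register_full_backward_hook; the model and input below are just placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

def vanilla_bp_hook(module, grad_in, grad_out):
    # Vanilla BP: keep grad_in, which is already masked by the forward relu.
    return grad_in

def deconvnet_hook(module, grad_in, grad_out):
    # Deconvnet: clip negatives of the incoming gradient (before the backward relu),
    # ignoring where the forward activation was zero.
    return (F.relu(grad_out[0]),)

def guided_bp_hook(module, grad_in, grad_out):
    # Guided BP: start from grad_in (masked by the forward pass) and also clip negatives.
    return (F.relu(grad_in[0]),)

model = models.vgg16()  # in practice, load pretrained weights
model.eval()

handles = []
for m in model.modules():
    if isinstance(m, nn.ReLU):
        m.inplace = False  # in-place relu does not play well with full backward hooks
        handles.append(m.register_full_backward_hook(guided_bp_hook))  # pick a rule here

image = torch.randn(1, 3, 224, 224, requires_grad=True)  # placeholder input
logits = model(image)
logits[0, logits[0].argmax()].backward()
saliency = image.grad  # gradient w.r.t. the input, i.e. the saliency map

for h in handles:
    h.remove()
```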

commented

Thanks for the helpful and fast reply!
So in conclusion, the sole difference between the 3 approaches is how they backpropagate through the ReLU (as shown in the above table). As for the remaining parts, i.e. the unpooling and deconvolution layers, they are implemented identically across the methods. We just focus on the ReLU part of the backward route when we want to get the different kinds of saliency maps. Am I right?

Yes. Figure 1 of the guided BP paper is helpful for understanding this point (https://arxiv.org/pdf/1412.6806.pdf).

commented

Thanks again for your kind help! Now these 3 methods are clearer to me :)