kazuto1011 / grad-cam-pytorch

PyTorch re-implementation of Grad-CAM (+ vanilla/guided backpropagation, deconvnet, and occlusion sensitivity maps)

How to do GuidedBackPropagation

csyuhao opened this issue · comments

Hi @kazuto1011, thanks a lot for your great work! I have a small question about using GuidedBackPropagation with my own model. In my model, I use a PReLU layer instead of a ReLU module. How do I apply GuidedBackPropagation in this case? Should I do the same thing as with ReLU?

To my understanding, the core idea of guided backpropagation is to block the backward flow of any signal x that decreases the score E. In the ReLU case, the gradient is already zeroed wherever the bottom-up activation is zero, so guided backpropagation can additionally set the negative gradients ∂E/∂x < 0 to zero in the top-down pass. The same approach should work for PReLU, although I am not sure whether the masked, sparse gradients will be visualized as clearly in pixel space. Anyway, it can be done with the following backward hook; give it a try!

def backward_hook(module, grad_in, grad_out):
    # mask out the negative gradients flowing back through PReLU, as with ReLU
    if isinstance(module, nn.PReLU):
        return (F.relu(grad_in[0]),)
commented

Thanks a lot for your quick response!

I tried this backward hook and found an error, which is caused by the weight a of PReLU: grad_in contains two elements. One is the gradient with respect to the input data (specifically, the feature maps); the other is the gradient with respect to the weight a. Since the gradient of a is unrelated to the input data, we can simply pass it through untouched. Here is my backward hook.

import torch.nn as nn
import torch.nn.functional as F

def backward_hook(module, grad_in, grad_out):
    # cut off negative gradients flowing back through the activation
    if isinstance(module, nn.ReLU):
        return (F.relu(grad_in[0]),)
    elif isinstance(module, nn.PReLU):
        # grad_in[1] is the gradient w.r.t. the PReLU weight; pass it through unchanged
        return (F.relu(grad_in[0]), grad_in[1])
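
For reference, here is a minimal registration sketch, assuming the legacy Module.register_backward_hook API (whose grad_input can include parameter gradients, consistent with the two-element grad_in above) and a network instance named model; both names are placeholders for your own code.

# hypothetical registration loop; `model` stands in for your own network
handles = []
for module in model.modules():
    if isinstance(module, (nn.ReLU, nn.PReLU)):
        handles.append(module.register_backward_hook(backward_hook))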

I also have another question, about Grad-CAM with PReLU. In my model, I replaced ReLU with PReLU. In the original Grad-CAM paper, the weighted combination of feature maps ∑ₖ αₖ Aᵏ is passed through ReLU to cut off negative values. However, in my model the negative values also contribute to the final logits.

So I think I should replace ReLU with PReLU when generating Grad-CAM.

# feature maps and gradients cached at the target layer
fmaps = self._find(self.fmap_pool, target_layer)
grads = self._find(self.grad_pool, target_layer)
weights = F.adaptive_avg_pool2d(grads, 1)  # alpha_k: spatial average of the gradients per channel

gcam = torch.mul(fmaps, weights).sum(dim=1, keepdim=True)
# use the learned PReLU slope instead of ReLU so negative responses are scaled, not zeroed
if prelu_weight is not None:
    gcam = F.prelu(gcam, prelu_weight.mean())
else:
    gcam = F.relu(gcam)
gcam = F.interpolate(gcam, self.image_shape, mode='bilinear', align_corners=False)

The architecture of the model is "target_layer (final Conv2d) => PReLU => Linear", so prelu_weight is taken from the PReLU layer that follows the target layer.
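
For illustration only, here is a hypothetical model with that layout (all class and attribute names are made up), showing where such a prelu_weight could come from:

import torch.nn as nn

class TinyNet(nn.Module):
    """Illustrative only: Conv2d (target_layer) => PReLU => Linear."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)  # target_layer
        self.act = nn.PReLU(num_parameters=64)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.act(self.conv(x))
        return self.fc(self.pool(x).flatten(1))

model = TinyNet()
prelu_weight = model.act.weight.detach()  # the slope(s) passed to F.prelu above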

Do you think this is the right way to do Grad-CAM? I would appreciate any advice.

Thank you for making the revision!
I think the original ∑ₖ αₖ Aᵏ assumes that the activations satisfy Aᵏ >= 0; the negative responses come only from αₖ, which is why the authors compute ReLU(∑ₖ αₖ Aᵏ) to cut off the negative contributions. The issue in your case is that Aᵏ itself has meaningful negative values, not that the final operation is ReLU. So I propose cutting off the negative gradients first, although the result will be a bit different from Grad-CAM.

grads = F.relu(grads)  # drop negative gradients before pooling
weights = F.adaptive_avg_pool2d(grads, 1)
gcam = torch.mul(fmaps, weights).sum(dim=1, keepdim=True)

I assume that fmaps is the activation map from the last PReLU, as in Grad-CAM.
Grad-CAM++ applies ReLU within the weight computation as a generalized case.
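
As a rough illustration (not code from this repository), here is a sketch of the Grad-CAM++ weighting under the common approximation that the class score is exponentiated, so the higher-order derivatives reduce to powers of the first-order gradient; it reuses the fmaps and grads tensors from above.

# Grad-CAM++-style weights (sketch): alpha coefficients from the exp-score approximation,
# then ReLU applied to the gradients inside the weighted sum
grads_2 = grads ** 2
grads_3 = grads ** 3
sum_fmaps = fmaps.sum(dim=(2, 3), keepdim=True)
alpha = grads_2 / (2 * grads_2 + sum_fmaps * grads_3 + 1e-7)
weights = (alpha * F.relu(grads)).sum(dim=(2, 3), keepdim=True)
gcam = F.relu((weights * fmaps).sum(dim=1, keepdim=True))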

commented

Thanks for your advice. I will read the Grad-CAM++ paper.