kazuto1011 / grad-cam-pytorch

PyTorch re-implementation of Grad-CAM (+ vanilla/guided backpropagation, deconvnet, and occlusion sensitivity maps)

gcam return zero after F.relu(gcam)

baogiadoan opened this issue

Is it possible for a feature map to contain no positive information related to the targeted class? In my case, gcam at that feature map is all negative values, which leads to a zero output for Grad-CAM after F.relu(gcam).

First of all, for models with ReLU, negative values in the intermediate feature maps carry meaningless/ambiguous information for the final scores. Grad-CAM generally takes the feature maps "rectified" by ReLU, so the negative values of the final weighted map depend purely on the top-down gradients. Regions that are "negative" for the task are removed by the final ReLU.
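
For reference, here is a minimal sketch of the Grad-CAM computation being discussed (the function name is hypothetical; fmaps and grads are assumed to have been captured with forward/backward hooks on the target layer, as this repo does):

import torch
import torch.nn.functional as F

def grad_cam_map(fmaps, grads):
    # fmaps: (B, C, H, W) target-layer activations (post-ReLU, hence >= 0)
    # grads: (B, C, H, W) gradient of the class score w.r.t. fmaps
    weights = F.adaptive_avg_pool2d(grads, 1)  # (B, C, 1, 1) channel weights via global average pooling
    gcam = torch.mul(fmaps, weights).sum(dim=1, keepdim=True)  # weighted sum over channels
    return F.relu(gcam)  # final ReLU drops regions that argue against the class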

If you want to visualize the negative evidence instead, why don't you use the modified weights described in the paper's Section 7, "Counterfactual Explanations"?

# Just flip the gradient's signs
weights = F.adaptive_avg_pool2d(-grads, 1)
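
(Here, as in the repo's code, F is torch.nn.functional and grads holds the gradients of the class score with respect to the chosen feature maps. Flipping the sign weights each channel by how strongly it suppresses the target score, so the resulting map highlights regions whose removal would tend to increase the class confidence.)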

The layer looks correct. However, the gradients with respect to the features.23 layer can have negative values, and those are what weight the (positive) feature maps. Moreover, after global average pooling, positive gradients that occur only locally may be swamped by spatially dominant negative gradients. Please check the gradients, not the feature maps.
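
As a quick sanity check, something like the following (a hypothetical helper, assuming grads is the gradient tensor captured at features.23) reports how the gradient signs are distributed before and after pooling:

import torch.nn.functional as F

def inspect_gradients(grads):
    # grads: (B, C, H, W) gradients at the target layer
    frac_pos = (grads > 0).float().mean().item()  # fraction of positive gradient entries
    weights = F.adaptive_avg_pool2d(grads, 1)  # per-channel weights after global average pooling
    frac_pos_w = (weights > 0).float().mean().item()  # fraction of positively weighted channels
    print(f"positive gradients: {frac_pos:.3f}, positive channel weights: {frac_pos_w:.3f}")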

I found that weights = self._compute_grad_weights(grads) contained both positive and negative values, while fmaps contained only positive values, so multiplying them obviously gives both positive and negative values. But after summing over the channels the result is all negative, so the final gcam shows nothing after F.relu(gcam):
gcam = torch.mul(fmaps, weights).sum(dim=1, keepdim=True)

If I add one more line, weights = F.relu(weights), before gcam = torch.mul(fmaps, weights).sum(dim=1, keepdim=True), i.e. I remove all the negative values, it can now show the heatmap for the object I want. I'm not sure whether that is the right thing to do, though...
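
For reference, that modification would look like this (a sketch using this repo's variable names, not the original code):

weights = F.relu(self._compute_grad_weights(grads))  # keep only positively weighted channels
gcam = torch.mul(fmaps, weights).sum(dim=1, keepdim=True)

Clamping the weights means channels that vote against the class no longer cancel out the positive ones, which is why the heatmap reappears.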

The Grad-CAM heatmap represents the average contribution over the channels. Removing the negative weights can push up the positive regions; however, those regions may be overestimated even when they actually weaken the target score. I would say the current image contributes little to the specified class, or the score is actually derived from other regions.

Can someone explain what it means for Grad-CAM to be all negative?

It's weird to think that all channels in the last convolution make a "negative" contribution to the true output class.