GradCAM throws error for models that give ClassifierOutput class as model output instead of tensors
AdityaDeodeshmukh opened this issue · comments
I am using the SwinForImageClassification model with the code given below:
import torch
from transformers import AutoImageProcessor
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

class ModelOutputTarget:
    def __init__(self):
        pass

    def __call__(self, model_output):
        return torch.sigmoid(model_output)

# model: a SwinForImageClassification instance (loading code omitted here)
# img / rgb_img: the input image and its normalized RGB array (omitted here)
image_processor = AutoImageProcessor.from_pretrained("microsoft/swin-base-patch4-window7-224-in22k")
target_layers = [model.swin.encoder.layers[-1].blocks[1].layernorm_before]
input_tensor = image_processor(img, return_tensors="pt")
cam = GradCAM(model=model, target_layers=target_layers)
targets = [ModelOutputTarget()]
grayscale_cam = cam(input_tensor['pixel_values'], targets=targets)
grayscale_cam = grayscale_cam[0, :]
visualization = show_cam_on_image(rgb_img, grayscale_cam, use_rgb=True)
model_outputs = cam.outputs
This throws the following error:
TypeError Traceback (most recent call last)
Cell In[13], line 16
13 targets = [ModelOutputTarget()]
15 # You can also pass aug_smooth=True and eigen_smooth=True, to apply smoothing.
---> 16 grayscale_cam = cam(input_tensor['pixel_values'], targets=targets)
18 # In this example grayscale_cam has only one image in the batch:
19 grayscale_cam = grayscale_cam[0, :]
File /opt/conda/lib/python3.10/site-packages/pytorch_grad_cam/base_cam.py:192, in BaseCAM.__call__(self, input_tensor, targets, aug_smooth, eigen_smooth)
188 if aug_smooth is True:
189 return self.forward_augmentation_smoothing(
190 input_tensor, targets, eigen_smooth)
--> 192 return self.forward(input_tensor,
193 targets, eigen_smooth)
File /opt/conda/lib/python3.10/site-packages/pytorch_grad_cam/base_cam.py:92, in BaseCAM.forward(self, input_tensor, targets, eigen_smooth)
90 if self.uses_gradients:
91 self.model.zero_grad()
---> 92 loss = sum([target(output)
93 for target, output in zip(targets, outputs)])
94 loss.backward(retain_graph=True)
96 # In most of the saliency attribution papers, the saliency is
97 # computed with a single target layer.
98 # Commonly it is the last convolutional layer.
(...)
103 # use all conv layers for example, all Batchnorm layers,
104 # or something else.
File /opt/conda/lib/python3.10/site-packages/pytorch_grad_cam/base_cam.py:92, in <listcomp>(.0)
90 if self.uses_gradients:
91 self.model.zero_grad()
---> 92 loss = sum([target(output)
93 for target, output in zip(targets, outputs)])
94 loss.backward(retain_graph=True)
96 # In most of the saliency attribution papers, the saliency is
97 # computed with a single target layer.
98 # Commonly it is the last convolutional layer.
(...)
103 # use all conv layers for example, all Batchnorm layers,
104 # or something else.
Cell In[12], line 6, in ModelOutputTarget.__call__(self, model_output)
5 def __call__(self, model_output):
----> 6 return torch.sigmoid(model_output)
TypeError: sigmoid(): argument 'input' (position 1) must be Tensor, not str
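The "must be Tensor, not str" part of the error hints at the cause: transformers' ModelOutput classes are dict-like, so iterating over one yields its field names (strings), not tensors. A minimal stand-in (no transformers required) reproduces the behavior; FakeModelOutput here is my own illustration, not the real class:

```python
from collections import OrderedDict

class FakeModelOutput(OrderedDict):
    """Stand-in for transformers' ModelOutput, which is dict-like."""
    pass

outputs = FakeModelOutput(logits=[0.1, 0.9])

# This mirrors what zip(targets, outputs) sees inside base_cam.py:
# each "output" is the string 'logits', not the logits tensor.
items = [item for item in outputs]
print(items)  # ['logits']
```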
This is caused by the following code in base_cam.py:
self.outputs = outputs = self.activations_and_grads(input_tensor)

if targets is None:
    target_categories = np.argmax(outputs.cpu().data.numpy(), axis=-1)
    targets = [ClassifierOutputTarget(category)
               for category in target_categories]

if self.uses_gradients:
    self.model.zero_grad()
    loss = sum([target(output)
                for target, output in zip(targets, outputs)])
    loss.backward(retain_graph=True)
Since the output of the SwinForImageClassification model is a SwinClassifierOutput object, iterating over it in the list comprehension yields its keys (the string 'logits') rather than the logits tensor, so the sigmoid never receives a tensor. Is there any workaround for this issue? Though I have not tested other models, this will probably occur with any model whose forward function returns a custom output object instead of a tensor.
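One possible workaround, as a sketch: wrap the model in a thin torch.nn.Module whose forward returns the logits tensor, and pass the wrapper to GradCAM (target_layers can keep pointing into the underlying model, since the wrapped submodules are the same objects). The wrapper class here is my own, not part of pytorch_grad_cam:

```python
import torch

class HuggingfaceToTensorWrapper(torch.nn.Module):
    """Hypothetical helper: unwraps a transformers ModelOutput so that
    GradCAM's loss computation receives a plain logits tensor."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        # SwinForImageClassification returns a SwinClassifierOutput;
        # its .logits field holds the tensor GradCAM expects.
        return self.model(x).logits

# Usage sketch:
# cam = GradCAM(model=HuggingfaceToTensorWrapper(model), target_layers=target_layers)
```

With this in place, cam.outputs holds the logits tensor and a target like ModelOutputTarget receives a tensor it can pass to torch.sigmoid.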