hila-chefer / Transformer-MM-Explainability

[ICCV 2021 Oral] Official PyTorch implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network, including examples for DETR and VQA.


[ICCV 2021 Oral] PyTorch Implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers

[YouTube video]

Notebooks for LXMERT + DETR:

DETR_LXMERT

Notebook for CLIP:

CLIP

Demo: You can check out a demo on Hugging Face Spaces or scan the QR code below.

[QR code linking to the Hugging Face Spaces demo]

Notebook for ViT:

ViT

Using Colab

  • Please note that the notebooks assume a GPU runtime. To switch, go to Runtime -> Change runtime type and select GPU.
  • Installing all the requirements may take some time. After installation, please restart the runtime.

Running Examples

Note that there are two Jupyter notebooks for running the examples presented in the paper.

  • The notebook for LXMERT contains both the examples from the paper and examples with images from the internet and free-form questions. To use your own input, simply change the URL variable to point to your image and the question variable to your free-form question.

    [Example visualizations from the LXMERT notebook]

  • The notebook for DETR contains the examples from the paper. To use your own input, simply change the URL variable to your image.

    [Example visualization from the DETR notebook]

Reproduction of results

VisualBERT

Run the run.py script as follows:
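
A minimal sketch of the invocation, assuming MMF-style dotted config overrides. Apart from env.data_dir (referenced in the note below), every value here is an assumption and should be checked against run.py and its config:

```bash
# Illustrative sketch only -- all values except env.data_dir are assumptions;
# see run.py and its MMF configuration for the real arguments.
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=`pwd` python run.py \
    config=projects/visual_bert/configs/vqa2/defaults.yaml \
    model=visual_bert \
    dataset=vqa2 \
    run_type=val \
    env.data_dir=/path/to/data_dir
```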

Note

If the datasets aren't already in env.data_dir, then the script will download the data automatically to the path in env.data_dir.

LXMERT

  1. Download valid.json:

  2. Download the COCO_val2014 set to your local machine (see the download sketch after this list).

    Note

    If you already downloaded COCO_val2014 for the VisualBERT tests, you can simply use the same path you used for VisualBERT.

  3. Run the perturbation.py script (see the command sketch after this list).
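
For step 2, the validation images can be fetched from the public COCO mirror; the URL below is the standard one for val2014, and the target directory is up to you:

```bash
# Download and unpack the COCO val2014 images.
wget http://images.cocodataset.org/zips/val2014.zip
unzip val2014.zip -d /path/to/COCO_val2014
```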
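For step 3, a minimal sketch of the invocation. The flag names and values are assumptions and should be checked against perturbation.py's argument parser:

```bash
# Illustrative sketch -- the flags below are assumptions, not verified against the script.
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=`pwd` python perturbation.py \
    --COCO_path /path/to/COCO_val2014 \
    --method ours_no_lrp \
    --is-text-pert false \
    --is-positive-pert false
```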

DETR

  1. Download the COCO dataset as described in the DETR repository; note that only the validation set is needed.
  2. Lower the minimum IoU threshold from 0.5 to 0.2 in pycocotools (see the sketch after this list).
  3. Run the segmentation experiment (see the command sketch after this list).
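
For step 2, the threshold lives in pycocotools' cocoeval.py (Params.setDetParams). A sketch of how to locate and edit it; the exact linspace expression may differ slightly between pycocotools versions:

```bash
# Locate the installed pycocotools package so you can edit cocoeval.py:
python -c "import pycocotools; print(pycocotools.__path__[0])"
# In cocoeval.py, inside Params.setDetParams, change the lower IoU bound from 0.5 to 0.2, e.g.:
#   before: self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
#   after:  self.iouThrs = np.linspace(.2, 0.95, int(np.round((0.95 - .2) / .05)) + 1, endpoint=True)
```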
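For step 3, a minimal sketch of the command. The --coco_path, --eval, --batch_size, and --resume flags are standard DETR main.py arguments and the checkpoint URL is the public detr-r50 weights; any additional flag this repository adds to select the explainability method is an assumption not shown here:

```bash
# Illustrative sketch -- standard DETR evaluation flags; check this repository's DETR code
# for any method-selection arguments it adds.
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=`pwd` python main.py \
    --coco_path /path/to/coco \
    --eval \
    --batch_size 1 \
    --resume https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth
```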

Citing

If you make use of our work, please cite our paper:

Credits

  • VisualBERT implementation is based on the MMF framework.
  • LXMERT implementation is based on the official LXMERT implementation and on Hugging Face Transformers.
  • DETR implementation is based on the official DETR implementation.
  • CLIP implementation is based on the official CLIP implementation.
  • The CLIP Hugging Face Spaces demo was made by Paul Hilders, Danilo de Goede, and Piyush Bagad from the University of Amsterdam as part of their final project.


License: MIT License


Languages

Jupyter Notebook 90.2%, Python 9.7%, JavaScript 0.0%, Shell 0.0%, C 0.0%, CSS 0.0%, Dockerfile 0.0%