linjieli222 / seada-vqa

A PyTorch implementation of a data augmentation method for visual question answering


Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering-Pytorch

Work in Progress

This repository corresponds to the ECCV 2020 paper Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering.

Dependencies

You will need at least one GPU with 11 GB of memory for training, and 200 GB of free disk space for storing the VQA v2 dataset. We strongly recommend using an SSD to guarantee high-speed I/O.

  • Python 3.6
  • PyTorch 1.0
  • torchvision 0.2
  • h5py 2.7
  • tqdm 4.19

Installation

  1. git clone https://github.com/zaynmi/seada-vqa.git
    
  2. We recommend installing everything in an Anaconda environment:

    conda create -n seada python=3.6
    source activate seada
    
  3. Install PyTorch 1.0 and torchvision:

    conda install pytorch=1.0 torchvision cudatoolkit=10.0 -c pytorch
    
  4. Install other dependencies as follows:

    pip install -r requirements.txt
    python -m spacy download en
    
  5. Install OpenNMT-py for generating paraphrases; this installs the onmt package into your environment:

    git clone https://github.com/zaynmi/OpenNMT-py.git
    cd OpenNMT-py
    python setup.py install
    cd ..
    
  6. Download and unpack the translation models into the seada/sea/translation_models folder. You'll get four .pt models.

Prepare Dataset (Follow Cyanogenoid/vqa-counting)

  • In the data directory, execute ./download.sh to download VQA v2 and the bottom-up-top-down features.
  • Prepare the data by running
python data/preprocess-features.py
python data/preprocess-vocab.py

This creates an h5py database (95 GiB) containing the object proposal features, plus a vocabulary for questions and answers, at the locations specified in config.py. It is strongly recommended to store the database on an SSD.

Training

Step 1: Generating the paraphrases of questions

python main.py --attack_only --attack_mode q --attack_al sea --attacked_checkpoint {your_trained_model}.pth --fliprate 0.3 --topk 2 --paraphrase_data train

This generates paraphrases of the train set with the top-2 semantic similarity scores and a 30% flip rate with respect to {your_trained_model}.pth (a BUTD model), and stores them in config.paraphrase_save_path. Similarly, setting --paraphrase_data val generates paraphrases of the val set. Don't forget to change config.paraphrase_save_path accordingly.

In our paper, we did not specify the flip rate, topk, or attacked_checkpoint (--fliprate 0, --topk 1), which means we simply use the paraphrases with the top-1 semantic similarity score.
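The selection described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the candidate format, field order, and the rule for mixing flipping and non-flipping paraphrases are assumptions.

```python
def select_paraphrases(candidates, topk=2, fliprate=0.3):
    """Pick `topk` paraphrases per question, drawing roughly a `fliprate`
    fraction from candidates that flip the attacked model's prediction.

    `candidates` is a list of (text, similarity_score, flips_prediction)
    tuples; these names and the selection rule are illustrative assumptions.
    """
    flips = sorted((c for c in candidates if c[2]), key=lambda c: -c[1])
    keeps = sorted((c for c in candidates if not c[2]), key=lambda c: -c[1])
    # Number of prediction-flipping paraphrases to include.
    n_flip = min(len(flips), round(fliprate * topk))
    chosen = flips[:n_flip] + keeps[:topk - n_flip]
    # Fall back to flipping candidates if there are not enough others.
    if len(chosen) < topk:
        chosen += flips[n_flip:n_flip + topk - len(chosen)]
    return [text for text, _, _ in chosen]
```

With --fliprate 0 --topk 1 (the paper's setting), this reduces to taking the single most semantically similar paraphrase.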

There is one more step: the generated paraphrases must be sorted into the same order as the annotations file. The script for this is sort_para.py.
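The idea behind this sorting step can be sketched as below. The record structure and the `question_id` field name are assumptions; refer to sort_para.py for the actual script.

```python
def sort_paraphrases(paraphrases, annotations):
    """Reorder generated paraphrases to match the question order of the
    annotations file, keyed on a shared question id (field names are
    illustrative assumptions)."""
    by_qid = {p["question_id"]: p for p in paraphrases}
    # Emit paraphrases in annotation order so the two files stay aligned.
    return [by_qid[a["question_id"]] for a in annotations]
```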

Step 2: Adversarial training

  • Option-1. Use both visual adversarial examples and paraphrases to augment data.

    python main.py --advtrain --attack_al ifgsm,sea --attack_mode vq --attacked_checkpoint {checkpoint_you_attack_when_eval}.pth  --resume {your_partial_trained_model}.pth
    
  • Option-2. Use visual adversarial examples to augment data.

    python main.py --advtrain --attack_al ifgsm --attack_mode v --attacked_checkpoint {checkpoint_you_attack_when_eval}.pth  --resume {your_partial_trained_model}.pth
    
  • Option-3. Use paraphrases to augment data.

    python main.py --advtrain --attack_al sea --attack_mode q --attacked_checkpoint {checkpoint_you_attack_when_eval}.pth  --resume {your_partial_trained_model}.pth
    

--attacked_checkpoint is optional; it lets you evaluate how well the adversarially trained model defends against adversarial examples generated by {checkpoint_you_attack_when_eval}.pth.

If you want to train on both the train and val sets, add --advtrain_data trainval.

Evaluation

  • Generate a .json file to upload to the online evaluation server. The result file path is specified in config.result_json_path.
python main.py --test_advtrain --checkpoint {your_trained_model}.pth
  • Alternatively, evaluate on the val set. --attacked_checkpoint is optional; if it is provided, you will also see the defense performance.
python main.py --eval_advtrain --checkpoint {your_trained_model}.pth --attack_al ifgsm --attack_mode v --attacked_checkpoint {checkpoint_you_attack_when_eval}.pth 

Performance of the model when being attacked

How our model behaves under attack is also of great interest. You can use:

python main.py --attack_only --attack_mode v --attack_al pgd --alpha 0.5 --iteration 6 --epsilon 5 --attacked_checkpoint {checkpoint_being_attacked}.pth 

All attackers act as white-box attackers.
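The PGD attacker invoked above can be sketched in a framework-free form matching the flags (--alpha, --iteration, --epsilon). In the repository the gradient comes from the attacked VQA model's loss; here a stand-in `grad_fn` plays that role, so this is an assumption-laden illustration rather than the actual implementation.

```python
def pgd_attack(x0, grad_fn, alpha=0.5, iterations=6, epsilon=5.0):
    """Minimal white-box PGD sketch: iteratively ascend the loss via the
    gradient sign, projecting back into an L-infinity epsilon-ball around
    the clean input x0. `grad_fn` stands in for the model's loss gradient."""
    x = list(x0)
    for _ in range(iterations):
        g = grad_fn(x)
        # Take a signed gradient-ascent step of size alpha.
        x = [xi + alpha * (1 if gi > 0 else -1 if gi < 0 else 0)
             for xi, gi in zip(x, g)]
        # Project each coordinate back into [x0 - epsilon, x0 + epsilon].
        x = [min(max(xi, x0i - epsilon), x0i + epsilon)
             for xi, x0i in zip(x, x0)]
    return x

# Toy loss L(x) = sum(x_i^2) with gradient 2*x: PGD pushes x away from 0.
adv = pgd_attack([1.0, -2.0], lambda x: [2 * xi for xi in x])
```

With 6 iterations and alpha = 0.5 the total perturbation per coordinate is 3.0, safely inside the epsilon = 5 ball, so no clipping occurs in this toy run.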

License

The code is released under the MIT License.

Citing

If this repository is helpful for your research, we'd really appreciate it if you could cite the following paper:

@inproceedings{tang2020semantic,
  title={Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering},
  author={Tang, Ruixue and Ma, Chao and Zhang, Wei Emma and Wu, Qi and Yang, Xiaokang},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}
