airbert-vln / airbert

Codebase for the Airbert paper

🏘️ Airbert: In-domain Pretraining for Vision-and-Language Navigation 🏘️

Published at ICCV 2021 · 1st on the R2R leaderboard · MIT license · arXiv · project site

This repository stores the codebase for Airbert and some pre-trained models. It builds on the codebase of VLN-BERT.

πŸ› οΈ 1. Getting started

You need a recent version of Python (3.6 or higher). Install the dependencies with:

pip install -r requirements.txt

💽 2. Preparing the dataset

First, download the BnB dataset and prepare an LMDB file containing the visual features, along with the BnB dataset files. The full procedure is described in our BnB dataset repository.
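Before launching a long training run, it can save time to check that the expected files are in place. The helper below is a hypothetical convenience, not part of the repository; the file names follow the lists in the training sections below, relative to the `data/` directory:

```python
from pathlib import Path

# Dataset files expected by the basic concatenation setting (see section 3.1).
EXPECTED = [
    "bnb/bnb_train.json",
    "bnb/bnb_test.json",
    "bnb/testset.json",
]

def missing_bnb_files(data_root):
    """Return the expected BnB dataset files missing under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED if not (root / name).exists()]

if __name__ == "__main__":
    missing = missing_bnb_files("data")
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("All BnB dataset files found.")
```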

💪 3. Training Airbert

Download a checkpoint of ViLBERT pre-trained on Conceptual Captions.

Fine-tune the checkpoint on the BnB dataset using one of the following path-instruction methods.

To speed up training, a SLURM script for 64 GPUs is provided. You can pass extra arguments depending on the path-instruction method.

For example:

export name=pretraining-with-captionless-insertion
echo $name
sbatch --job-name $name \
 --export=name=$name,pretrained=vilbert.bin,args=" --masked_vision --masked_language --min_captioned 2 --separators",prefix=2capt+ \
 train-bnb-8.slurm

⛓️ 3.1. Concatenation

Make sure you have the following dataset files:

  • data/bnb/bnb_train.json
  • data/bnb/bnb_test.json
  • data/bnb/testset.json

Then, launch training:

python train_bnb.py \
  --from_pretrained vilbert.bin \
  --save_name concatenation \
  --separators \
  --min_captioned 7 \
  --masked_vision \
  --masked_language
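For intuition, the concatenation method joins the captions of consecutive BnB images into a single instruction-like string. The toy sketch below illustrates the idea only; the actual implementation lives in `train_bnb.py`, and the separator word is an assumption, not necessarily the token set enabled by `--separators`:

```python
# Toy illustration of the concatenation path-instruction method:
# captions of consecutive images are joined into one pseudo-instruction.
# The separator word "then" is an illustrative assumption.
def concatenate_captions(captions, separator=" then "):
    parts = [c.strip().rstrip(".") for c in captions]
    return separator.join(parts) + "."
```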

👥 3.2. Image merging

Make sure you have the following dataset files:

  • data/bnb/merge+bnb_train.json
  • data/bnb/merge+bnb_test.json
  • data/bnb/merge+testset.json

Then, launch training:

python train_bnb.py \
  --from_pretrained vilbert.bin \
  --save_name image_merging \
  --prefix merge+ \
  --min_captioned 7 \
  --separators \
  --masked_vision \
  --masked_language

πŸ‘¨β€πŸ‘©β€πŸ‘§ 3.3. Captionless insertion

Make sure you have the following dataset files:

  • data/bnb/2capt+bnb_train.json
  • data/bnb/2capt+bnb_test.json
  • data/bnb/2capt+testset.json

Then, launch training:

python train_bnb.py \
  --from_pretrained vilbert.bin \
  --save_name captionless_insertion \
  --prefix 2capt+ \
  --min_captioned 2 \
  --separators \
  --masked_vision \
  --masked_language

👣 3.4. Instruction rephrasing

Make sure you have the following dataset files:

  • data/bnb/np+bnb_train.json
  • data/bnb/np+bnb_test.json
  • data/bnb/np+testset.json
  • data/np_train.json

Then, launch training:

python train_bnb.py \
  --from_pretrained vilbert.bin \
  --save_name instruction_rephrasing \
  --prefix np+ \
  --min_captioned 7 \
  --separators \
  --masked_vision \
  --masked_language \
  --skeleton data/np_train.json

πŸ•΅οΈ 4. Fine-tuning on R2R in Discriminative Setting

First, download the R2R data:

make r2r

4.1. Fine-tune with masking losses

python train.py \
  --from_pretrained bnb-pretrained.bin \
  --save_name r2rM \
  --masked_language --masked_vision --no_ranking

4.2. Fine-tune with the ranking and shuffling losses

python train.py \
  --from_pretrained r2rM.bin \
  --save_name r2rRS \
  --shuffle_visual_features
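For intuition, ranking fine-tuning scores each candidate path against the instruction and trains with a cross-entropy that pushes the ground-truth path above the negatives. The sketch below is a conceptual toy, not the code in `train.py`:

```python
import math

def ranking_loss(scores, gold_index):
    """Cross-entropy over candidate path scores: the ground-truth path
    (at gold_index) should outrank the negative candidates."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return -math.log(exps[gold_index] / sum(exps))

# Example: four candidate paths, the first is the ground truth.
loss = ranking_loss([2.5, 0.3, -1.0, 0.1], gold_index=0)
```

`--shuffle_visual_features` additionally permutes the visual features of a path to create hard negative candidates for this objective.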

4.3. Fine-tune with the ranking and shuffling losses on speaker-augmented data

Download the augmented paths from EnvDrop:

make speaker

Then use the train.py script:

python train.py \
  --from_pretrained r2rM.bin \
  --save_name r2rRSA \
  --shuffle_visual_features \
  --prefix aug+ \
  --beam_prefix aug_

You can download a pretrained model from our model zoo.

🧪 5. Testing Airbert on R2R in a Discriminative Setting

pushd ../model-zoos # https://github.com/airbert-vln/model-zoos
make airbert-r2rRSA
popd

# Install dependencies if not already done
poetry install

# Download data if not already done
make r2r
make lmdb

poetry run python test.py \
  --from_pretrained ../model-zoos/airbert-r2rRSA.bin \
  --save_name testing \
  --split val_unseen

🤰 6. Fine-tuning on REVERIE and R2R in a Generative Setting

Please see the dedicated repository for fine-tuning Airbert in a generative setting.

πŸ€ 7. Few-shot learning

The few-shot datasets are provided in data/task/.
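As an illustration of how a few-shot split can be built, the hypothetical helper below keeps at most k training examples per environment, keyed on the `scan` field of R2R-style entries. It is not part of this repository:

```python
import random
from collections import defaultdict

def few_shot_subset(examples, k, seed=0):
    """Keep at most k examples per environment ('scan' in R2R-style data)."""
    by_scan = defaultdict(list)
    for ex in examples:
        by_scan[ex["scan"]].append(ex)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    subset = []
    for scan, items in sorted(by_scan.items()):
        rng.shuffle(items)
        subset.extend(items[:k])
    return subset
```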

Citing our paper

See the BibTeX file.
