chenjinyubuaa / SEvol

SEvol

Code for our CVPR 2022 paper "Reinforced Structured State-Evolution for Vision-Language Navigation".

Contributed by Jinyu Chen, Chen Gao, Erli Meng, Qiong Zhang, Si Liu

Getting Started

Installation

  1. Install the Matterport3D Simulator by following the instructions here.

  2. Clone this repository.

    cd Matterport3DSimulator && mkdir methods && cd methods
    git clone https://github.com/chenjinyubuaa/SEvol.git
    
  3. Install the requirements.

    pip install -r requirements.txt
    

Training and Test

Dataset Preparation

Please download the data and pretrained checkpoints from here. Put the img_features and task directories under the Matterport3DSimulator directory. The CLIP image features can be downloaded from here.
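Before training, it can help to confirm that the downloaded directories ended up in the expected place. The following is a minimal sketch that assumes only the layout described above (img_features and task directly under the Matterport3DSimulator directory). The helper name `missing_data_dirs` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Directory names as given in the instructions above; adjust if your download differs.
REQUIRED_DIRS = ("img_features", "task")


def missing_data_dirs(simulator_root, required=REQUIRED_DIRS):
    """Return the required directories that are absent under simulator_root."""
    root = Path(simulator_root)
    return [name for name in required if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from inside the Matterport3DSimulator directory.
    missing = missing_data_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

The check only looks at directory names; it does not validate the contents of the feature files or checkpoints.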

Training

Following Speaker-follower and EnvDrop, we train our model on R2R as follows:

  1. Train the speaker model from the Matterport3DSimulator directory:

    bash methods/SEvol/run/train_speaker.sh 0

  2. Train the follower model:

    bash methods/SEvol/run/train_r2r.sh 0

  3. Train with the back-translation data augmentation:

    bash methods/SEvol/run/train_r2r_bt.sh 0
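The three commands above can also be driven from a small Python wrapper that stops at the first failing stage. This is only a sketch: the script paths are copied from the steps above, the trailing argument `0` is passed through unchanged, and the helper names `build_stage_commands` and `run_stages` are hypothetical. Because the third stage depends on checkpoints produced by the first two, it may be more practical to run only the first two stages this way and start the third by hand.

```python
import subprocess

# Script paths copied from the training steps, in the order they are run.
STAGE_SCRIPTS = (
    "methods/SEvol/run/train_speaker.sh",
    "methods/SEvol/run/train_r2r.sh",
    "methods/SEvol/run/train_r2r_bt.sh",
)


def build_stage_commands(script_arg="0", scripts=STAGE_SCRIPTS):
    """Build one `bash <script> <arg>` command per training stage."""
    return [["bash", script, str(script_arg)] for script in scripts]


def run_stages(commands, runner=subprocess.run):
    """Run the commands in order; check=True raises on the first failure."""
    for command in commands:
        runner(command, check=True)
```

Calling `run_stages(build_stage_commands("0")[:2])` from the Matterport3DSimulator directory would run the speaker and follower stages back to back.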

For the third training stage, we use the speaker model with the best BLEU score and the follower model with the best success rate (SR) on the val-unseen split.
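The selection rule can be sketched as a simple maximum over evaluation records. The record format below (a list of dicts with a `path` and a metric value) is an assumption made for illustration and is not produced by this repository; the numbers are placeholders, not results from the paper.

```python
def best_checkpoint(records, metric):
    """Return the record with the highest value of `metric`.

    `records` is a hypothetical list of dicts, e.g. {"path": "ckpt_a", "bleu": 0.27}.
    """
    if not records:
        raise ValueError("no evaluation records given")
    return max(records, key=lambda record: record[metric])


# Illustrative values only.
speaker_runs = [
    {"path": "speaker_iter_10000", "bleu": 0.25},
    {"path": "speaker_iter_20000", "bleu": 0.28},
]
follower_runs = [
    {"path": "follower_iter_10000", "sr": 0.55},
    {"path": "follower_iter_20000", "sr": 0.52},
]

print(best_checkpoint(speaker_runs, "bleu")["path"])  # speaker_iter_20000
print(best_checkpoint(follower_runs, "sr")["path"])   # follower_iter_10000
```

In practice the metric values would be read from the val-unseen evaluation output of each stage.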

Test

  1. Use valid.sh to test a checkpoint. Set the checkpoint path inside the script before running it:

    bash methods/SEvol/run/valid.sh 0
    

Citation

Please consider citing this project in your publications if it helps your research. The following is a BibTeX reference. The BibTeX entry requires the url LaTeX package.

@InProceedings{Chen_2022_CVPR,
    author    = {Chen, Jinyu and Gao, Chen and Meng, Erli and Zhang, Qiong and Liu, Si},
    title     = {Reinforced Structured State-Evolution for Vision-Language Navigation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {15450-15459}
}

License

SEvol is released under the MIT license. See LICENSE for additional details.

Acknowledgements

Parts of the code are built upon NvEM and EnvDrop. We thank the authors for their great work!
