Correspondence Pretext Tasks for Goal-oriented Visual Navigation

TLDR: An end-to-end trained agent for image goal navigation

(ICLR 2024)

Demo video

Repository content

  • habitat_baselines-compatible implementation of the agent using DEBiT for image goal navigation
  • final trained weights for the agent (4 size variants: L, B, S, T) to reproduce published results
  • image goal navigation training (i.e. the 3rd phase in the pipeline below), which can be reproduced after extracting the pre-trained weights of the visual encoders (see "Training (3rd phase)")

We do not plan to include here:

  • cross-completion pre-training (i.e. the 1st phase in the pipeline below) code and datasets: refer to the official CroCo repository
  • relative pose and visibility estimation (i.e. the 2nd phase in the pipeline below) pre-training code and dataset

Training pipeline figure

Abstract

Most recent work in goal-oriented visual navigation resorts to large-scale machine learning in simulated environments. The main challenge lies in learning compact representations generalizable to unseen environments and in learning high-capacity perception modules capable of reasoning on high-dimensional input. The latter is particularly difficult when the goal is not given as a category ("ObjectNav") but as an exemplar image ("ImageNav"), as the perception module needs to learn a comparison strategy that requires solving an underlying visual correspondence problem. This has been shown to be difficult from reward alone or with standard auxiliary tasks. We address this problem through a sequence of two pretext tasks, which serve as a prior for what we argue is one of the main bottlenecks in perception: extremely wide-baseline relative pose estimation and visibility prediction in complex scenes. The first pretext task, cross-view completion, is a proxy for the underlying visual correspondence problem, while the second task addresses goal detection and finding directly. We propose a new dual encoder with a large-capacity binocular ViT model and show that correspondence solutions naturally emerge from the training signals. Experiments show significant improvements and SOTA performance on the two benchmarks, ImageNav and the Instance-ImageNav variant, where camera intrinsics and height differ between observation and goal.

Teaser figure

Citation

@inproceedings{
    bono:24:imgnav:debit,
    title={
        End-to-End (Instance)-Image Goal Navigation
        through Correspondence as an Emergent Phenomenon
    },
    author={
        Guillaume Bono
        and Leonid Antsfeld
        and Boris Chidlovskii
        and Philippe Weinzaepfel
        and Christian Wolf
    },
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://openreview.net/forum?id=cphhnHjCvC}
}

Installation

  1. Install habitat-sim:
conda create -n debit python=3.8 cmake=3.14.0
conda activate debit
conda install habitat-sim=0.2.3 headless -c aihabitat -c conda-forge
  2. Install habitat-lab with baselines:
mkdir deps
git clone https://github.com/facebookresearch/habitat-lab -b v0.2.3 deps/habitat-lab
cd deps/habitat-lab
pip install -e habitat-lab
pip install -e habitat-baselines
  3. Clone CroCo repo and make it an installable package:
cd -
git clone https://github.com/naver/croco src/croco
find src/croco -type d -exec touch {}/__init__.py \;
find src/croco/models -name "*.py" -exec sed -ie 's/^from models/from /' {} \;
  4. Install DEBiT and CroCo:
pip install -e .
  5. Download pre-trained weights:
mkdir -p out/ckpt/hab_bl/imgnav
cd out/ckpt/hab_bl/imgnav
Architecture    CroCo + RPEV + PPO(imgnav)
DEBiT-L         curl -LO https://download.europe.naverlabs.com/navigation/debit/debit_large.pth
DEBiT-B         curl -LO https://download.europe.naverlabs.com/navigation/debit/debit_base.pth
DEBiT-S         curl -LO https://download.europe.naverlabs.com/navigation/debit/debit_small.pth
DEBiT-T         curl -LO https://download.europe.naverlabs.com/navigation/debit/debit_tiny.pth
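
The two find commands in the CroCo step above do two things: they drop an empty __init__.py into every directory of the checkout, and they turn imports of the form "from models.x import y" into relative imports "from .x import y". For readers on a system without find or sed, the following is a rough Python equivalent. It is a sketch, not part of this repository; the helper names make_package and relativize_imports are made up for illustration, and the demo runs on a throwaway directory with a toy file rather than on the real checkout.

```python
import re
import tempfile
from pathlib import Path

# Same pattern as the sed expression: a line that starts with "from models".
_ABS_IMPORT = re.compile(r"^from models", flags=re.MULTILINE)


def make_package(root: Path) -> None:
    """Create an empty __init__.py in `root` and in every directory below it."""
    directories = [root] + [p for p in root.rglob("*") if p.is_dir()]
    for directory in directories:
        (directory / "__init__.py").touch()


def relativize_imports(models_dir: Path) -> None:
    """Rewrite 'from models.x import y' as 'from .x import y' in each .py file."""
    for source in models_dir.rglob("*.py"):
        text = source.read_text()
        source.write_text(_ABS_IMPORT.sub("from ", text))


# Demo on a temporary directory with a toy module (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "croco"
    (root / "models").mkdir(parents=True)
    toy = root / "models" / "croco.py"
    toy.write_text("from models.blocks import Block\nimport math\n")

    make_package(root)
    relativize_imports(root / "models")

    rewritten = toy.read_text()
    has_init = (root / "models" / "__init__.py").exists()

print(rewritten)
```

On the real checkout the calls would be make_package(Path("src/croco")) followed by relativize_imports(Path("src/croco/models")). Unlike the sed -ie form, this sketch leaves no backup copies next to the edited files.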

Evaluation

cd -
python scripts/train_eval_ppo.py \
    --run-type eval \
    --exp-config configs/imgnav-gibson-debit.yaml \
    debit=debit_base \
    habitat_baselines.eval_ckpt_path_dir=out/ckpt/hab_bl/imgnav/debit_base.pth
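
To evaluate several size variants in a row, the command above can be generated per variant. The helper below is a minimal sketch and not part of the repository. It assumes that the config names debit_large, debit_small and debit_tiny exist alongside debit_base (only debit_base appears in this README; the others are inferred from the checkpoint file names) and that the checkpoints were downloaded to out/ckpt/hab_bl/imgnav as in the installation steps.

```python
from typing import List

# ASSUMPTION: config names mirror the released checkpoint file names.
# Only "debit_base" is confirmed by the command shown in this README.
VARIANTS = ("debit_large", "debit_base", "debit_small", "debit_tiny")
CKPT_DIR = "out/ckpt/hab_bl/imgnav"


def eval_command(variant: str) -> List[str]:
    """Build the evaluation command line for one DEBiT size variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}, expected one of {VARIANTS}")
    return [
        "python", "scripts/train_eval_ppo.py",
        "--run-type", "eval",
        "--exp-config", "configs/imgnav-gibson-debit.yaml",
        f"debit={variant}",
        f"habitat_baselines.eval_ckpt_path_dir={CKPT_DIR}/{variant}.pth",
    ]


# Print the commands instead of running them, so they can be inspected first.
for name in VARIANTS:
    print(" ".join(eval_command(name)))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root once the printed commands look right.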

Training (3rd phase)

python scripts/extract_pretrained_croco_rpve.py
python scripts/train_eval_ppo.py \
    --run-type train \
    --exp-config configs/imgnav-gibson-debit.yaml \
    debit=debit_base \
    debit.pretrained_binoc_weights=weights/nle/checkpoints/croco-rpve/debit_base.pth
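
The internals of the extraction script are not described in this README. As a general illustration only, pulling the weights of one sub-module out of a larger checkpoint usually comes down to filtering the state dict by key prefix. The sketch below uses plain dictionaries so that it runs without PyTorch; the function name extract_submodule_state and the key names (binoc_encoder, policy) are hypothetical and are not taken from the DEBiT code.

```python
from typing import Any, Dict, Mapping


def extract_submodule_state(state: Mapping[str, Any], prefix: str) -> Dict[str, Any]:
    """Keep only the entries under `prefix` and strip that prefix from their keys."""
    if not prefix.endswith("."):
        prefix += "."
    return {
        key[len(prefix):]: value
        for key, value in state.items()
        if key.startswith(prefix)
    }


# Toy stand-in for a checkpoint's state dict; real values would be tensors.
# All key names are invented for this example.
full_state = {
    "binoc_encoder.patch_embed.weight": [0.1, 0.2],
    "binoc_encoder.blocks.0.attn.weight": [0.3],
    "policy.gru.weight": [0.4],
}

encoder_state = extract_submodule_state(full_state, "binoc_encoder")
print(sorted(encoder_state))
```

With PyTorch, the input mapping would typically come from torch.load(path, map_location="cpu") and the filtered result would be written back with torch.save; the actual key layout of the DEBiT checkpoints should be checked before relying on any particular prefix.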
