Masked Visual Pre-training for Robotics

Overview

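This repository contains the PyTorch implementation of two papers: "Masked Visual Pre-training for Motor Control" (Xiao et al., 2022) and "Real-World Robot Learning with Masked Visual Pre-training" (Radosavovic et al., 2022).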

It includes the pre-trained vision models and PPO/BC training code used in the papers.

Pre-trained vision encoders

We provide our pre-trained vision encoders. The models are in the same format as mae and timm:

backbone	params	images	objective	md5	download
ViT-S	22M	700K	MAE	`fe6e30`	model
ViT-B	86M	4.5M	MAE	`526093`	model
ViT-L	307M	4.5M	MAE	`5352b0`	model
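
A downloaded checkpoint can be checked against the md5 column before use. The following is a minimal sketch, assuming the values in the table are the leading characters of the file's md5 hex digest; the helper names `md5_hex` and `checksum_matches` are hypothetical and not part of the mvp package.

```python
import hashlib

# md5 prefixes copied from the table above, keyed by backbone name.
MD5_PREFIXES = {"ViT-S": "fe6e30", "ViT-B": "526093", "ViT-L": "5352b0"}


def md5_hex(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, backbone):
    """Assumption: the table lists a prefix of the full md5 digest."""
    return md5_hex(path).startswith(MD5_PREFIXES[backbone])
```

Calling `checksum_matches(path, "ViT-B")` on the downloaded file, where `path` is wherever you saved it, returns `True` only if the digest starts with the listed prefix.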

You can use our pre-trained models directly in your code (e.g., to extract image features) or use them with our training code. We provide instructions for both use cases next.

Using pre-trained models in your code

Install PyTorch and the mvp package:

pip install git+https://github.com/ir413/mvp

Import pre-trained models:

import mvp

model = mvp.load("vitb-mae-egosoup")
model.freeze()
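
Once a model is loaded, extracting image features might look like the sketch below. It assumes the object returned by `mvp.load` is a standard `torch.nn.Module` that can be called on a batch of image tensors; the helper `extract_features` is hypothetical, and the expected input size and normalization are not stated here, so check them against the repository before relying on this.

```python
import torch


@torch.no_grad()
def extract_features(encoder, images):
    """Run a frozen encoder on a batch of images without tracking gradients.

    Assumption: `encoder` is a torch.nn.Module callable on a float tensor of
    shape (N, 3, H, W) and returns one feature vector per image.
    """
    encoder.eval()  # switch off dropout and similar train-time behavior
    return encoder(images)
```

With the model loaded above, usage would be `features = extract_features(model, images)`, where `images` is a batch you have already preprocessed.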

Benchmark suite and training code

Please see TASKS.md for task descriptions and GETTING_STARTED.md for installation and training instructions.

Citation

If you find the code or pre-trained models useful in your research, please consider citing an appropriate subset of the following papers:

@article{Xiao2022,
  title = {Masked Visual Pre-training for Motor Control},
  author = {Tete Xiao and Ilija Radosavovic and Trevor Darrell and Jitendra Malik},
  journal = {arXiv:2203.06173},
  year = {2022}
}

@article{Radosavovic2022,
  title = {Real-World Robot Learning with Masked Visual Pre-training},
  author = {Ilija Radosavovic and Tete Xiao and Stephen James and Pieter Abbeel and Jitendra Malik and Trevor Darrell},
  year = {2022},
  journal = {CoRL}
}

Acknowledgments

We thank the NVIDIA IsaacGym and PhysX teams for making the simulator and preview code examples available.
