Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining (ECCV 2022)

Webpage | Code | Paper | YouTube Driving Dataset | Pretrained ResNet34

Installation

Our codebase is based on MoCo. A standard PyTorch environment is sufficient (see MoCo's installation instructions).

Dataset

We collect driving videos from YouTube. Here we provide the 🔗 video list we used. You can also download the frames directly via 🔗 this OneDrive link and run:

cat sega* > frames.zip

to reassemble the zip file. To train ACO, you should also download label.pt and meta.txt and place them under {aco_path}/code and {your_dataset_directory}/, respectively.
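As a quick sanity check after downloading, you can load the label file and peek at its contents. This is a minimal sketch: the file names follow the paths above, but the internal structure of label.pt is not documented here, so inspect it rather than assuming a specific format.

import torch

# Load the action labels shipped with the dataset (placed under {aco_path}/code).
labels = torch.load('label.pt', map_location='cpu')
print(type(labels))                     # inspect the container holding the action labels

# Read the metadata file (placed under {your_dataset_directory}/).
with open('meta.txt') as f:
    meta = f.read().splitlines()
print(len(meta), meta[:3])              # number of entries and a few sample lines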

Training

We provide main_label_moco.py for training. To perform ACO training of a ResNet-34 model on an 8-GPU machine, run:

python main_label_moco.py -a resnet34 --mlp -j 16 --lr 0.003 \
			 --batch-size 256 --moco-k 40960 --dist-url 'tcp://localhost:10001' \
			 --multiprocessing-distributed --world-size 1 --rank 0 {your_dataset_directory} 

Some important arguments:

  • --aug_cf: whether to use cropping and flipping augmentations during pre-training. ACO does not use these two augmentations by default.
  • --thres: the action similarity threshold, which decides which frame pairs count as action-similar (see the conceptual sketch after this list).
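To illustrate what an action similarity threshold can mean in practice, here is a conceptual sketch, not the repo's implementation: the function name action_positive_mask, the use of L2 distance, and the tensor shapes are assumptions for illustration. The idea is that frames whose actions differ by less than the threshold would be treated as positive pairs for the contrastive objective.

import torch

def action_positive_mask(actions: torch.Tensor, thres: float) -> torch.Tensor:
    # actions: (N, D) per-frame action labels (e.g. steering / throttle).
    # Returns an (N, N) boolean mask marking pairs whose actions are closer
    # than `thres`, i.e. the pairs a contrastive loss would treat as positives.
    dist = torch.cdist(actions, actions)   # pairwise L2 distances, shape (N, N)
    mask = dist < thres                    # similar actions -> candidate positives
    mask.fill_diagonal_(False)             # a frame is not its own positive
    return mask

# Example: four frames with 2-D actions; the first two and the last two are similar.
actions = torch.tensor([[0.00, 0.50], [0.02, 0.50], [0.80, 0.10], [0.79, 0.12]])
print(action_positive_mask(actions, thres=0.1))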

Pretrained weights

We also provide a 🔗 pretrained ResNet-34 checkpoint. After downloading it, you can load the checkpoint via:

import torch
from torchvision.models import resnet34

# Build a standard torchvision ResNet-34 and load the ACO weights.
# strict=False ignores keys that do not match, e.g. the classification head.
net = resnet34()
net.load_state_dict(torch.load('ACO_resnet34.ckpt'), strict=False)
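The checkpoint is intended as a visual backbone, so a common next step is to use it as a feature extractor. Below is a minimal sketch, assuming you drop the classification head and feed ImageNet-sized RGB frames; the dummy input and the choice of nn.Identity() are illustrative and not part of the repo.

import torch
from torch import nn
from torchvision.models import resnet34

net = resnet34()
net.load_state_dict(torch.load('ACO_resnet34.ckpt'), strict=False)
net.fc = nn.Identity()                      # keep the 512-d features, drop the classifier
net.eval()

with torch.no_grad():
    frames = torch.randn(2, 3, 224, 224)    # a dummy batch of RGB frames
    features = net(frames)
print(features.shape)                       # torch.Size([2, 512])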

Bibtex

@article{zhang2022learning,
  title={Learning to Drive by Watching YouTube videos: Action-Conditioned Contrastive Policy Pretraining},
  author={Zhang, Qihang and Peng, Zhenghao and Zhou, Bolei},
  journal={European Conference on Computer Vision (ECCV)},
  year={2022}
}
