This is a PyTorch implementation of the CVPR 2021 paper "Reciprocal Transformations for Unsupervised Video Object Segmentation" (arXiv).
For the ResNet34-based model, training was conducted on four GeForce RTX 2080 Ti GPUs with 11 GB of memory each. For the ResNeXt50-based model, training was conducted on four V100-SXM2 GPUs with 32 GB of memory each.
- Python
- PyTorch 1.6.0
- Torchvision 0.7
In the paper, we use two datasets: DAVIS16 and DUTS. Note that the images need to be flipped vertically and horizontally and the flipped copies saved, so the number of images is four times that of the original dataset.
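The flipping step can be sketched as follows. This is a minimal illustration, not the repository's preprocessing script: it assumes the four-fold count comes from keeping the original image plus a horizontal flip, a vertical flip, and both flips combined, and the variant names are hypothetical labels.

```python
import numpy as np


def four_flip_variants(image):
    """Return the original image plus its three flipped copies.

    `image` is an array of shape (H, W) or (H, W, C).
    The dictionary keys are hypothetical labels, not the repository's file naming.
    """
    return {
        "original": image,
        "hflip": image[:, ::-1],      # mirror left-right (reverse the width axis)
        "vflip": image[::-1, :],      # mirror top-bottom (reverse the height axis)
        "hvflip": image[::-1, ::-1],  # both flips, equivalent to a 180-degree rotation
    }


if __name__ == "__main__":
    frame = np.arange(6).reshape(2, 3)
    for name, variant in four_flip_variants(frame).items():
        print(name, variant.tolist())
```

Each variant would then be written to disk alongside the original, which gives the four-fold dataset size mentioned above.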
Please follow the instructions of RAFT to prepare the optical flow. Note that both forward and backward optical flow are required. The optical flow is also computed on the flipped images rather than obtained by flipping the optical flow of the original images.
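To make the forward/backward requirement concrete, here is a small sketch that lists the frame pairs to feed to a flow estimator for one sequence. It assumes that forward flow goes from frame t to t+1 and backward flow from frame t to t-1, so the first and last frames each lack one direction; the repository's exact convention is not stated here, and `flow_pairs` and the frame names are hypothetical.

```python
def flow_pairs(frames):
    """List (source, target) frame pairs for forward and backward flow.

    Assumption: forward flow is t -> t+1 and backward flow is t -> t-1.
    """
    forward = [(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    backward = [(frames[i], frames[i - 1]) for i in range(1, len(frames))]
    return forward, backward


if __name__ == "__main__":
    # Hypothetical frame names for one (possibly flipped) copy of a sequence.
    frames = ["00000.jpg", "00001.jpg", "00002.jpg"]
    fwd, bwd = flow_pairs(frames)
    print("forward:", fwd)
    print("backward:", bwd)
```

Because the flow is computed on the flipped images themselves, the same pairing would be repeated for every flipped copy of a sequence.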
Download the pretrained models of the appearance stream (spatial-R34 or spatial RX-50) and the motion stream (temporal-R34 or temporal RX-50) from Google Drive or Baidu Pan (code: ohyo) into ./models.
The training code of these two streams can also be found there.
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train-distribuetd.py
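For context, `torch.distributed.launch` starts one process per GPU and, unless `--use_env` is given, appends a `--local_rank` argument to each process. The sketch below shows how a training script typically picks that argument up; it is a generic illustration, not the repository's training script, and `parse_args` is a hypothetical helper.

```python
import argparse


def parse_args(argv=None):
    """Parse the per-process argument added by the launcher."""
    parser = argparse.ArgumentParser()
    # torch.distributed.launch passes --local_rank=<index> to every process.
    parser.add_argument("--local_rank", type=int, default=0)
    # Ignore any other arguments a real training script might define.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    # A training script would typically continue with:
    #   torch.cuda.set_device(args.local_rank)
    #   torch.distributed.init_process_group(backend="nccl", init_method="env://")
    print("local rank:", args.local_rank)
```

With `--nproc_per_node=4`, the four processes would see local ranks 0 to 3, one per visible GPU.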
- Download the pretrained model (model_R34.pth or model_RX50.pth) from Google Drive or Baidu Pan (code: 296x) into ./saved_model
- Run
python test.py
You can download the pre-computed segmentation maps from Google Drive or Baidu Pan (code: 3tkj).
@inproceedings{ren2020rtnet,
title={Reciprocal Transformations for Unsupervised Video Object Segmentation},
author={Ren, Sucheng and Liu, Wenxi and Liu, Yongtuo and Chen, Haoxin and Han, Guoqiang and He, Shengfeng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2021}
}
For any questions, please feel free to contact Sucheng Ren.