Transformer-UVOS

This repo offers a strong baseline for Unsupervised Video Object Segmentation (UVOS) using a Transformer.

Accuracy

Results on DAVIS 2016 (val), with a ResNet50 backbone, a training image size of 640x352, and single-scale testing only:

Method	J_mean	J_recall	J_decay	F_mean	F_recall	F_decay
Ours(Res50-FPN, 640x352)	0.777	0.915	0.066	0.766	0.859	0.043
Anchor-Diffusion(Res101-Deeplabv3, 854x480)	0.782	---	---	0.771	---	---
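For readers unfamiliar with the column names, the sketch below shows how mean, recall, and decay are commonly derived from per-frame scores on DAVIS, using the region similarity J (mask IoU) as the example. It is a simplified illustration, not this repo's evaluation code: the function names `jaccard` and `summarize` are hypothetical, masks are plain nested lists of 0/1, and the decay uses a plain quartile split, which may differ from the official DAVIS toolkit at the bin edges.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def summarize(per_frame, threshold=0.5, n_bins=4):
    """Return (mean, recall, decay) for a list of per-frame scores.

    mean   : average score over all frames
    recall : fraction of frames scoring above `threshold`
    decay  : mean of the first quartile minus mean of the last quartile,
             i.e. how much quality drops over the sequence
    """
    n = len(per_frame)
    mean = sum(per_frame) / n
    recall = sum(1 for v in per_frame if v > threshold) / n
    size = max(1, n // n_bins)  # simplified quartile size (assumption)
    first = per_frame[:size]
    last = per_frame[-size:]
    decay = sum(first) / len(first) - sum(last) / len(last)
    return mean, recall, decay


if __name__ == "__main__":
    # Toy per-frame J scores for a four-frame sequence.
    scores = [1.0, 0.8, 0.6, 0.4]
    print(summarize(scores))
```

Higher is better for mean and recall, while lower is better for decay. The F columns follow the same three statistics, computed on a contour accuracy score instead of mask IoU.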

ResNet50-FPN + Transformer + Simple Decoder

You first need to download the DAVIS dataset.

The network is initialized from a ResNet50-FPN pre-trained on the COCO dataset, which can be downloaded from here.

cd model/transformer/davis.transformer.fpn.R50.random_sample/
sh run.sh

Idea inspired by Anchor-Diffusion.

Feel free to contact me if you have any questions: yuming.du@enpc.fr
