Transformer-UVOS

This repo offers a strong baseline for Unsupervised Video Object Segmentation (UVOS) using a Transformer.

Accuracy

With ResNet50 as the backbone, an input size of 640x352 during training, and single-scale testing on DAVIS 2016 (val):

| Method | J_mean | J_recall | J_decay | F_mean | F_recall | F_decay |
|---|---|---|---|---|---|---|
| Ours (Res50-FPN, 640x352) | 0.777 | 0.915 | 0.066 | 0.766 | 0.859 | 0.043 |
| Anchor-Diffusion (Res101-DeepLabv3, 854x480) | 0.782 | --- | --- | 0.771 | --- | --- |
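J and F are the standard DAVIS metrics: J is the region similarity (the Jaccard index, i.e. IoU between the predicted and ground-truth masks) and F is the boundary F-measure. As a minimal illustration of the J metric for a single frame (reported numbers should come from the official DAVIS evaluation toolkit):

```python
# Minimal sketch of the per-frame J (region similarity) metric: the Jaccard
# index (IoU) between a predicted and a ground-truth binary mask.
import numpy as np

def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU of two binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)
```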

Architecture

ResNet50-FPN + Transformer + Simple Decoder
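A rough sketch of this pipeline is shown below, assuming a PyTorch/torchvision setup. The module layout, the FPN level fed to the transformer, and all hyper-parameters are illustrative assumptions, not the repo's actual implementation.

```python
# Rough sketch of ResNet50-FPN + Transformer + simple decoder for binary
# (foreground/background) mask prediction. Layer choices, the FPN level used,
# and all hyper-parameters are illustrative assumptions, not the repo's code.
import torch
import torch.nn as nn
import torchvision  # assumes torchvision >= 0.13 for the weights= argument

class TransformerUVOS(nn.Module):
    def __init__(self, hidden_dim=256, nhead=8, num_layers=6):
        super().__init__()
        # ResNet50-FPN backbone (all FPN levels output 256 channels);
        # COCO-pretrained weights could be loaded here for initialization.
        self.backbone = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=None).backbone
        # Transformer encoder over the flattened coarsest FPN level (stride 32),
        # which keeps the token sequence short.
        layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=nhead)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Simple decoder: two convs producing a per-pixel foreground logit.
        self.decoder = nn.Sequential(
            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden_dim, 1, 1),
        )

    def forward(self, images):                      # images: (B, 3, H, W)
        feats = self.backbone(images)["3"]          # coarsest level: (B, 256, H/32, W/32)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).permute(2, 0, 1)  # (H/32 * W/32, B, 256)
        feats = self.encoder(tokens).permute(1, 2, 0).reshape(b, c, h, w)
        logits = self.decoder(feats)                # (B, 1, H/32, W/32)
        return nn.functional.interpolate(           # upsample to input resolution
            logits, size=images.shape[-2:], mode="bilinear", align_corners=False
        )
```

For a 640x352 input this gives a 20x11 token grid at the transformer input, so the self-attention stays cheap.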

Training

You first need to download the DAVIS dataset.

The network is initialized with a ResNet50-FPN pre-trained on the COCO dataset, which can be downloaded from here.
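
If you just want to see where such weights can come from, a COCO-pretrained ResNet50-FPN can also be extracted through torchvision. This is only a sketch (assuming torchvision >= 0.13); the repo may expect its own checkpoint format, and the output file name here is arbitrary.

```python
# Sketch: one way to obtain COCO-pretrained ResNet50-FPN weights via torchvision.
# Not the official initialization procedure of this repo.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn, MaskRCNN_ResNet50_FPN_Weights

detector = maskrcnn_resnet50_fpn(weights=MaskRCNN_ResNet50_FPN_Weights.COCO_V1)
# Keep only the backbone (ResNet50 + FPN) parameters for initialization.
torch.save(detector.backbone.state_dict(), "resnet50_fpn_coco_backbone.pth")  # arbitrary file name
```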

```sh
cd model/transformer/davis.transformer.fpn.R50.random_sample/
sh run.sh
```

Acknowledgement

Idea inspired by Anchor-Diffusion.

Code based on TorchSeg, Detectron2, and DETR.

Contact

Feel free to contact me if you have any questions: yuming.du@enpc.fr
