
Instance Segmentation Using ViT-based Mask R-CNN


Instance segmentation aims to assign each pixel to a distinct object instance in the scene. One approach, which combines object detection and semantic segmentation, is Mask R-CNN. A Vision Transformer (ViT) can also be used as the backbone of Mask R-CNN. In this project, a pre-trained ViT-based Mask R-CNN model is fine-tuned and evaluated on the Penn-Fudan Database for Pedestrian Detection and Segmentation. The dataset is split into train, validation, and test sets with an 80:10:10 ratio.
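torchvision does not ship a ready-made ViT-based Mask R-CNN, so the snippet below is only a minimal sketch of the idea: a torchvision `vit_b_16` backbone whose patch tokens are reshaped into a single 2D feature map and handed to the generic `MaskRCNN` head. The `ViTBackbone` wrapper and the anchor settings are illustrative assumptions; the notebook may build the model differently.

```python
# Sketch only: wrap a torchvision ViT as a single-level Mask R-CNN backbone.
import torch
from torch import nn
from torchvision.models import vit_b_16, ViT_B_16_Weights
from torchvision.models.detection.mask_rcnn import MaskRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator


class ViTBackbone(nn.Module):
    """Expose ViT patch tokens as one 2D feature map, as MaskRCNN expects."""

    def __init__(self):
        super().__init__()
        self.vit = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)
        self.out_channels = self.vit.hidden_dim  # MaskRCNN requires this attribute

    def forward(self, x):
        # vit_b_16 expects 224x224 inputs; resize whatever the detector feeds in.
        x = nn.functional.interpolate(x, size=(224, 224), mode="bilinear")
        x = self.vit._process_input(x)                    # patchify -> (B, 196, C)
        cls = self.vit.class_token.expand(x.size(0), -1, -1)
        x = self.vit.encoder(torch.cat([cls, x], dim=1))  # transformer encoder
        x = x[:, 1:]                                      # drop the class token
        h = w = int(x.size(1) ** 0.5)                     # 14x14 patch grid
        return x.permute(0, 2, 1).reshape(x.size(0), -1, h, w)


anchor_gen = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                             aspect_ratios=((0.5, 1.0, 2.0),))
# num_classes=2: background + pedestrian (Penn-Fudan has a single object class).
model = MaskRCNN(ViTBackbone(), num_classes=2, rpn_anchor_generator=anchor_gen)
```

Fine-tuning then proceeds as with any torchvision detection model: the Penn-Fudan images and per-instance masks are fed as `(images, targets)` pairs, and the model returns the summed detection and mask losses during training.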

Experiment

Follow this link to the Jupyter Notebook containing the entire experiment.

Result

Quantitative Result

The following table reports the quantitative performance of the ViT-based Mask R-CNN model on the test set.

Test Metric                   Score
mAP (box)  @ IoU 0.5:0.95     96.85%
mAP (mask) @ IoU 0.5:0.95     79.58%
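Both figures follow the COCO-style mAP@[0.5:0.95] protocol. Below is a minimal sketch of how such scores could be computed with `torchmetrics`; the dummy prediction/target tensors are illustrative assumptions and not the notebook's actual evaluation code.

```python
# Illustrative sketch: COCO-style mAP@0.5:0.95 for boxes and masks with
# torchmetrics (requires pycocotools). The tensors below are dummy data;
# in the project they would come from the fine-tuned model and the test set.
import torch
from torchmetrics.detection import MeanAveragePrecision

# One predicted pedestrian and one ground-truth pedestrian on a 300x300 image.
pred_mask = torch.zeros(1, 300, 300, dtype=torch.bool)
pred_mask[0, 60:200, 50:120] = True
gt_mask = torch.zeros(1, 300, 300, dtype=torch.bool)
gt_mask[0, 58:198, 48:122] = True

preds = [{
    "boxes": torch.tensor([[50.0, 60.0, 120.0, 200.0]]),
    "scores": torch.tensor([0.97]),
    "labels": torch.tensor([1]),
    "masks": pred_mask,
}]
targets = [{
    "boxes": torch.tensor([[48.0, 58.0, 122.0, 198.0]]),
    "labels": torch.tensor([1]),
    "masks": gt_mask,
}]

box_map = MeanAveragePrecision(iou_type="bbox")    # mAP (box)  @ 0.5:0.95
mask_map = MeanAveragePrecision(iou_type="segm")   # mAP (mask) @ 0.5:0.95
box_map.update(preds, targets)
mask_map.update(preds, targets)
print(box_map.compute()["map"], mask_map.compute()["map"])
```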

Loss Curve

Loss curves of the ViT-based Mask R-CNN model on the Penn-Fudan Database for Pedestrian Detection and Segmentation train and validation sets.

Qualitative Result

Below, the qualitative results are presented.

A few samples of qualitative results from the ViT-based Mask R-CNN model.

Credit
