Traffic-X / ViT-CoMer

Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.


ViT-CoMer with O365 Pretrained Result?

DeclK opened this issue · comments

Hi, amazing work here! Have you tried pre-training ViT-CoMer on O365 and then training it on COCO2017?
It is impressive that it achieves 64+ mAP without O365 pre-training, but I am still curious how far this method can go. In Co-DETR, O365 pre-training takes the Swin-L backbone from 60.4 to 64.1, which is a huge boost.

Looking forward to your reply 😃

Thank you for your interest in our work. Due to resource limitations, we have not yet attempted to pre-train ViT-CoMer on Obj365 and fine-tune it on COCO2017.