mmaaz60 / mvits_for_class_agnostic_od

[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".

Questions about your pretrained model

slcheng97 opened this issue

Does the pre-trained model you provide cover the categories in the LVIS dataset? If I want to do open-world object detection on LVIS, can I directly use your pre-trained model to generate the proposals, or do I need to filter the training data so that it doesn't contain any objects from the LVIS categories?

Hi @chengsilin,

Thank you for your interest in our work. Our MAVL model is trained on 1.3M aligned image-text pairs from Flickr30k, MS-COCO (2014), and Visual Genome (VG). We refer to this dataset as the LMDet dataset (see Sec. 2 of the paper). Note that we do not explicitly include LVIS categories in LMDet; however, many LVIS categories are mentioned in the text used for training MAVL.

So for a fair open-world comparison on LVIS, we recommend training MAVL on a filtered dataset with all captions/text that mention any of the LVIS categories removed. We followed a similar setting when reporting ORE results on COCO using MAVL proposals (see Sec. 4.2 of the paper).
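As a rough illustration of the caption filtering described above, a minimal sketch might look like the following. This is not the script used for the paper: the function names, the `{"caption": ...}` sample layout, and the simple whole-word matching with an optional plural suffix are all assumptions made for the example. A real filter would likely also need synonyms and a proper list of LVIS category names.

```python
import re


def build_category_pattern(category_names):
    """Compile a case-insensitive regex that matches any category name as a whole word.

    Hypothetical helper: assumes names may use underscores for spaces
    (e.g. "traffic_light"). Returns None when no usable names are given.
    """
    phrases = {name.replace("_", " ").strip().lower() for name in category_names}
    phrases.discard("")
    if not phrases:
        return None
    # Longest phrases first so multi-word names are tried before shorter ones.
    ordered = sorted(phrases, key=len, reverse=True)
    # Allow a trailing "s" or "es" as a crude plural match (an assumption).
    alternation = "|".join(re.escape(p) + r"(?:s|es)?" for p in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def filter_captions(samples, category_names):
    """Keep only samples whose caption mentions none of the given categories.

    `samples` is assumed to be a list of dicts with a "caption" key.
    """
    pattern = build_category_pattern(category_names)
    if pattern is None:
        return list(samples)
    return [s for s in samples if not pattern.search(s["caption"])]


if __name__ == "__main__":
    # Toy data for illustration only; not taken from any real dataset.
    demo_samples = [
        {"caption": "A dog runs on the beach"},
        {"caption": "Two traffic lights at a crossing"},
        {"caption": "A hotdog stand"},
        {"caption": "People walking"},
    ]
    kept = filter_captions(demo_samples, ["dog", "traffic_light"])
    for sample in kept:
        print(sample["caption"])
```

Whole-word matching keeps captions such as "A hotdog stand" while dropping those that mention "dog" or "traffic lights". It is deliberately conservative about what counts as a mention, so stricter or looser rules may be needed depending on how fair the comparison has to be.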

However, in our COCO open-world OD experiments, we observed very little difference in results between proposals from the original MAVL and from MAVL trained on the filtered dataset.

I hope this is helpful. Do let me know if you have any questions or face any difficulty in training MAVL. Thanks