Questions about your pretrained model

slcheng97 opened this issue 2 years ago · comments

Does the pre-trained model you provide cover the categories in the LVIS dataset? If I want to do open-world object detection on LVIS, can I use your pre-trained model directly to generate the proposals, or do I need to filter the training data so that it doesn't contain any objects from the LVIS categories?

Muhammad Maaz · Answer 1 · Mon Dec 05 2022 20:13:10 GMT+0800 (China Standard Time)

Hi @chengsilin,

Thank you for your interest in our work. Our MAVL model is trained on 1.3M aligned image-text pairs from Flickr30k, MS-COCO (2014), and Visual Genome (VG). We refer to this dataset as the LMDet dataset (see Sec. 2 of the paper). Note that we do not explicitly include LVIS categories in LMDet; however, many LVIS categories are mentioned in the text used for training MAVL.

So for a fair open-world comparison on LVIS, we recommend training MAVL on a filtered dataset, with all captions/text that mention any LVIS category removed. We followed a similar setting when reporting ORE results on COCO using MAVL proposals (see Sec. 4.2 of the paper).
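To illustrate the kind of caption filtering described above, here is a minimal sketch in Python. It is not the script used for the paper: the function names, the `{"caption": ...}` sample format, and the whole-word matching with a naive plural suffix are all assumptions made for illustration. A real run would pass the full list of LVIS category names (and probably their synonyms) instead of the toy list shown here.

```python
import re


def build_category_pattern(category_names):
    """Compile one regex that matches any category name as a whole word.

    Hypothetical helper: names are lowercased, underscores become spaces,
    and an optional "s"/"es" suffix is allowed as a crude plural check.
    Returns None when no usable names are given.
    """
    names = {n.strip().lower().replace("_", " ") for n in category_names}
    names.discard("")
    if not names:
        return None
    # Longest names first so multi-word names are tried before shorter ones.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(n) for n in ordered)
    return re.compile(r"\b(?:" + alternation + r")(?:s|es)?\b", re.IGNORECASE)


def filter_captions(samples, category_names):
    """Keep only the samples whose caption mentions none of the categories."""
    pattern = build_category_pattern(category_names)
    if pattern is None:
        return list(samples)
    return [s for s in samples if not pattern.search(s["caption"])]


# Toy example with made-up captions and a made-up category list.
samples = [
    {"caption": "Two dogs play on the grass"},
    {"caption": "A man rides a horse"},
    {"caption": "A catalog on a table"},
    {"caption": "A hot dog on a plate"},
]
kept = filter_captions(samples, ["dog", "cat", "hot_dog"])
```

Plain string matching like this misses synonyms and irregular plurals, so it should be read as a lower bound on what a careful filter would remove.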

However, in our COCO open-world OD experiments, we observed very little difference in results between proposals from the original MAVL and proposals from MAVL trained on the filtered dataset.

I hope this is helpful. Do let me know if you have any questions or face any difficulty training MAVL. Thanks