Questions about your pretrained model
slcheng97 opened this issue · comments
Does the pre-trained model you provide cover the categories in the LVIS dataset? If I want to do open-world object detection on LVIS, can I directly use your pre-trained model to generate the proposals, or do I need to filter the dataset so that it doesn't contain any objects from LVIS?
Hi @chengsilin,
Thank you for your interest in our work. Our MAVL model is trained on 1.3M aligned image-text pairs from Flickr30k, MS-COCO (2014), and Visual Genome (VG). We refer to this dataset as the LMDet dataset (see Sec. 2 of the paper). Note that we do not explicitly include LVIS categories in LMDet; however, many LVIS categories are mentioned in the text used for training MAVL.
So for a fair open-world comparison on LVIS, we recommend training MAVL on a filtered dataset from which all captions/text that mention any LVIS category have been removed. We followed a similar setting when reporting ORE results on COCO using MAVL proposals (see Sec. 4.2 of the paper).
However, in our COCO open-world OD experiments, we observed very little difference in results between proposals from the original MAVL and proposals from MAVL trained on the filtered dataset.
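For readers who want to try the filtering step, here is a minimal sketch of dropping captions that mention any held-out category. It is not the procedure used for the paper, which this thread does not describe; the function names `build_category_pattern` and `filter_captions` are hypothetical, and the matching is a simple whole-word heuristic with naive plural handling that will miss synonyms.

```python
import re
from typing import Iterable, List


def build_category_pattern(categories: Iterable[str]) -> "re.Pattern[str]":
    """Compile one case-insensitive regex matching any category name as a whole word or phrase."""
    alternatives = []
    for name in categories:
        # Category names may use underscores ("traffic_light"); split them into words.
        words = [re.escape(w) for w in re.split(r"[\s_]+", name.strip()) if w]
        if words:
            # Allow spaces, underscores, or hyphens between words, plus a naive plural suffix.
            alternatives.append(r"[\s_-]+".join(words) + r"(?:s|es)?")
    if not alternatives:
        # No categories given: return a pattern that never matches.
        return re.compile(r"(?!)")
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def filter_captions(captions: Iterable[str], categories: Iterable[str]) -> List[str]:
    """Keep only the captions that mention none of the given categories."""
    pattern = build_category_pattern(categories)
    return [c for c in captions if not pattern.search(c)]


# Illustrative data only (assumption): a few category names and made-up captions.
held_out = ["traffic_light", "zebra", "bus"]
captions = [
    "A zebra grazing in a field",
    "Two buses parked near a Traffic Light",
    "A man walking on the beach",
    "A busy street at night",
]
kept = filter_captions(captions, held_out)
```

In practice the same check would be applied to every text annotation of an image-text pair before training, and a stricter setup might also use a synonym list or lemmatization.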
I hope this helps. Do let me know if you have any questions or run into any difficulty training MAVL. Thanks.