class-specific object detection
fushh opened this issue · comments
In this paper, the authors offer the insight that high-level information from language descriptions helps the model learn fairly generalizable properties of universal object categories. However, I noticed that the training dataset is class-specific, so I'm curious whether MViT can perform class-specific object detection.

Following OV-DETR (and similar to GLIP, and to the way object-centric-ovd extracts high-quality class-specific proposals using image-level labels), we can use prompts like 'every {category}', run one forward pass per prompt, and keep the top-scoring predictions for each class. In principle, this should let MViT perform open-vocabulary object detection. However, my experiments give extremely poor results: 5.4 AP50 on novel classes and 3.8 AP50 on base classes. I'm confused by these results. Could you give me some advice?
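To make the procedure concrete, here is a minimal sketch of the per-class prompting loop I mean. It is not the MViT API: `detect_fn` is a hypothetical stand-in for whatever inference call takes an image and a text query and returns `(box, score)` pairs, and `class_specific_detections`, `prompt_template`, and `top_k_per_class` are names I made up for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]
# Hypothetical detector interface (an assumption, not the MViT API):
# (image, text_prompt) -> list of (box, score) pairs.
DetectFn = Callable[[object, str], List[Tuple[Box, float]]]


def class_specific_detections(
    image: object,
    categories: Sequence[str],
    detect_fn: DetectFn,
    prompt_template: str = "every {category}",
    top_k_per_class: int = 100,
) -> List[Dict[str, object]]:
    """Query the detector once per category and label each result with that category."""
    results: List[Dict[str, object]] = []
    for label, category in enumerate(categories):
        # One forward pass per class, using the class name inside the prompt.
        prompt = prompt_template.format(category=category)
        # Rank this class's predictions by score and keep only the best ones.
        preds = sorted(detect_fn(image, prompt), key=lambda p: p[1], reverse=True)
        for box, score in preds[:top_k_per_class]:
            results.append(
                {"box": box, "score": score, "label": label, "category": category}
            )
    return results
```

The merged list is what I pass to the evaluator as class-labelled detections. Note that this sketch uses the raw scores as returned for each prompt, which assumes they are comparable across different prompts.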