mmaaz60 / mvits_for_class_agnostic_od

[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".

class-specific object detection

fushh opened this issue

In this paper, the authors offer the insight that the high-level information provided by language descriptions helps the model learn fairly generalizable properties of universal object categories. However, I notice that the training dataset is class-specific, so I'm curious whether MViT can also perform class-specific object detection. Following OV-DETR (and similar to GLIP, as well as to the way object-centric-ovd extracts high-quality class-specific proposals from image-level labels), we can build a prompt such as 'every {category}' for each class, run one forward pass per prompt, and keep the top-scoring predictions for each class. In this way, MViT could be used for open-vocabulary object detection. However, my experiments give extremely poor results: 5.4 AP50 on novel classes and 3.8 AP50 on base classes. I'm confused by these results. Could you give me some advice?
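The per-class prompting procedure described above can be sketched roughly as follows. This is a minimal illustration, not MViT's actual API: `detect_fn` is a hypothetical stand-in for a text-conditioned forward pass that returns `(box, score)` pairs for one prompt, and `class_specific_detect`, `top_k`, and `template` are illustrative names.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float, str]       # (box, score, category)


def class_specific_detect(
    image: object,
    categories: Sequence[str],
    detect_fn: Callable[[object, str], List[Tuple[Box, float]]],  # hypothetical model call
    top_k: int = 10,
    template: str = "every {}",
) -> List[Detection]:
    """Run one text-conditioned forward pass per category and keep the
    top-k highest-scoring boxes from each pass, labeled with that category."""
    detections: List[Detection] = []
    for category in categories:
        prompt = template.format(category)       # e.g. "every dog"
        boxes = detect_fn(image, prompt)          # one forward pass per class
        boxes = sorted(boxes, key=lambda b: b[1], reverse=True)[:top_k]
        detections.extend((box, score, category) for box, score in boxes)
    # Merge all classes into one list ranked by score, as a COCO-style evaluator expects.
    detections.sort(key=lambda d: d[1], reverse=True)
    return detections


# Toy stand-in for the model, returning fixed predictions per prompt (hypothetical data).
def toy_detect(image: object, prompt: str) -> List[Tuple[Box, float]]:
    table = {
        "every dog": [((0, 0, 10, 10), 0.9), ((5, 5, 20, 20), 0.2)],
        "every cat": [((30, 30, 40, 40), 0.7)],
    }
    return table.get(prompt, [])


if __name__ == "__main__":
    print(class_specific_detect(None, ["dog", "cat"], toy_detect, top_k=1))
```

Because each class is scored in a separate forward pass, the merged ranking in this sketch assumes that scores from different prompts are directly comparable.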

Hi @fushh,

Please refer to this. Thank you.