mmaaz60 / mvits_for_class_agnostic_od

[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".

aligning image-text pairs

nikky4D opened this issue

I have a question on the paper: you train on aligned image-text pairs. How do you create this alignment? Is it done the same way as in MDETR? I could not fully understand this from the paper, especially for non-natural images such as satellite or medical images.

Hi @nikky4D,

Thank you for your interest in our work. Yes, the image-text aligned training is performed in the same way as in MDETR. However, please note that the models have not been trained on the out-of-domain datasets (such as DOTA, KITTI, Clipart, Comic, and Watercolor) on which class-agnostic object detection is evaluated (see Table 2 in the paper).

Thank you for the response. So, to make sure I understand: you trained on the datasets used in MDETR (Flickr, VQA, COCO), and evaluated on DOTA, KITTI, etc. with text queries like "all objects" and "all small objects"?

Yes, you are right. The boxes detected for the different queries are then combined for the evaluation.
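
For anyone else reading this thread, here is a minimal sketch of that combination step. It is not the authors' exact evaluation code: `run_inference(image, query)` is a hypothetical stand-in for the repo's model call (assumed to return `xyxy` boxes and per-box scores), the query list below only includes the two prompts mentioned above (the full list is in Appendix A.2), and suppressing near-duplicates with NMS after pooling is an assumption rather than something confirmed in this thread.

```python
# Hedged sketch: merge class-agnostic detections from multiple text queries.
# `run_inference` is a hypothetical wrapper around the model, assumed to
# return (boxes [N, 4] in xyxy format, scores [N]) for one image and query.

import torch
from torchvision.ops import nms

# Two prompts mentioned in this thread; the full set is listed in Appendix A.2.
TEXT_QUERIES = ["all objects", "all small objects"]

def combine_query_detections(image, run_inference, iou_thresh=0.5):
    """Run the detector once per text query and pool all resulting boxes.

    Because the task is class-agnostic, boxes from different queries are
    simply concatenated; NMS (an assumption here) then removes
    near-duplicate detections before evaluation.
    """
    all_boxes, all_scores = [], []
    for query in TEXT_QUERIES:
        boxes, scores = run_inference(image, query)  # hypothetical signature
        all_boxes.append(boxes)
        all_scores.append(scores)

    boxes = torch.cat(all_boxes, dim=0)
    scores = torch.cat(all_scores, dim=0)

    keep = nms(boxes, scores, iou_thresh)  # class-agnostic NMS over the pool
    return boxes[keep], scores[keep]
```

The key point is that the merge is class-agnostic: no labels are compared, so boxes from different prompts can be pooled directly and scored against the ground truth as a single set.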

Thank you. And am I right that the queries used for the out-of-domain datasets are only those listed in Appendix A.2?

Yes, your understanding is correct.

Thank you again for the response, the code and the paper.