What is the best way to do one-shot image-conditioned detection in Owl-v2?

Question

dienhoa opened this issue 5 months ago

In Owl-ViT v1, to identify the optimal embedding vector representing the query image, as stated in the paper:

"We search for the most dissimilar class embedding within the group of class embeddings whose corresponding box has IoU > 0.65 with Q."

This method aims to identify the foreground object.
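For reference, the v1 heuristic quoted above can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's or any library's code: the function names, the array layout (`class_embeds` as `(N, D)`, `pred_boxes` as `(N, 4)` in `(x0, y0, x1, y1)` format), and the reading of "most dissimilar" as "lowest dot-product similarity to the mean class embedding" are all assumptions made for the example.

```python
import numpy as np


def box_iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as (x0, y0, x1, y1)."""
    x0 = np.maximum(box[0], boxes[:, 0])
    y0 = np.maximum(box[1], boxes[:, 1])
    x1 = np.minimum(box[2], boxes[:, 2])
    y1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def select_query_index_v1(class_embeds, pred_boxes, query_box, iou_threshold=0.65):
    """Hypothetical helper: index of the query embedding under the v1 heuristic.

    Assumption: "most dissimilar" means lowest similarity to the mean of all
    class embeddings, computed only over boxes with IoU > threshold with Q.
    """
    ious = box_iou(np.asarray(query_box, dtype=float), pred_boxes)
    candidates = np.flatnonzero(ious > iou_threshold)
    if candidates.size == 0:
        return None  # no predicted box overlaps the query box enough
    mean_embed = class_embeds.mean(axis=0)
    sims = class_embeds[candidates] @ mean_embed
    return int(candidates[np.argmin(sims)])
```

The intuition is that background-like embeddings dominate the mean, so the candidate furthest from the mean is more likely to be the foreground object.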

However, Owl-ViT v2 now provides an objectness score that serves this purpose. Would it be better to use the objectness score, rather than the class-embedding heuristic, to identify the foreground object?

Matthias Minderer · Answer 1 · Mon Feb 12 2024 19:27:08 GMT+0800 (China Standard Time)

Yes, in OWLv2 I'd use the objectness score. This is how the example in the colab does it.
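A minimal sketch of what objectness-based selection might look like, assuming you already have per-box objectness logits of shape `(N,)` and class embeddings of shape `(N, D)` from a forward pass on the query image. The function name and array names are hypothetical, and this is not the colab's actual code:

```python
import numpy as np


def select_query_embedding_v2(objectness_logits, class_embeds):
    """Hypothetical helper: pick the embedding of the highest-objectness box.

    objectness_logits: (N,) array, one score per predicted box.
    class_embeds:      (N, D) array, one class embedding per predicted box.
    """
    best = int(np.argmax(objectness_logits))
    return best, class_embeds[best]
```

The returned embedding would then serve as the query when classifying boxes in the target image. If the query image contains several salient objects, the top-scoring box may not be the intended one, so it can be worth inspecting the selected box before using it.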