microsoft / RegionCLIP

[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"



How to run zero-shot inference on my own label classes instead of COCO or LVIS

QHCV opened this issue

Very good work. I would like to know whether zero-shot inference on my own label classes is possible. If so, how should I do it? Could you describe the steps? Thank you.

Hello! I also want to try the zero-shot function on my own label classes. I extract the text embeddings and the region features separately and then compute their similarity, but I have run into some problems and have not succeeded so far. I am not sure whether this procedure can work at all. If you have already solved this, would you share your method? Many thanks in advance.
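The procedure described above (text embeddings for the classes, a feature per region, then a similarity score) is the usual CLIP-style zero-shot classifier. Below is a minimal NumPy sketch of that last step only. It assumes the region features and text embeddings have already been extracted into arrays with the same dimension D; the function name `zero_shot_classify` and the 0.01 temperature are illustrative assumptions, not part of RegionCLIP's API.

```python
import numpy as np

def zero_shot_classify(region_feats, text_embeds, class_names, temperature=0.01):
    """Label each region with the class whose text embedding is most similar.

    region_feats: (R, D) region features (assumed already extracted)
    text_embeds:  (C, D) class text embeddings, row i <-> class_names[i]
    temperature:  softmax temperature (assumed value, not taken from RegionCLIP)
    """
    if region_feats.shape[1] != text_embeds.shape[1]:
        raise ValueError("region features and text embeddings must share dimension D")
    if len(class_names) != text_embeds.shape[0]:
        raise ValueError("need exactly one class name per text-embedding row")
    # L2-normalize both sides so the dot product equals cosine similarity
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    logits = (r @ t.T) / temperature  # (R, C) scaled cosine similarities
    # Numerically stable softmax over the class axis
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    best = probs.argmax(axis=1)
    # Return (class name, probability) for the best class of each region
    return [(class_names[c], float(probs[i, c])) for i, c in enumerate(best)]
```

For example, with toy 2-D embeddings `[1, 0]`, `[0, 1]`, `[-1, 0]` for `cat`, `dog`, `bird` and regions `[0.9, 0.1]` and `[0.2, 3.0]`, the regions are labeled `cat` and `dog`. Unlike a detector head, this sketch has no background class, so every region receives some label.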


I used the Extract Concept Features example in the README to get the text embeddings of my label classes, and then used the Zero-shot Inference example to run inference on my own dataset. I did not use Extract Region Features anywhere in the process.

I am trying to do the same thing by using a customized concept_embeds.pth and changing NUM_CLASSES to 3 in the config file, but it fails with this error:

File "RegionCLIP/detectron2/modeling/roi_heads/fast_rcnn.py", line 456, in __init__
    self.cls_score.weight.copy_(pre_computed_w)
RuntimeError: The size of tensor a (1203) must match the size of tensor b (4) at non-singleton dimension 0

Has anyone else run into this problem? Any help is appreciated. Thank you!
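The traceback says the classifier weight has 1203 rows (the number of LVIS v1 categories) while the loaded embedding has 4, so `copy_` cannot fill one from the other. One reading consistent with that message, though not confirmed in this thread, is that the head was still built with the LVIS class count (for example, because a base config or command-line override set NUM_CLASSES back), and that the embedding file holds 4 rows even though 3 classes were intended. The row count of the embedding file, the number of class names, and NUM_CLASSES all have to agree. The hypothetical helper below checks that before the model is built; it only mirrors the shape rule `copy_` enforces here and is not RegionCLIP's own loading code.

```python
import numpy as np

def check_concept_embeddings(embeds, num_classes, class_names=None):
    """Hypothetical pre-flight check: raise a descriptive error if the concept
    embeddings cannot fill a classifier with `num_classes` output rows."""
    if embeds.ndim != 2:
        raise ValueError(f"expected a 2-D (num_concepts, dim) array, got shape {embeds.shape}")
    n_rows = embeds.shape[0]
    # The classifier weight is (num_classes, dim); the rows must match exactly
    if n_rows != num_classes:
        raise ValueError(
            f"embedding file has {n_rows} rows but NUM_CLASSES is {num_classes}; "
            "both must equal the number of custom classes"
        )
    # Optionally make sure every row has a human-readable label
    if class_names is not None and len(class_names) != n_rows:
        raise ValueError(f"{len(class_names)} class names for {n_rows} embedding rows")
    return n_rows
```

With a real file, the array could be obtained with `torch.load(path).cpu().numpy()`, assuming the file stores a single tensor. If the check reports 4 rows for 3 intended classes, the concept list used for extraction (for example, an extra or blank entry) is a plausible place to look.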