microsoft / RegionCLIP

[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"

Repository from Github https://github.com/microsoft/RegionCLIP

region features can't match text features while calculating similarity

epistimi22 opened this issue · comments

commented

Hi, thank you for this valuable work.
I'm trying some demos based on your pretrained checkpoints, and I followed the settings in modeling.meta_arch.PretrainFastRCNN, which I believe is the base model for pretraining. According to the code in self.get_region_features and self.region_concept_matching, we obtain region features and text features, respectively. However, the text features are fixed at 1024-d while the region features are 2048-d, due to the design of ModifiedResNet.
```python
def _shared_roi_transform(self, features, boxes, backbone_res5):
    x = self.pooler(features, boxes)
    return backbone_res5(x)
```
In the function above, features is 1024-d, so x is also 1024-d; backbone_res5 is backbone.layer4, whose output is 2048-d.
I couldn't find any transformation applied before the similarity between concepts and regions is computed in region_concept_matching.
So, could you please help me with this issue? Thanks again.
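To make the mismatch concrete, here is a minimal sketch of the shape bookkeeping. The dimensions (1024-d pooled features, 2048-d after res5, 1024-d text embeddings) follow CLIP's ModifiedResNet; the 2048→1024 projection at the end is a hypothetical stand-in for whatever learned map (e.g. CLIP's attention pooling) would be needed to align the two spaces:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes assumed from CLIP's ModifiedResNet: pooled region features go
# through backbone.layer4 (res5) and come out 2048-d, while the text
# encoder embeds concepts at 1024-d.
num_regions, num_concepts = 4, 10
region_feats = rng.standard_normal((num_regions, 2048))  # after res5
text_feats = rng.standard_normal((num_concepts, 1024))   # concept embeddings

# Direct similarity fails: inner dimensions disagree (2048 vs 1024).
try:
    _ = region_feats @ text_feats.T
    raised = True and False or True  # never reached if matmul succeeds
except ValueError:
    raised = True
assert raised

# A hypothetical 2048->1024 projection (a placeholder for the missing
# transformation, e.g. attention pooling) would align the two spaces.
proj = rng.standard_normal((2048, 1024)) / np.sqrt(2048)
projected = region_feats @ proj          # (num_regions, 1024)
sim = projected @ text_feats.T           # (num_regions, num_concepts)
assert sim.shape == (num_regions, num_concepts)
```

This is only meant to illustrate where the shapes disagree, not to claim how RegionCLIP actually resolves it.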

commented

Thank you for the generous response; that would be a great help.