Ziyang412 / UCoFiA

PyTorch code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)

Home Page: https://arxiv.org/abs/2309.10091

logit_scale

Arsiuuu opened this issue.

Thanks for the great work! I wonder why, when training or evaluating the model, logit_scale is multiplied only onto the video-sentence score and the sentence-frame score, while pixel_word_score is left unscaled. Does this mean the model cares more about the first two scores and ignores the latter?

video_sentence_logits = logit_scale * torch.matmul(torch.matmul(sentence_output, self.global_mat_weight), torch.matmul(video_output, self.global_mat_weight_1).t())
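For readers unfamiliar with this bilinear form, here is a minimal, self-contained sketch of the video-sentence score above. The batch size, embedding width, identity initialization of the two projection matrices, and the fixed logit_scale of 1.0 are all illustrative assumptions, not values taken from the UCoFiA code.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes: a batch of 4 caption/video pairs with 512-d embeddings.
batch, dim = 4, 512
sentence_output = torch.randn(batch, dim)  # one embedding per caption
video_output = torch.randn(batch, dim)     # one embedding per video

# Two learnable projections mirroring global_mat_weight / global_mat_weight_1
# (identity here purely for illustration).
global_mat_weight = torch.nn.Parameter(torch.eye(dim))
global_mat_weight_1 = torch.nn.Parameter(torch.eye(dim))
logit_scale = torch.tensor(1.0)  # assumed constant scale

# (B x D)(D x D) @ ((B x D)(D x D))^T -> (B x B) caption-by-video score matrix
video_sentence_logits = logit_scale * torch.matmul(
    torch.matmul(sentence_output, global_mat_weight),
    torch.matmul(video_output, global_mat_weight_1).t(),
)
print(video_sentence_logits.shape)  # torch.Size([4, 4])
```

With identity projections and a scale of 1.0 this reduces to a plain dot-product similarity matrix; the learned projections are what make it a bilinear score.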

Hi, thanks for the question! We did not configure logit_scale specifically for any of the scores; it is initialized by the following code. Thanks!

self.logit_scale = nn.Parameter(torch.ones([]))
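As a quick illustration of what this initialization produces: `nn.Parameter(torch.ones([]))` is a 0-d (scalar) tensor with value 1.0 that is registered on the module and has `requires_grad=True`, so it is trainable by default. It stays at 1.0 only if it receives no gradient, is frozen, or is left out of the optimizer. The module name `ScaleHolder` below is hypothetical.

```python
import torch
import torch.nn as nn


class ScaleHolder(nn.Module):
    """Hypothetical minimal module holding only the scale parameter."""

    def __init__(self):
        super().__init__()
        # Same initialization as the snippet above: a learnable 0-d tensor equal to 1.0.
        self.logit_scale = nn.Parameter(torch.ones([]))


m = ScaleHolder()
print(m.logit_scale.dim(), m.logit_scale.item())   # 0 1.0
print(m.logit_scale.requires_grad)                 # True
print([name for name, _ in m.named_parameters()])  # ['logit_scale']
```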

Sorry, I did not express my question clearly. What I want to ask is why logit_scale is not multiplied onto pixel_word_score. Or do you mean that logit_scale is never updated?

pixel_word_score = (sent2frame_logits + video2word_logits) / 2
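To see why the answer hinges on the value of the scale, here is a toy sketch. The random score matrices, the name `sentence_frame_logits`, and the plain-sum fusion in `combined` are hypothetical stand-ins chosen for illustration; the thread does not show how UCoFiA actually fuses the three scores. When the scale is exactly 1.0, scaling pixel_word_score or not gives bit-identical results; with any other value, leaving one term unscaled changes its weight relative to the others.

```python
import torch

torch.manual_seed(0)
B = 3
# Hypothetical (caption x video) score matrices for the different granularities.
video_sentence_logits = torch.randn(B, B)
sentence_frame_logits = torch.randn(B, B)
sent2frame_logits = torch.randn(B, B)
video2word_logits = torch.randn(B, B)

# Pixel-word score as in the thread: the average of the two directional scores.
pixel_word_score = (sent2frame_logits + video2word_logits) / 2


def combined(scale: float, scale_pixel_word: bool) -> torch.Tensor:
    """Hypothetical fusion (plain sum), used only to illustrate the effect of the scale."""
    pw = scale * pixel_word_score if scale_pixel_word else pixel_word_score
    return scale * video_sentence_logits + scale * sentence_frame_logits + pw


# Multiplying by 1.0 is exact, so scaling pixel_word_score makes no difference here.
print(torch.equal(combined(1.0, True), combined(1.0, False)))      # True
# With any other scale, the unscaled term's relative weight changes.
print(torch.allclose(combined(20.0, True), combined(20.0, False)))  # False
```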

I think logit_scale was adapted from another codebase, which is what causes the confusion. According to the code below, I think it is set back to its initial value each round. BTW, love your profile picture!

logit_scale = self.clip.logit_scale.exp()
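Because this line applies `.exp()`, the stored parameter acts as a log-scale: the multiplier used in the forward pass is `exp(p)` for whatever value `p` the parameter currently holds, so it equals 1 exactly when `p` is 0. The line itself only reads the current value; whether that value stays at its initialization depends on whether the parameter receives gradients and is stepped by the optimizer. The toy module below (hypothetical, not the UCoFiA model; initialized to 0 purely for illustration) sketches one way to check this empirically by comparing the parameter before and after a single optimizer step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class ToyRetrievalHead(nn.Module):
    """Hypothetical stand-in; not the UCoFiA model."""

    def __init__(self, dim: int = 8):
        super().__init__()
        # Log-space scale; initialized to 0 here only so that exp() starts at exactly 1.
        self.logit_scale = nn.Parameter(torch.zeros([]))
        self.proj = nn.Linear(dim, dim)

    def forward(self, text: torch.Tensor, video: torch.Tensor) -> torch.Tensor:
        # Read the current parameter value on every call and exponentiate it.
        logit_scale = self.logit_scale.exp()
        return logit_scale * (self.proj(text) @ video.t())


model = ToyRetrievalHead()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
before = model.logit_scale.detach().clone()

text, video = torch.randn(4, 8), torch.randn(4, 8)
logits = model(text, video)
# Toy contrastive loss: the i-th caption should match the i-th video.
loss = F.cross_entropy(logits, torch.arange(4))
opt.zero_grad()
loss.backward()
opt.step()

print(before.exp().item())                        # 1.0
print(model.logit_scale.item() != before.item())  # True whenever the parameter got a nonzero gradient
```

If the same before/after comparison shows no change in the real training loop (for example because the parameter is frozen or not passed to the optimizer), the scale is constant, and removing it only alters the scores if that constant differs from 1.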

Haha, COYG! So can I understand it this way: logit_scale has no effect in training or testing because it is always 1, so I can remove it?

Yes, I think so. COYG