Ziyang412 / UCoFiA

Thanks for the great work! When training or evaluating the model, why is logit_scale multiplied only into the video-sentence score and the sentence-frame score, but not into pixel_word_score? Does this mean the model cares more about the first two and ignores the latter?

UCoFiA/eval_v2t/modules/modeling_ucofia.py

Line 409 in 517f838

    
video_sentence_logits = logit_scale * torch.matmul(torch.matmul(sentence_output, self.global_mat_weight), torch.matmul(video_output, self.global_mat_weight_1).t())

Hi, thanks for the question! We do not configure logit_scale specifically for any individual score (it is initialized by the following code). Thanks!

UCoFiA/train/modules/module_clip.py

Line 380 in 517f838

self.logit_scale = nn.Parameter(torch.ones([]))

Sorry, my question was not expressed clearly. What I want to ask is why logit_scale is not multiplied into pixel_word_score. Or do you mean that logit_scale is never updated?

UCoFiA/train/modules/modeling_ucofia.py

Line 368 in 517f838

pixel_word_score = (sent2frame_logits + video2word_logits) / 2
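To make the question concrete, here is a minimal sketch of what scaling only some of the scores does to their relative weight. It assumes, purely for illustration, that the three scores are added together; the thread does not show how the repository combines them, and `combine` and the numbers below are hypothetical.

```python
import math


def combine(video_sentence, sentence_frame, pixel_word, scale):
    """Hypothetical combination (not taken from the repository):
    scale the first two scores, leave pixel_word unscaled, then sum."""
    return scale * video_sentence + scale * sentence_frame + pixel_word


# Made-up raw similarity scores for one video-text pair.
raw = (0.30, 0.20, 0.25)

total_scale_1 = combine(*raw, scale=1.0)
total_scale_e = combine(*raw, scale=math.e)

# Fraction of the combined score contributed by the unscaled pixel_word term.
pixel_word_share_scale_1 = raw[2] / total_scale_1
pixel_word_share_scale_e = raw[2] / total_scale_e

print(pixel_word_share_scale_1 > pixel_word_share_scale_e)  # True
```

Under that assumption, any scale larger than 1 on the first two terms shrinks the share of the unscaled term, and a scale of exactly 1 leaves all three on equal footing.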

I think logit_scale was adapted from another codebase, which is causing the confusion. According to the code below, I think it is reset to its initial value each round. Btw, love your profile picture.

UCoFiA/train/modules/modeling_ucofia.py

Line 339 in 517f838

logit_scale = self.clip.logit_scale.exp()
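One detail worth checking before removing the term: the two lines quoted in this thread initialize the parameter to 1 and then call `.exp()` on it, so if the parameter really stays at its initial value the effective multiplier is e (about 2.718), not 1. The sketch below reproduces that arithmetic in plain Python rather than torch; the `softmax` helper and the example logits are illustrative assumptions, and whether the parameter is actually updated during training depends on optimizer and `requires_grad` settings that the thread does not show.

```python
import math

# Initial value of the parameter in module_clip.py: torch.ones([]) is the scalar 1.0.
initial_logit_scale = 1.0

# modeling_ucofia.py applies .exp() before use, so the multiplier is e**1.
effective_scale = math.exp(initial_logit_scale)
print(round(effective_scale, 4))  # 2.7183


def softmax(logits, scale):
    """Softmax over scaled logits (illustrative helper, not from the repository)."""
    scaled = [scale * x for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [x / total for x in exps]


example_logits = [1.0, 0.0]  # made-up similarity logits
probs_scale_1 = softmax(example_logits, scale=1.0)
probs_scale_e = softmax(example_logits, scale=effective_scale)

# A larger scale sharpens the distribution, so the multiplier is not a no-op.
print(probs_scale_e[0] > probs_scale_1[0])  # True
```

If the scaled logits feed a softmax-based loss, dropping the multiplier would therefore change the sharpness of the distribution, so comparing results with and without it would be a safer check than assuming it is inert.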

Haha, COYG! So can I take it that logit_scale has no effect in training and testing because it is always 1, and that I can remove it?

Yes, I think so. COYG