Score difference between ITM and ITC?
Kapil-23 opened this issue
Hello,
I am evaluating the BLIP-2 model for one of my use cases, where I need to assess the similarity between text and images. For the first round of experiments, I used the image-text matching notebook.
Below are the results for Image-Text Contrastive (ITC) and Image-Text Matching (ITM):
Input Text:
In this image, a person is depicted wearing a white and black t-shirt and black socks. The individual is standing on a green surface, with a white wall in the background.
ITM Score: The image and text are matched with a probability of 99.878%.
ITC Score: The cosine similarity between the image feature and text feature is 0.4622.
Why is there a significant difference in the ITM and ITC scores?
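To make the comparison concrete, here is a minimal sketch of how I understand the two numbers to be produced. It assumes the ITM score is a softmax over two classifier logits (no-match, match) and the ITC score is a plain cosine similarity between an image embedding and a text embedding. The logits and vectors below are hypothetical values chosen for illustration, not outputs of the model, and the helper functions are my own, not part of the notebook.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms; lies in [-1, 1].
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical ITM head output: [no-match logit, match logit].
itm_logits = [-3.2, 3.5]
itm_match_prob = softmax(itm_logits)[1]

# Hypothetical image and text embeddings (real ones are much higher-dimensional).
image_feat = [0.9, 0.1, 0.4, 0.2]
text_feat = [0.3, 0.8, 0.5, -0.1]
itc_score = cosine_similarity(image_feat, text_feat)

print(f"ITM match probability: {itm_match_prob:.5f}")
print(f"ITC cosine similarity: {itc_score:.4f}")
```

In this sketch a moderate gap between the two logits already pushes the softmax probability close to 1, while the cosine similarity is bounded to [-1, 1] and is not a probability. If that is also how the notebook computes them, the two scores are on different scales and may not be directly comparable.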