salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Score difference between ITM and ITC?

Kapil-23 opened this issue

Hello,

I am evaluating the BLIP-2 model for a use case where I need to measure the similarity between text and images. For the first round of experiments, I used the image-text matching notebook.

Below are the results for the Image-Text Contrastive (ITC) and Image-Text Matching (ITM) scores:

Inputs:
Input Image:
frames_1m6n9j0_

Input Text:
In this image, a person is depicted wearing a white and black t-shirt and black socks. The individual is standing on a green surface, with a white wall in the background.

ITM Score: The image and text are matched with a probability of 99.878%.
ITC Score: The cosine similarity between the image feature and text feature is 0.4622.

Why is there such a large difference between the ITM and ITC scores?
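The two numbers come from different heads and live on different scales. As described in the BLIP-2 paper, the ITM score is the output of a binary classifier (per-query logits are averaged, then a softmax over "no match" / "match" is applied), so it is a probability that saturates near 1 for a plausible caption. The ITC score is a raw cosine similarity, taken as the maximum over the image's query embeddings, and it only becomes a probability after dividing by a temperature and applying a softmax across several candidate texts. The sketch below is plain Python, not the LAVIS API: the logits, toy embeddings, candidate similarities, and the 0.07 temperature are hypothetical values chosen for illustration.

```python
import math

# Assumption: BLIP-2's learnable ITC temperature starts around 0.07;
# the trained value in a given checkpoint may differ.
TEMP = 0.07


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; the result always lies in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def itm_match_probability(itm_logits):
    """ITM: softmax over two logits, assumed ordered [no-match, match]."""
    return softmax(itm_logits)[1]


def itc_score(query_embeds, text_embed):
    """ITC: highest cosine similarity between any image query and the text."""
    return max(cosine_similarity(q, text_embed) for q in query_embeds)


# Hypothetical ITM logits: a gap of 6.7 already yields a probability near 0.999.
itm_prob = itm_match_probability([-3.2, 3.5])

# Toy 3-d embeddings (hypothetical): only the first query aligns with the text.
itc = itc_score([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [1.0, 1.0, 0.0])

# Hypothetical cosine similarities of one image against three candidate captions.
# Dividing by the temperature before the softmax turns a raw 0.46 into a
# dominant share when the competing captions score lower.
candidate_sims = [0.46, 0.20, 0.15]
ranking = softmax([s / TEMP for s in candidate_sims])

print(f"ITM match probability: {itm_prob:.3%}")
print(f"ITC cosine similarity: {itc:.4f}")
print(f"ITC softmax share of first caption: {ranking[0]:.3f}")
```

Under these assumptions, a cosine similarity of 0.4622 is not "46% matched"; the ITC value is most informative when compared against the scores of other captions or images, while the ITM probability is an absolute yes/no judgement for a single pair.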