Cuberick-Orion / CIRPLANT

Official implementation of the Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT) | ICCV 2021 - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models

I want to understand the loss function more precisely

SeolMuah opened this issue

In the paper, the loss function is
$$\mathcal{L} = \log\left[1 + \exp\left(k(\varphi_i, \phi^-_{i,j}) - k(\varphi_i, \phi^+_i)\right)\right].$$
For this loss to approach zero, $k(\varphi_i, \phi^-_{i,j})$ must be small and $k(\varphi_i, \phi^+_i)$ must be large.
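A minimal sketch of this soft triplet loss, written only in terms of the two kernel values. It assumes `k_pos` and `k_neg` are plain floats standing in for k(φi, ϕ+i) and k(φi, ϕ−i,j); the function name `soft_triplet_loss` is hypothetical and not taken from the CIRPLANT code. Because log(1 + e^x) is strictly positive, the loss never reaches exactly zero; it only shrinks toward zero as k_pos − k_neg grows.

```python
import math

def soft_triplet_loss(k_pos: float, k_neg: float) -> float:
    """Hypothetical helper: log[1 + exp(k_neg - k_pos)], i.e. softplus(k_neg - k_pos).

    k_pos: kernel value between the composed query and its target image.
    k_neg: kernel value between the composed query and a negative image.
    """
    x = k_neg - k_pos
    # Numerically stable softplus: avoids overflow in exp() for large x.
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

# The loss is small only when k_pos is much larger than k_neg.
for k_pos, k_neg in [(5.0, -5.0), (0.0, 0.0), (-5.0, 5.0)]:
    print(k_pos, k_neg, round(soft_triplet_loss(k_pos, k_neg), 4))
```

So whatever k is, minimizing this loss pushes k(φi, ϕ+i) above k(φi, ϕ−i,j), which is only the desired behaviour if larger k means "more similar".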

But I think $k(\varphi_i, \phi^+_i)$ should be small, because it is the L2 distance between the prediction and the target, and $k(\varphi_i, \phi^-_{i,j})$ should be large, because it is the distance between the prediction and a false (negative) image's feature. So the loss function should instead be:
$$\mathcal{L} = \log\left[1 + \exp\left(k(\varphi_i, \phi^+_i) - k(\varphi_i, \phi^-_{i,j})\right)\right]$$

I would like to know whether my reasoning is correct.

This is a mistake in the paper: k means a similarity, or the negative L2 distance, not a distance.
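As a worked check of this reading, take the L2 distance $d(a, b) = \lVert a - b \rVert_2$ and the similarity kernel $k(a, b) = -d(a, b)$ (one possible choice, assumed here for illustration). Substituting it into the paper's loss gives exactly the formula proposed in the question:

```latex
\begin{aligned}
\mathcal{L} &= \log\left[1 + \exp\left(k(\varphi_i, \phi^-_{i,j}) - k(\varphi_i, \phi^+_i)\right)\right] \\
&= \log\left[1 + \exp\left(-d(\varphi_i, \phi^-_{i,j}) + d(\varphi_i, \phi^+_i)\right)\right] \\
&= \log\left[1 + \exp\left(d(\varphi_i, \phi^+_i) - d(\varphi_i, \phi^-_{i,j})\right)\right]
\end{aligned}
```

Both versions therefore describe the same loss; the paper's form is only consistent when k is read as a similarity rather than a distance.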

Yes, the "k" indicates a similarity kernel.
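A quick numerical check of the same point, assuming k is the negative L2 distance between feature vectors. This kernel is an assumption for illustration, not necessarily what the CIRPLANT code uses, and all helper names here are hypothetical. With k as a similarity, the paper's loss matches the distance form from the question; plugging the raw distance in as k instead penalizes a correctly ranked triplet.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(a, b):
    """Assumed similarity kernel: k(a, b) = -||a - b||_2."""
    return -l2_distance(a, b)

def softplus(x):
    """log(1 + e^x), computed stably for positive and negative x."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))

def paper_loss(query, pos, neg, k):
    """Paper's form: log[1 + exp(k(query, neg) - k(query, pos))] with a pluggable kernel k."""
    return softplus(k(query, neg) - k(query, pos))

def distance_loss(query, pos, neg):
    """Question's form: log[1 + exp(d(query, pos) - d(query, neg))]."""
    return softplus(l2_distance(query, pos) - l2_distance(query, neg))

# Toy features: the query is close to its target and far from the negative.
query, pos, neg = [1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]

# With k as a similarity, both formulas give the same small loss.
print(paper_loss(query, pos, neg, similarity), distance_loss(query, pos, neg))
# With k as a raw distance, this well-ranked triplet would get a large loss.
print(paper_loss(query, pos, neg, l2_distance))
```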