Focal loss in OWL-ViT
sargun-nagpal opened this issue
Hi, I was going through the loss script for OWL-ViT and wanted to confirm the implementation of the focal loss for training/fine-tuning the model.
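From the focal loss paper, with $p$ the predicted probability of the positive class:
When y = 1:
$$\mathrm{FL}(p) = -\alpha \, (1 - p)^{\gamma} \log(p)$$
When y = 0:
$$\mathrm{FL}(p) = -(1 - \alpha) \, p^{\gamma} \log(1 - p)$$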
However, in the implementation, I see that the cost is computed as:
This is not the same as the formula above. Can someone please explain why we are calculating the loss this way, or if I am misunderstanding something?
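For reference, here is a minimal NumPy sketch of the two quantities being compared. It is not a copy of the OWL-ViT code: the function names `focal_loss` and `focal_cost` are hypothetical, the defaults `alpha=0.25` and `gamma=2.0` are assumed for illustration, and the cost follows only the form discussed in this issue (two terms scaled in place, then subtracted).

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping logits to probabilities."""
    return 1.0 / (1.0 + np.exp(-x))


def focal_loss(logit, y, alpha=0.25, gamma=2.0):
    """Per-element focal loss: the label y selects one of the two terms."""
    p = sigmoid(logit)
    pos_term = -alpha * (1.0 - p) ** gamma * np.log(p)        # used when y = 1
    neg_term = -(1.0 - alpha) * p ** gamma * np.log(1.0 - p)  # used when y = 0
    return np.where(y == 1, pos_term, neg_term)


def focal_cost(logit, alpha=0.25, gamma=2.0):
    """Cost of the form pos_cost_class - neg_cost_class (no label involved)."""
    p = sigmoid(logit)
    pos_cost_class = -np.log(p)
    neg_cost_class = -np.log(1.0 - p)
    # Scale each cross-entropy term by its focal modulating factor.
    pos_cost_class *= alpha * (1.0 - p) ** gamma
    neg_cost_class *= (1.0 - alpha) * p ** gamma
    return pos_cost_class - neg_cost_class
```

Under these assumptions, `focal_cost(logit)` equals `focal_loss(logit, 1) - focal_loss(logit, 0)`: the cost is the difference between treating a prediction as a positive and treating it as a negative, rather than a single label-selected term.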
@sargun-nagpal Did you notice the `*=` (multiply and assign) next to `neg_cost_loss` as well as `pos_cost_loss`?
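@hvgazula Yes, I did. That just calculates the following, where $p$ is the predicted probability:
`pos_cost_class`:
$$\alpha \, (1 - p)^{\gamma} \cdot \bigl(-\log(p)\bigr)$$
`neg_cost_class`:
$$(1 - \alpha) \, p^{\gamma} \cdot \bigl(-\log(1 - p)\bigr)$$
Therefore, `pos_cost_class - neg_cost_class`:
$$-\alpha \, (1 - p)^{\gamma} \log(p) + (1 - \alpha) \, p^{\gamma} \log(1 - p)$$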
This is in contrast to the focal loss formula (mentioned above), where we use the ground truth label `y` to choose one of the `pos_cost_loss` or `neg_cost_loss` terms to calculate the loss:
$$\mathrm{FL} = y \cdot \text{pos\_cost\_loss} + (1 - y) \cdot \text{neg\_cost\_loss}$$
Hello! Sorry for being unclear earlier. In fact, you derived the answer yourself 😉. Keep in mind that in the equation from the paper, `t` is the ground truth, and (in binary classification) it has two possible values: the `pos` class and the `neg` class. Now write down the cost for each sample (based on whether `t = pos` or `t = neg`) and you have the equation in your comment.
In other words, imagine you have 2 samples (1 positive [`t = pos`] and 1 negative [`t = neg`]). Write down the cost for the positive sample and for the negative sample; those are the two terms in your derivation.
More succinctly, `FL(all samples) = FL(pos samples) + FL(neg samples)`. `pos samples` already means `y = 1`, so writing `y * FL(pos samples)` is redundant.
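The two-sample argument above can be checked numerically. This is only a sketch: the probabilities, `alpha`, `gamma`, and the helper name `focal_terms` are made up for illustration and are not taken from the OWL-ViT code.

```python
import numpy as np


def focal_terms(p, alpha=0.25, gamma=2.0):
    """Return the positive-label and negative-label focal terms for each p."""
    pos = -alpha * (1.0 - p) ** gamma * np.log(p)
    neg = -(1.0 - alpha) * p ** gamma * np.log(1.0 - p)
    return pos, neg


# One positive sample (y = 1) and one negative sample (y = 0).
p = np.array([0.9, 0.2])
y = np.array([1, 0])
pos, neg = focal_terms(p)

# Label-weighted form: y picks the term for each sample.
total_with_labels = np.sum(y * pos + (1 - y) * neg)

# Grouped form: sum the positive term over positives, the negative term over negatives.
total_by_group = pos[y == 1].sum() + neg[y == 0].sum()

print(total_with_labels, total_by_group)
```

Both totals agree, because multiplying by `y` or `1 - y` only removes the term that does not apply to a given sample.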
As for why `FL(all samples) = FL(pos samples) - FL(neg samples)`, see Section 2.1 of the paper pointed to in scenic/scenic/projects/owl_vit/losses.py (line 21 in 1963df7).