Max margin loss function for ER-MLP seems wrong
kiranramnath007 opened this issue · comments
Hi there! First of all thank you for the detailed programming exercises you have provided here.
I am trying to implement ER-MLP in Ex.6 with the max-margin loss function. The variables y_pos and y_neg used there seem a little confusing.
```python
M = y_pos.size(0)
y_pos = y_pos.view(-1).repeat(C)  # repeat to match y_neg
y_neg = y_neg.view(-1)
# target = [-1, -1, ..., -1], i.e. y_neg should be higher than y_pos
target = -np.ones(M*C, dtype=np.float32)
loss = F.margin_ranking_loss(
    y_pos, y_neg, target, margin=margin, size_average=average
)
```
- Shouldn't `y_neg` contain the scores the network produces for the sampled negatives? The docstring says it is binary-valued and contains the true labels, which doesn't seem right to me.
- After the repeat, `y_pos` becomes an M*C-sized tensor, while `y_neg` is still of size M. This mismatch may cause issues when passing them to the loss function.
- The variable `target` is initialized to -1, i.e. the loss would learn to rank `y_neg` higher, but don't we want the network to rank `y_pos` higher than `y_neg`?
Thanks in advance!
Hi,
For the margin loss we didn't add a proper example, as it was implemented for energy-based scoring functions like TransE and related methods, not for the ER-MLP score. It can easily be adapted for ER-MLP too, though. Hope this answers your questions:
- You're right, `y_pos` and `y_neg` are meant to be the scores of the positive and negative samples. The margin loss can also be used by scoring only positive samples and setting a hard label of -1 for the unknown negatives.
- `y_neg` is expected to be of shape MxC. See the comment: `y_pos = y_pos.view(-1).repeat(C)  # repeat to match y_neg`
- Yes, the negatives will get the higher score here. With an energy-based scoring function, a lower score means a more plausible triple, so negatives end up with higher energy than positives. It just depends on how the scoring function is defined, so for ER-MLP you can simply flip the sign of `target`.
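A minimal sketch of that adaptation for ER-MLP-style scores (higher = more plausible), with made-up numbers and an arbitrary margin; note that recent PyTorch uses `reduction=` instead of the deprecated `size_average` argument:

```python
import torch
import torch.nn.functional as F

# Toy ER-MLP-style scores: higher = more plausible (made-up numbers).
M, C = 2, 2                                  # M positives, C negatives each
y_pos = torch.tensor([2.0, 0.5])             # shape (M,)
y_neg = torch.tensor([[1.0, 3.0],
                      [0.0, 1.0]])           # shape (M, C)

# Pair each positive with its own C negatives.
y_pos_rep = y_pos.view(-1, 1).repeat(1, C).view(-1)  # shape (M*C,)
y_neg_flat = y_neg.view(-1)                          # shape (M*C,)

# target = +1: the first argument (the positive) should score higher.
target = torch.ones(M * C)
loss = F.margin_ranking_loss(y_pos_rep, y_neg_flat, target, margin=1.0)
print(loss.item())  # → 1.0, the mean hinge loss over the four pairs
```

Each element contributes `max(0, -target * (y_pos - y_neg) + margin)`, so with `target = +1` a positive is only penalized when it fails to beat its negative by the margin.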
Thanks for your reply.
Regarding 2, yes, I figured that would be the case. But the input specification should be corrected then:
`y_neg: np.array of size Mx1 (binary) Contains the true labels.`
Regarding 3, just to confirm: for ER-MLP, would you then have targets = [1, 1, 1, ...] (of size M*C)? Because we want seen positive samples to have a higher probability of being true than non-existent ones.
Yeah, I saw that the other comment was misleading. It's corrected now.
It depends on the scoring function. If the scores are probabilities, then the targets should indeed be [1, 1, 1, ...]. In other cases, like energy-based functions, the targets can also be negative.
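A quick illustration of that sign convention, with made-up scalar scores and an arbitrary margin:

```python
import torch
import torch.nn.functional as F

pos = torch.tensor([0.9])  # made-up probability-style score of a positive
neg = torch.tensor([0.2])  # made-up score of a negative

# Probability-like scores: target = +1 asks for pos > neg (by the margin).
loss_prob = F.margin_ranking_loss(pos, neg, torch.tensor([1.0]), margin=0.5)
print(loss_prob.item())    # → 0.0, since 0.9 - 0.2 already exceeds 0.5

# Energy/distance-like scores: target = -1 asks for pos < neg (by the margin).
loss_energy = F.margin_ranking_loss(pos, neg, torch.tensor([-1.0]), margin=0.5)
print(loss_energy.item())  # ≈ 1.2, this pair would be penalized
```

Same scores, same loss function; only the target sign encodes whether "higher" or "lower" means "more plausible".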
Hope this answers your query.
Yes that's helpful. Thank you!