I found a bug in SVMR evaluation
hgzjy25 opened this issue
```python
def post_processing_svmr_nms(
        svmr_res, nms_thd=0.6, max_before_nms=1000, max_after_nms=100):
    """
    svmr_res: list(dict), each dict is
        {"desc": str,
         "desc_id": int,
         "predictions": list(sublist)  # each sublist is
            [video_idx (int), st (float), ed (float), score (float)],
            video_idx is the same.
        }
    """
    processed_svmr_res = []
    for e in svmr_res:
        # the predictions are sorted inside the nms func.
        e["predictions"] = temporal_non_maximum_suppression(
            e["predictions"][:max_before_nms],
            nms_threshold=nms_thd)[:max_after_nms]
        processed_svmr_res.append(e)
    return processed_svmr_res


def temporal_non_maximum_suppression(predictions, nms_threshold,
                                     max_after_nms=100):
    """
    Args:
        predictions: list(sublist), each sublist is
            [st (float), ed (float), score (float)],
            note larger scores are better and are preserved.
            For metrics that are better when smaller,
            please convert to its negative,
            e.g., convert distance to negative distance.
        nms_threshold: float in [0, 1]
        max_after_nms:
    Returns:
        predictions_after_nms: list(sublist),
            each sublist is [st (float), ed (float), score (float)]
    References:
        https://github.com/wzmsltw/BSN-boundary-sensitive-network/blob/7b101fc5978802aa3c95ba5779eb54151c6173c6/Post_processing.py#L42
    """
    if len(predictions) == 1:  # only has one prediction, no need for nms
        return predictions
    predictions = sorted(predictions, key=lambda x: x[2],
                         reverse=True)  # descending order
    tstart = [e[0] for e in predictions]
    tend = [e[1] for e in predictions]
    tscore = [e[2] for e in predictions]
    rstart = []
    rend = []
    rscore = []
    while len(tstart) > 1 and len(rscore) < max_after_nms:  # max 100 after nms
        idx = 1
        while idx < len(tstart):  # compare with every prediction in the list.
            if compute_temporal_iou(
                    [tstart[0], tend[0]],
                    [tstart[idx], tend[idx]]) > nms_threshold:
                # rm highly overlapped lower score entries.
                tstart.pop(idx)
                tend.pop(idx)
                tscore.pop(idx)
            else:
                # move to next
                idx += 1
        rstart.append(tstart.pop(0))
        rend.append(tend.pop(0))
        rscore.append(tscore.pop(0))
    if (len(rscore) < max_after_nms
            and len(tstart) >= 1):  # add the last, possibly empty.
        rstart.append(tstart.pop(0))
        rend.append(tend.pop(0))
        rscore.append(tscore.pop(0))
    predictions_after_nms = [
        [st, ed, s] for s, st, ed in zip(rscore, rstart, rend)]
    return predictions_after_nms
```
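For readers following along: `compute_temporal_iou` is not shown above. A minimal sketch consistent with how the NMS loop calls it (two `[st, ed]` windows in, a float in `[0, 1]` out) could look like this — a hypothetical re-implementation for illustration only; the actual helper lives in the TVRetrieval/HERO codebase and may define the union slightly differently:

```python
def compute_temporal_iou(pred, gt):
    """Temporal IoU of two [st, ed] windows (illustrative sketch).

    Uses the standard union |pred| + |gt| - |intersection|; returns 0.0
    for disjoint or zero-length windows.
    """
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0
```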
`e["predictions"]` in `post_processing_svmr_nms` has four elements per sublist, `[video_idx (int), st (float), ed (float), score (float)]`, while `temporal_non_maximum_suppression` requires its `predictions` argument to have three elements per sublist, `[st (float), ed (float), score (float)]`. So `tstart`, `tend`, `tscore` will actually be `video_idx`, `st`, `ed`, respectively.
I fixed it myself and tested the code, but it seems this problem has no effect on the VCMR results? So the Moment Retrieval results presented in Table 5 are wrong; R@1 on the TVR dataset should be up to 15%.
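For reference, the mismatch can be patched by stripping the shared `video_idx` before NMS and re-attaching it afterwards. Below is a minimal sketch of that idea; the function names `post_processing_svmr_nms_fixed`, `temporal_nms`, and `_temporal_iou` are my own, and `temporal_nms` is a compact greedy NMS with the same intent as the quoted function, not a line-for-line copy:

```python
def _temporal_iou(a, b):
    # IoU of two [st, ed] windows (standard definition; illustration only).
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(preds, nms_threshold):
    # Greedy NMS over [st, ed, score] sublists: keep the highest-scoring
    # window, drop any later window overlapping a kept one above threshold.
    preds = sorted(preds, key=lambda x: x[2], reverse=True)
    kept = []
    for st, ed, score in preds:
        if all(_temporal_iou([st, ed], k[:2]) <= nms_threshold for k in kept):
            kept.append([st, ed, score])
    return kept


def post_processing_svmr_nms_fixed(svmr_res, nms_thd=0.6,
                                   max_before_nms=1000, max_after_nms=100):
    # Strip the leading video_idx (identical for every sublist of a query)
    # so NMS sees [st, ed, score] sublists, then re-attach it afterwards.
    for e in svmr_res:
        preds = e["predictions"][:max_before_nms]
        video_idx = preds[0][0]
        kept = temporal_nms([p[1:] for p in preds], nms_thd)[:max_after_nms]
        e["predictions"] = [[video_idx] + p for p in kept]
    return svmr_res
```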
Hi @hgzjy25,
Thanks! It seems that you are right. We directly adopted this part of the code from the official release of TVRetrieval without further validation. Sorry about that. We have double-checked with the original authors of TVR, and they have fixed it here: jayleicn/TVRetrieval@aec36bc, which will be reflected in HERO shortly.
VCMR should not be affected, as the results are preprocessed first by Line 132 in f938515.
Thank you for your quick reply and action. But there still remains a mistake: `_video_id = e["predictions"][0]` should be `_video_id = e["predictions"][0][0]`; otherwise `_video_id` will be the first sublist in `e["predictions"]` instead of the real video idx.
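A tiny illustration of why the extra index matters (made-up values, following the `[video_idx, st, ed, score]` sublist layout from above):

```python
# e["predictions"] holds [video_idx, st, ed, score] sublists.
e = {"predictions": [[7, 0.0, 5.0, 0.9], [7, 6.0, 8.0, 0.5]]}

wrong_video_id = e["predictions"][0]     # a whole sublist, not an id
right_video_id = e["predictions"][0][0]  # the actual video index
```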