linjieli222 / HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"

Home Page: https://arxiv.org/abs/2005.00200

I found a bug in the SVMR evaluation

hgzjy25 opened this issue

def post_processing_svmr_nms(
        svmr_res, nms_thd=0.6, max_before_nms=1000, max_after_nms=100):
    """
    svmr_res: list(dict), each dict is
        {"desc": str,
         "desc_id": int,
         "predictions": list(sublist)  # each sublist is
            [video_idx (int), st (float), ed(float), score (float)],
            video_idx is the same.
         }
    """
    processed_svmr_res = []
    for e in svmr_res:
        # the predictions are sorted inside the nms func.
        e["predictions"] = temporal_non_maximum_suppression(
            e["predictions"][:max_before_nms],
            nms_threshold=nms_thd)[:max_after_nms]
        processed_svmr_res.append(e)
    return processed_svmr_res


def temporal_non_maximum_suppression(predictions, nms_threshold,
                                     max_after_nms=100):
    """
    Args:
        predictions:
            list(sublist), each sublist is
            [st (float), ed(float), score (float)],
            note larger scores are better and are preserved.
            For metrics that are better when smaller,
            please convert to its negative,
            e.g., convert distance to negative distance.
        nms_threshold: float in [0, 1]
        max_after_nms:
    Returns:
        predictions_after_nms:
        list(sublist),
        each sublist is [st (float), ed(float), score (float)]
    References:
        https://github.com/wzmsltw/BSN-boundary-sensitive-network/blob/7b101fc5978802aa3c95ba5779eb54151c6173c6/Post_processing.py#L42
    """
    if len(predictions) == 1:  # only has one prediction, no need for nms
        return predictions

    predictions = sorted(predictions, key=lambda x: x[2],
                         reverse=True)  # descending order

    tstart = [e[0] for e in predictions]
    tend = [e[1] for e in predictions]
    tscore = [e[2] for e in predictions]
    rstart = []
    rend = []
    rscore = []
    while len(tstart) > 1 and len(rscore) < max_after_nms:  # max 100 after nms
        idx = 1
        while idx < len(tstart):  # compare with every prediction in the list.
            if compute_temporal_iou(
                    [tstart[0], tend[0]],
                    [tstart[idx], tend[idx]]) > nms_threshold:
                # rm highly overlapped lower score entries.
                tstart.pop(idx)
                tend.pop(idx)
                tscore.pop(idx)
            else:
                # move to next
                idx += 1
        rstart.append(tstart.pop(0))
        rend.append(tend.pop(0))
        rscore.append(tscore.pop(0))

    if (len(rscore) < max_after_nms
            and len(tstart) >= 1):  # add the last, possibly empty.
        rstart.append(tstart.pop(0))
        rend.append(tend.pop(0))
        rscore.append(tscore.pop(0))

    predictions_after_nms = [
        [st, ed, s] for s, st, ed in zip(rscore, rstart, rend)]
    return predictions_after_nms
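
For reference, compute_temporal_iou is called but not shown in the snippet above. Below is a minimal sketch of a standard temporal IoU over [st, ed] pairs; this is an assumption about the helper, not necessarily the repo's exact implementation.

def compute_temporal_iou(pred, gt):
    """Temporal IoU of two [st, ed] segments (illustrative sketch)."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    if union <= 0:
        return 0.0
    return intersection / union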

Each sublist of e['predictions'] in post_processing_svmr_nms has four elements, [video_idx (int), st (float), ed (float), score (float)], while temporal_non_maximum_suppression expects its predictions argument to have three elements per sublist, [st (float), ed (float), score (float)]. So tstart, tend, tscore end up holding video_idx, st, ed, respectively.
I fixed it myself and tested the code, but it seems that this problem has no effect on the VCMR results?

So the Moment Retrieval results presented in Table 5 are wrong; R@1 on the TVR dataset should be up to 15%.
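
For illustration, here is a minimal sketch of one way to repair the mismatch (hypothetical code and name, not the patch that was eventually merged): drop the leading video_idx before calling the NMS function and re-attach it afterwards.

def post_processing_svmr_nms_fixed(
        svmr_res, nms_thd=0.6, max_before_nms=1000, max_after_nms=100):
    processed_svmr_res = []
    for e in svmr_res:
        # Each prediction is [video_idx, st, ed, score]; video_idx is the same
        # for every sublist, so strip it before NMS and prepend it again after.
        video_idx = e["predictions"][0][0]
        stripped = [p[1:] for p in e["predictions"][:max_before_nms]]
        kept = temporal_non_maximum_suppression(
            stripped, nms_threshold=nms_thd)[:max_after_nms]
        e["predictions"] = [[video_idx] + p for p in kept]
        processed_svmr_res.append(e)
    return processed_svmr_res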

Hi @hgzjy25,

Thanks! It seems that you are right. We directly adopted this part of the code from the official release of TVRetrieval without further validation. Sorry about that. We have double-checked with the original authors of TVR, and they have fixed it here: jayleicn/TVRetrieval@aec36bc. The fix will be reflected in HERO shortly.

VCMR should not be affected, as the results are first preprocessed by

def filter_vcmr_by_nms(all_video_predictions, nms_threshold=0.6,

fixed with cc99e04

Thank you for your quick reply and action. However, one mistake still remains: _video_id = e["predictions"][0] should be _video_id = e["predictions"][0][0]; otherwise _video_id will be the first sublist of e["predictions"] instead of the actual video index.
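
For concreteness, a toy example of the indexing difference (illustrative values only):

e = {"desc_id": 0,
     "predictions": [[23, 10.0, 15.0, 0.9],    # [video_idx, st, ed, score]
                     [23, 40.0, 55.0, 0.7]]}

_video_id = e["predictions"][0]      # [23, 10.0, 15.0, 0.9] -- a whole sublist
_video_id = e["predictions"][0][0]   # 23 -- the actual video index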