ajabri / videowalk

Repository for "Space-Time Correspondence as a Contrastive Random Walk" (NeurIPS 2020)

Home Page: http://ajabri.github.io/videowalk


Label propagation: predictions before context has burned in

vadimkantorov opened this issue

@ajabri Could you please explain how results are filled in for the first n_context = 20 frames? Are they copied from the ground truth? The paper suggests that the ground truth is only used for the 1st frame, but I can't find where predictions for the 2nd-20th frames are filled in. Are they filled in as background?

From what I could see, predictions affect lbls only after n_context frames (https://github.com/ajabri/videowalk/blob/0834ff9/code/test.py#L144-L148):

if t > 0:
    lbls[t + n_context] = pred   # write the prediction back into the buffer
else:
    pred = lbls[0]               # t == 0: reuse the ground-truth first-frame labels
    lbls[t + n_context] = pred

For DAVIS evaluation, the frames are saved at index t and not t + n_context (https://github.com/ajabri/videowalk/blob/0834ff9/code/test.py#L168):

outpath = os.path.join(args.save_path, str(vid_idx) + '_' + str(t))

Are these 2nd-20th frames included in the error metric evaluation? And what predictions are used for these frames?

Thanks, @ajabri !

Hi @vadimkantorov, thanks for the question. Sorry, the code is a bit confusing.

lbls should hold T + n_context label maps. The first n_context entries are the first frame's labels, copied; this is just to make the implementation simpler. As we propagate labels, we put the predicted label maps back into lbls, to satisfy the recurrence.
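
For concreteness, here is a minimal sketch of that buffer logic (not the repo's exact code; the sizes are illustrative and propagate_labels is a hypothetical stand-in for the model's attention-based propagation step):

import torch

T, n_context, H, W, C = 50, 20, 32, 32, 8      # illustrative sizes

first_frame_lbls = torch.zeros(H, W, C)        # ground-truth labels of frame 0

# The buffer holds T + n_context label maps; the first n_context entries are
# the first frame's labels, copied, purely to keep the indexing simple.
lbls = torch.stack([first_frame_lbls] * (T + n_context))

def propagate_labels(context_lbls, t):
    # Placeholder for the real propagation step (attention over the context).
    return context_lbls.mean(0)

for t in range(T):
    if t > 0:
        pred = propagate_labels(lbls[t:t + n_context], t)
    else:
        pred = lbls[0]                         # at t == 0, reuse the ground truth
    # Write the prediction back so later steps can attend to it (the recurrence).
    lbls[t + n_context] = pred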

Only the last T label maps are dumped to file. So while the file path uses index t, the data that is actually dumped (named pred) is lbls[t + n_context].
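
Continuing the sketch above, the dumping offset would look roughly like this (save_path and vid_idx are illustrative values, and the commented-out save call stands in for the repo's actual writer):

import os

save_path, vid_idx = './results', 0            # illustrative values
os.makedirs(save_path, exist_ok=True)
for t in range(T):
    pred = lbls[t + n_context]                 # the prediction for query frame t
    outpath = os.path.join(save_path, str(vid_idx) + '_' + str(t))
    # torch.save(pred, outpath)                # stand-in for the repo's writer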

Does this make sense?

Ah, I see! So VOSDataset would insert the first frame, copied 20 times, at the front of the queue, right?

Yes
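
In other words, the dataset-side padding amounts to something like the following sketch (assuming per-video lists frames and lbls; VOSDataset's actual fields and code may differ):

def pad_context(frames, lbls, n_context=20):
    # Prepend n_context copies of the first frame and its labels, so that every
    # query frame has a full window of n_context predecessors to attend to.
    frames = [frames[0]] * n_context + list(frames)
    lbls = [lbls[0]] * n_context + list(lbls)
    return frames, lbls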