showlab / UniVTG

[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding

Home Page: https://arxiv.org/abs/2307.16715



About timestamp calculation in DatasetMR

simon-zy opened this issue · comments

model_inputs["timestamp"] = ( (torch.arange(0, ctx_l) + self.clip_len / 2) / ctx_l).unsqueeze(1).repeat(1, 2)

Greetings.
Shouldn't 0.5 be used here rather than clip_len / 2?
As I understand it, we need to compute the center timestamp of each clip, and ctx_l is the number of clips in the video. So to get the center of each clip, shouldn't we always use 0.5, since torch.arange(0, ctx_l) is just the list of integers in [0, ctx_l)?
Something like this:
model_inputs["timestamp"] = ( (torch.arange(0, ctx_l) + 0.5) / ctx_l).unsqueeze(1).repeat(1, 2)
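To make the difference concrete, here is a small plain-Python sketch (NumPy/torch-free, with illustrative values for ctx_l and clip_len) comparing the two formulas. Assuming ctx_l equal-length clips tile the video, clip i covers [i/ctx_l, (i+1)/ctx_l] in normalized time, so its true center is (i + 0.5) / ctx_l:

```python
# Hypothetical values: 4 clips, each 2 seconds long.
ctx_l, clip_len = 4, 2.0

# Original formula: adds clip_len / 2 (in seconds) to a unitless index.
original = [(i + clip_len / 2) / ctx_l for i in range(ctx_l)]

# Proposed formula: adds 0.5 (half a clip, in index units).
proposed = [(i + 0.5) / ctx_l for i in range(ctx_l)]

print(original)  # [0.25, 0.5, 0.75, 1.0]   -- shifted whenever clip_len != 1
print(proposed)  # [0.125, 0.375, 0.625, 0.875]  -- true normalized clip centers
```

The two agree only when clip_len == 1; otherwise the original version shifts every center by (clip_len - 1) / (2 * ctx_l).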

@QinghongLin Hi, I'm really sorry to bother you, but do you have any thoughts?

@simon-zy Sorry for the late reply, I am busy with a deadline. I will check this and update you after Nov. 17 :), thank you!

@QinghongLin Sorry, but do you have time to check this out now?

@simon-zy Thanks for the heads up and follow up!
Your concern makes sense, and I think the updated version is correct.
I agree that the ratio should be independent of clip_len.

In my implementation, under the learning scheme, the model learns start/end offsets relative to the initially assigned timestamps, so it learns those offsets from a not-exactly-centered anchor.
The final windows should still be valid, since we do not require the left and right offsets to be equal.
But an ideal implementation would be what you proposed. Thanks for pointing this out :)
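To illustrate why a shifted anchor can still produce a valid window: the model predicts a left and a right offset from each anchor timestamp independently, so unequal offsets can compensate for the shift. A minimal sketch (function and variable names are illustrative, not from the UniVTG codebase):

```python
def decode_window(anchor, left_offset, right_offset):
    """Decode a (start, end) window from an anchor timestamp and
    independently predicted left/right offsets (all in normalized time)."""
    return (anchor - left_offset, anchor + right_offset)

# Ground-truth window [0.25, 0.625] has midpoint 0.4375, but an anchor at
# 0.5 still recovers it exactly, via unequal learned offsets.
print(decode_window(0.5, 0.25, 0.125))  # (0.25, 0.625)
```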