The result of tracking point from the middle of video is not precise

Question

ernestchu opened this issue a year ago

Hi, thanks for your great work. When I tried your notebook demo, I ran into some ambiguity when tracking manually selected points.

import torch

queries = torch.tensor([
    [0., 400., 350.],  # point tracked from the first frame
    [10., 600., 500.], # frame number 10
    [20., 750., 600.], # ...
    [30., 900., 200.]
])

Let's say we are interested in queries[1], which refers to a point in frame 10. I would expect the model to output a trajectory of all (0, 0) and a visibility of False for timestamps 0 to 9. However, inspecting pred_visibility shows the expected behavior only at the first four timestamps (the same happens with pred_tracks).

pred_visibility[:, :, 1]

tensor([[False, False, False, False,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         False, False, False,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True]], device='cuda:0')

Why is that? Thanks!

Nikita Karaev · Answer 1 · Thu Sep 14 2023 18:16:10 GMT+0800 (China Standard Time)

Hi @ernestchu, thank you for your question!
The model processes the video in sliding windows. As soon as the frame of interest (here, frame 10) falls within a sliding window, the model starts predicting visibility for that point across the entire window. Each window spans 8 frames and overlaps the previous one by 4, so frame 10 first falls within the second window, which begins at frame 4. This is why the visibility is "False" only for the first four timestamps in this case (the same is true for the trajectories). You can simply discard these predictions if you don't need them.
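The window arithmetic above can be sketched in a few lines. This is only an illustration, not the model's actual code: it assumes windows start at frame 0 and advance by a fixed stride (window size minus overlap), and the helper names first_window_start and frames_to_discard are hypothetical.

```python
def first_window_start(query_frame, window=8, overlap=4):
    """Start of the first sliding window that contains query_frame.

    Assumption: windows begin at frames 0, stride, 2*stride, ...
    with stride = window - overlap, and window k covers frames
    k*stride .. k*stride + window - 1.
    """
    stride = window - overlap
    # Smallest k such that k*stride + window - 1 >= query_frame
    # (ceiling division, clamped at 0 for frames in the first window).
    k = max(0, -(-(query_frame - window + 1) // stride))
    return k * stride


def frames_to_discard(query_frame, window=8, overlap=4):
    """Frames before the query frame where predictions may already appear."""
    return range(first_window_start(query_frame, window, overlap), query_frame)


for t in (0, 10, 20, 30):
    print(t, first_window_start(t), list(frames_to_discard(t)))
```

For the query at frame 10 this gives a window start of 4, so frames 4 to 9 carry predictions made before the query frame. With the output layout shown in the question (time on the second axis, points on the third), discarding them would presumably amount to something like pred_visibility[:, :10, 1] = False.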

Ernie Chu · Answer 2 · Thu Sep 14 2023 18:30:41 GMT+0800 (China Standard Time)

Thanks for your detailed response!