Input format for `tracks_for_queries` mode in the model

Question

justachetan opened this issue 7 months ago

Hi! Thank you for releasing the code and models publicly! I am trying to use the model to perform inference on my own videos. For visualization, I want to focus on a few selected query points.

The tracks_for_queries mode seems to be what I need. However, I cannot figure out the required format of query_points. Could you provide some information about it?

Thanks!

Guillaume Le Moing · Answer 1 · Fri Dec 29 2023 18:59:58 GMT+0800 (China Standard Time)

Hi, the shape for queries is (B, N, 3), with B the batch size and N the number of points. The three channels are in the format (t, y, x), where t is the time step of each query and (x, y) is its position in pixels, with the following orientation:

+----------> X
|
|
v
Y
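
As an illustration of the format described above, here is a minimal sketch that builds a query tensor of shape (B, N, 3) in (t, y, x) order for a single video. The helper name make_query_points, the example coordinates, the video size, and the bounds checks (zero-based time steps, pixel coordinates inside the frame) are assumptions made for this example; they are not taken from the repository.

```python
import torch


def make_query_points(points_txy, num_frames, height, width):
    """Build a (1, N, 3) float tensor of queries in (t, y, x) order.

    points_txy: iterable of (t, x, y) tuples, with x and y in pixels.
    Hypothetical helper, not part of the released code.
    """
    queries = []
    for t, x, y in points_txy:
        # Assumed conventions: zero-based time steps, points inside the frame.
        if not 0 <= t < num_frames:
            raise ValueError(f"time step {t} outside [0, {num_frames})")
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"point ({x}, {y}) outside a {width}x{height} frame")
        # Channel order stated in the answer above: (t, y, x).
        queries.append([float(t), float(y), float(x)])
    # Indexing with None adds the batch dimension: (N, 3) -> (1, N, 3).
    return torch.tensor(queries, dtype=torch.float32)[None]


query_points = make_query_points(
    [(0, 120.0, 64.0), (5, 300.5, 200.0)],  # (t, x, y) for each query
    num_frames=24, height=480, width=856,
)
print(query_points.shape)  # torch.Size([1, 2, 3])
```

The resulting tensor can be moved to the GPU with .cuda() before being passed as "query_points", as in the call shown later in this thread.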

Aditya Chetan · Answer 2 · Fri Jan 05 2024 00:20:44 GMT+0800 (China Standard Time)

Thanks for the quick response! I tried running this by editing the function call in demo.py as follows:

pred = model({"video": video[None], "query_points": torch.Tensor([[[t, y, x]]]).cuda()}, mode="tracks_for_queries", **vars(args))

However I got the following error:

Traceback (most recent call last):
  File "work/dot/demo2.py", line 312, in <module>
    main(args)
  File "work/dot/demo2.py", line 304, in main
    data["tracks"] = data["tracks"].permute(0, 2, 1, 3)
RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 3 is not equal to len(dims) = 4

Could you kindly advise how to fix this? Thanks!

Guillaume Le Moing · Answer 3 · Fri Jan 05 2024 15:44:02 GMT+0800 (China Standard Time)

Hi! You get this error because sparse tracks and dense tracks do not have the same shape:

Dense tracks returned by "tracks_from_first_to_every_other_frame" mode have shape [B T H W 3] (with B: batch size, T: time steps, H: height, W: width).
Sparse tracks returned by "tracks_for_queries" mode have shape [B T N 3] (with N: number of tracks).

The demo was written to handle dense tracks. As a workaround, you can add a singleton dimension to the sparse tracks so that they have shape [B T N 1 3].
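
A minimal sketch of that extra dimension, using a random tensor as a stand-in for the model output (the sizes B, T, N and the variable names are assumptions for illustration):

```python
import torch

B, T, N = 1, 24, 5
# Stand-in for the [B T N 3] tracks returned in "tracks_for_queries" mode.
sparse_tracks = torch.rand(B, T, N, 3)

# Insert a singleton axis at position 3 so the layout matches
# [B T H W 3] with H = N and W = 1.
dense_like = sparse_tracks.unsqueeze(3)
print(dense_like.shape)  # torch.Size([1, 24, 5, 1, 3])
```

Only the shape changes; the track values themselves are untouched, so code written for dense tracks sees an N x 1 grid of points.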

Guillaume Le Moing · Answer 4 · Thu Feb 01 2024 21:28:30 GMT+0800 (China Standard Time)

The plotting functions in the demo can now handle tracks in both [B T H W 3] and [B T N 3] formats, so the hack above is no longer needed.