Is it possible to track specified point in the video from the webcam?

newforrestgump001 opened this issue 5 months ago

newforrestgump001 commented 5 months ago

Is it possible to track the point without passing all the frames to the model as a single batch? Thank you very much.

Nikita Karaev · Answer 1 · Tue Apr 16 2024 03:44:32 GMT+0800 (China Standard Time)

Hi @newforrestgump001, sorry for the late response. Yes, you can check out the online demo.

Is this what you are looking for?

newforrestgump001 · Answer 2 · Wed Apr 17 2024 15:48:52 GMT+0800 (China Standard Time)

@nikitakaraevv Sincere thanks for your response. Suppose a video has 100 frames. First, 5 frames are given to the network as input and it produces an output; the output is then updated as more frames are fed to the network, frame by frame. In other words, does it support updating the tracking result frame by frame?

Nikita Karaev · Answer 3 · Wed Apr 17 2024 18:57:18 GMT+0800 (China Standard Time)

Hi @newforrestgump001, yes, that's how this demo works, except that the result is currently updated every four frames, not every frame.

newforrestgump001 · Answer 4 · Thu Apr 18 2024 16:55:25 GMT+0800 (China Standard Time)

@nikitakaraevv Thanks a lot, I got it.

lutianye · Answer 5 · Thu Apr 18 2024 17:03:06 GMT+0800 (China Standard Time)

Hi @nikitakaraevv Could we achieve real-time inference by conducting the inference process every four frames? In other words, would it be feasible to perform inference on frames one to four, and then repeat the process on frames five to eight, and so forth? Additionally, would it be necessary to reinitialize the model for each set of four frames? Lastly, could you please share any examples of C++ deployment code that utilizes TensorRT? Your response would be greatly appreciated.

Nikita Karaev · Answer 6 · Sun Apr 21 2024 20:25:03 GMT+0800 (China Standard Time)

Hi @lutianye, yes, that's exactly how the online demo works: it initialises the model, waits until it has access to the first 8 frames (the sliding window size is 8 frames, but the step is 4 frames), and then processes 4 new frames at a time while always taking only 8 frames as input at every step.
The model is not reinitialised after every step.
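
To illustrate the windowing described above, here is a minimal sketch of the buffering logic only. It is not the project's code: the function name `sliding_windows` and its parameters are hypothetical, and the tracker call itself is left out. It assumes a window of 8 frames and a step of 4 frames, as stated above.

```python
from collections import deque
from typing import Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")


def sliding_windows(
    frames: Iterable[Frame], window: int = 8, step: int = 4
) -> Iterator[List[Frame]]:
    """Yield the latest `window` frames each time `step` new frames arrive.

    Hypothetical helper: it only shows the buffering pattern. Nothing is
    yielded until the first `window` frames are available.
    """
    buffer = deque(maxlen=window)  # keeps only the most recent `window` frames
    new_frames = 0  # frames received since the last yielded window
    for frame in frames:
        buffer.append(frame)
        new_frames += 1
        if len(buffer) == window and new_frames >= step:
            yield list(buffer)
            new_frames = 0


# Usage: integers stand in for webcam frames.
for win in sliding_windows(range(16)):
    # A tracker created once, before the loop, would be called on `win` here.
    print(win[0], win[-1])  # prints "0 7", then "4 11", then "8 15"
```

Each yielded window overlaps the previous one by 4 frames, and whatever object does the tracking would be created once outside the loop rather than once per window.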

Unfortunately, we don't yet have any examples of C++ deployment, but if you let me know what your use case is, we might be able to help you. You can leave your email address or contact me via nikita@robots.ox.ac.uk, and we can discuss it in more detail.

lutianye · Answer 7 · Wed Apr 24 2024 13:19:18 GMT+0800 (China Standard Time)

Hi @nikitakaraevv, thank you for your prompt response. I understand what you mean. Our objective is to capture images from a camera while performing tracking at the same time, so we cannot feed all the frames to the model at once. Instead, we need to capture and input frames in real time. I am considering developing a C++ demo using TensorRT; however, I am concerned that TensorRT might not support certain operators. Please forgive the delay in my response. My email address is ganzhijie@gmail.com.

dat-nguyenvn · Answer 8 · Tue May 21 2024 22:28:48 GMT+0800 (China Standard Time)

Hi, is there any update on TensorRT support? Thanks.