hzxie / RMNet

The official implementation of "Efficient Regional Memory Network for Video Object Segmentation". (Xie et al., CVPR 2021)

Home Page: https://haozhexie.com/project/rmnet/


The problem of the precomputed optical flow

cctgem opened this issue · comments

Thanks for open-sourcing the code. I have a question.
If the optical flow is precomputed, it should be computed frame by frame, i.e., with frame_step == 1.
In the later stages of training, frame_step is increased dynamically, so the network might be fed frames 10, 20, and 30 (frame_step == 10). The corresponding flow inputs should then be 10→20 and 20→30, but the precomputed flows actually fed in are 19→20 and 29→30.
I'm not sure whether you handled this somewhere and I missed the code, or whether I've misunderstood how the offline optical flow is generated.
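To make the mismatch concrete, here is a minimal sketch (the pairing logic is illustrative, not taken from the RMNet repo):

```python
# Sketch of the precomputed-flow index mismatch described above.
# Assumes flow was precomputed frame by frame (frame_step == 1),
# so only pairs (t-1, t) exist on disk.

def needed_flow_pairs(frame_ids):
    """Flow pairs the network actually needs for the sampled frames."""
    return [(a, b) for a, b in zip(frame_ids, frame_ids[1:])]

def precomputed_flow_pairs(frame_ids):
    """Pairs available offline when flow was computed with frame_step == 1."""
    return [(b - 1, b) for b in frame_ids[1:]]

frames = [10, 20, 30]                  # frame_step == 10 late in training
print(needed_flow_pairs(frames))       # [(10, 20), (20, 30)]
print(precomputed_flow_pairs(frames))  # [(19, 20), (29, 30)]
```

The two lists disagree, which is exactly the concern raised above.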

Yes, you're right.
Actually, we recognized this issue and replaced the precomputed optical flow with optical flow computed in real time by FlowNet-CSS2; the results are almost the same on the DAVIS dataset.
One possible reason is that object motion in the DAVIS dataset is small, and the padding of the bounding box increases the error tolerance of the precomputed optical flow.

Actually, for YouTube-VOS, we don't use the precomputed optical flow at all; we replace it with optical flow computed in real time by RAFT.

Thanks for your reply. It seems that I need to update the code in train.py if I want to train from scratch.

Why not use FlowNet-CSS2 on YouTube-VOS, but choose RAFT? @hzxie
Thanks.

@MaxChanger
Good question. Because RAFT performs better than FlowNet-CSS2 when estimating the optical flow in YouTube-VOS.

Can I understand it as: FlowNet-CSS2 performs better than RAFT on DAVIS? But why?

Yes, you are right. I am also confused about it.

Well, that's consistent with my understanding: there is no theoretical explanation, only that it works better experimentally. A very thorough experimental comparison. 👍👍
Thank you once again. @hzxie

Do you plan to release the model trained on the YouTube-VOS dataset? @hzxie

@cctgem
Yes.
The model for YouTube-VOS was deleted by accident when I left SenseTime and needs to be retrained.
However, I am busy writing my Ph.D. thesis, so the model may not be available until later.