lliuz / ARFlow

The official PyTorch implementation of the paper "Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation".

What is the recommended test_shape value?

NagabhushanSN95 opened this issue

I'm trying to apply ARFlow to the Sintel dataset, which has images of shape (436, 1024). What is the recommended test_shape value?

For the example given, the images are of shape (375, 1242), but the test_shape used is (384, 640). I don't understand how you arrived at that value.

My bad. You've mentioned 448x1024 for Sintel. Still, I would like to know if there is a heuristic behind choosing this size. For example, what test_shape would you choose for frames of shape (240, 320) or (1080, 1920)?

Dear @NagabhushanSN95, thanks for pointing it out. It helped me get the network running on Sintel.
My guess is that the input shape should be dyadic (divisible by 2), to be sure the filters span the entire frame.

Not just 2. The shape should be divisible by 32. But I don't see a pattern in how test_shape is chosen.

Yes, sorry, you are right: it should not be just 2. To be honest, I don't know the exact minimum divisor. Anyway, the number makes sense for Sintel. The original size (436, 1024) is not divisible by 32 without remainder (436 = 13 × 32 + 20), whereas (448, 1024) is (448 = 14 × 32, 1024 = 32 × 32).
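A minimal sketch of the divisibility rule discussed here, assuming the only constraint is that each dimension be a multiple of 32. The helper name `round_up_shape` is hypothetical and not part of ARFlow; this is not confirmed as the authors' method for choosing test_shape.

```python
import math


def round_up_shape(shape, multiple=32):
    """Round each dimension up to the nearest multiple of `multiple`.

    Hypothetical helper for illustration; not taken from the ARFlow code.
    """
    return tuple(int(math.ceil(d / multiple)) * multiple for d in shape)


# (height, width) shapes mentioned in this thread
for hw in [(436, 1024), (240, 320), (1080, 1920), (375, 1242)]:
    print(hw, "->", round_up_shape(hw))
```

This reproduces (448, 1024) for Sintel and gives (256, 320) for 240x320 frames. For the (375, 1242) example it gives (384, 1248), which matches the height of the (384, 640) test_shape but not its width, so rounding up alone does not explain that choice.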

If you look at other repositories, similar values have been used, e.g. https://github.com/princeton-vl/RAFT

I hope this helps.

Oh! Okay. Thanks. But I'm planning to use ARFlow on the UCF-101 dataset, whose resolution is 320x240. I'm wondering whether 320x256 is a good value for test_shape, or whether it should be something else.

I am not the author of this paper, but in my view it should be OK. However, if you use the pretrained model you should be careful. If I am not mistaken, the pretrained models show a very high EPE (see #35).

Yes. With the pre-trained models, I got the best reconstruction error only when using test_shape (448, 1024). But when training, I don't see the point of upscaling (240, 320) frames to (448, 1024). I've also read in some places that upscaling helps, though. So I wanted to know if the authors have some intuition or heuristic for selecting the test_shape.