lmb-freiburg / flownet2

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

Home Page: https://lmb.informatik.uni-freiburg.de/Publications/2017/IMKDB17/


Higher Resolution FlowNet

noufpy opened this issue

Hello!

So I have successfully computed optical flow for many videos with your model, but I would love to take it to the next level and run it on much higher-resolution inputs, say videos or images around 4000 pixels wide. I've been getting strange errors when I attempt this:

[screenshot: error message, 2019-01-16 11:56 AM]

Have you tried producing flow at this resolution? If so, how did you do it?
And do you have any idea what this error might mean? It does not seem to be a memory issue.

Machine specs: Linux (Ubuntu 16.04), Quadro P6000 with 24 GB VRAM, 30 GB RAM, 8 CPUs

Wow, that's impressive. I think Caffe (or some part of it) is hard-limited to INT_MAX index sizes somewhere. INT_MAX is 2,147,483,647, and a layer with 4000-pixel-wide inputs and a nontrivial channel depth will exceed that quickly.
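As a rough sanity check (the layer shape below is made up for illustration, not taken from the actual network), a single blob at that width overflows a 32-bit index as soon as the channel depth is nontrivial:

```python
# Element count of one hypothetical Caffe blob (N x C x H x W) at ~4K width.
INT_MAX = 2**31 - 1  # 2,147,483,647

n, c, h, w = 1, 256, 2176, 4000  # illustrative shape, not a real FlowNet2 layer
elements = n * c * h * w
print(elements)            # 2228224000
print(elements > INT_MAX)  # True -> a 32-bit index can no longer address it
```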

The biggest I've ever tried was 1920x1080, on a machine with 16GB RAM and 8GB VRAM. It was slow, but I think it worked. However, I don't remember which model I was running (so it may have been a smaller one; and maybe I was also running into your problem with larger models).

I can think of three things to try:

  1. Chop your inputs into smaller tiles, process each tile pair separately, and then (cleverly) recombine the results into a big flow field.
  2. Recompile Caffe to use int64 indices and sizes instead of int32 (which is the limit you are running into). I know it's possible in principle, but I don't know how practical it is for our Caffe version. We've certainly never tried it :)
  3. Try our newer TensorFlow release (usage is pretty much the same). It's not exactly the same network as FlowNet2, but should be at least as good. TensorFlow MIGHT not have this restriction (but I've never tried it).
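Option 1 could be sketched roughly like this. `compute_flow` is a hypothetical stand-in for a single FlowNet2 inference call on one tile pair; the uniform blending here is deliberately naive and would still leave seams:

```python
import numpy as np

def tile_flow(img1, img2, compute_flow, tile=1024, overlap=128):
    """Estimate flow on overlapping tiles and blend the results.

    `compute_flow(a, b)` is a hypothetical stand-in for one FlowNet2
    inference call, returning an (H, W, 2) flow field for a tile pair.
    """
    h, w = img1.shape[:2]
    flow = np.zeros((h, w, 2), dtype=np.float64)
    weight = np.zeros((h, w, 1), dtype=np.float64)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            y1, x1 = min(y + tile, h), min(x + tile, w)
            # Run flow on this tile pair and accumulate into the mosaic.
            flow[y:y1, x:x1] += compute_flow(img1[y:y1, x:x1],
                                             img2[y:y1, x:x1])
            weight[y:y1, x:x1] += 1.0
    return flow / weight  # average where tiles overlap
```

A real implementation would want feathered blending weights in the overlap regions, and an overlap larger than the biggest displacement you expect, since motion that leaves a tile cannot be recovered inside it.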

(Please note that common super-high-resolution inputs (e.g. from a 4K movie) would probably not produce good outputs in any case. The network's receptive field is constant and independent of the input size, and it determines the maximum flow displacement that can be recovered: a 100-pixel motion in a 640x480 sample becomes a 625-pixel motion when the same scene is rendered at 4K width, and that is well outside what this FlowNet2 architecture can handle.)
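The displacement in that example simply scales linearly with image width:

```python
# The same physical motion, expressed in pixels, grows with resolution.
width_small, width_4k = 640, 4000
motion_small = 100  # pixels of motion at 640-wide resolution

motion_4k = motion_small * width_4k / width_small
print(motion_4k)  # 625.0 -> far beyond the network's receptive field
```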

Thank you for your detailed response! I will look into this and post any results I get 👍

(closed due to inactivity)