anchen1011 / toflow

TOFlow: Video Enhancement with Task-Oriented Flow

Home Page: http://toflow.csail.mit.edu


Pre-training the flow estimation network

YaoooLiang opened this issue · comments

commented

Hi, @anchen1011. I pre-trained the flownet on the Sintel dataset, but it does not converge. The batch size is 16 and the learning rate is 0.0001; the loss is the L1 difference between the last sub-net's output and the ground-truth flow. Can you share the details of how you pre-trained the flownet?
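
For reference, the setup described in this comment might look like the following TF1 sketch. The one-layer `flow_network` and the choice of Adam are my assumptions, not the repo's actual code:

```python
import tensorflow as tf

def flow_network(frame1, frame2):
    # Hypothetical stand-in for the stacked sub-nets: a single conv layer
    # mapping the concatenated frames to a 2-channel flow field.
    x = tf.concat([frame1, frame2], axis=3)
    return tf.layers.conv2d(x, filters=2, kernel_size=3, padding='same')

frame1 = tf.placeholder(tf.float32, [16, 432, 1024, 3])   # first frame, batch of 16
frame2 = tf.placeholder(tf.float32, [16, 432, 1024, 3])   # second frame
gt_flow = tf.placeholder(tf.float32, [16, 432, 1024, 2])  # ground-truth flow

pred_flow = flow_network(frame1, frame2)  # last sub-net's output

# L1 difference between the last sub-net's output and the ground truth;
# the optimizer choice (Adam) is an assumption.
loss = tf.reduce_mean(tf.abs(pred_flow - gt_flow))
train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)
```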

I think there should be no problem with the batch size 16 / learning rate 0.0001 setup.
Would you like to share visualizations of the input/output/target/flow so that I can get a sense of what's preventing the network from converging?

commented

@anchen1011, thank you for your reply. Images are normalized between 0 and 1 (`image = image.astype(np.float32) / 255`, `image = image[0:432, :, :]`), while the flows are left unpreprocessed. The shape of the images is [16, 432, 1024, 3] and the shape of the flow is [16, 432, 1024, 2]. The 8x-downsampled images and a zero initial flow `flow0 = tf.constant(np.zeros((16, 54, 128, 2)), np.float32)` are concatenated via `tf.concat([frame1, frame2, flow0], axis=3)` as the first sub-net's input, and the remaining sub-nets take inputs constructed in the same way.
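
Spelled out as a self-contained sketch (the random array just stands in for a real Sintel frame; Sintel frames are 436x1024, so cropping the height to 432 makes both dimensions divisible by 8):

```python
import numpy as np
import tensorflow as tf

# A random array stands in for a real Sintel frame (436x1024).
image = np.random.randint(0, 256, (436, 1024, 3)).astype(np.uint8)
image = image.astype(np.float32) / 255   # normalize to [0, 1]
image = image[0:432, :, :]               # crop height 436 -> 432 (divisible by 8)

# First sub-net input at 1/8 resolution: two frames plus a zero initial flow.
frame1 = tf.placeholder(tf.float32, [16, 54, 128, 3])
frame2 = tf.placeholder(tf.float32, [16, 54, 128, 3])
flow0 = tf.constant(np.zeros((16, 54, 128, 2)), np.float32)
subnet1_input = tf.concat([frame1, frame2, flow0], axis=3)  # shape [16, 54, 128, 8]
```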

commented

The loss drops when training on only one batch, but the training loss oscillates up and down on the entire dataset.

It seems like you are implementing the pre-training pipeline in TF, which could introduce many issues that are unknown to me.

I think, in general, to figure out why it doesn't converge, you need to:

  1. Visualize the network architecture (with TensorBoard)
  2. Visualize a few groups of input/output/target/flow images

I would be happy to help if you attach these images so that I can take a look.
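
As a starting point for both items in the list above, here is a minimal TF1 summary sketch; the tensor names are placeholders for whatever your graph actually defines:

```python
import tensorflow as tf

# Hypothetical tensors standing in for the graph's real inputs and outputs.
frame1 = tf.placeholder(tf.float32, [None, 432, 1024, 3])
pred_flow = tf.placeholder(tf.float32, [None, 432, 1024, 2])
gt_flow = tf.placeholder(tf.float32, [None, 432, 1024, 2])

# Log a few examples per step; each flow component is written as a
# grayscale image (tf.summary.image rescales float inputs automatically).
tf.summary.image('input_frame1', frame1, max_outputs=4)
tf.summary.image('pred_flow_u', pred_flow[..., 0:1], max_outputs=4)
tf.summary.image('pred_flow_v', pred_flow[..., 1:2], max_outputs=4)
tf.summary.image('target_flow_u', gt_flow[..., 0:1], max_outputs=4)
merged = tf.summary.merge_all()

# Passing the graph to FileWriter also gives you the architecture view in TensorBoard.
writer = tf.summary.FileWriter('./logs', tf.get_default_graph())
```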

Also, your preprocessing of the images is quite different from ours. We use defaultTrainTransform from this module.

commented

@anchen1011, hi, sorry for the late reply. I visualized a few groups of the images in images.zip. After a long period of training, the training L1 loss stabilized around 0.1 and the validation L1 loss around 0.15, but the model still performed badly on both the training and validation sets. Can you share the details about:

1. Which way do you choose for training: end-to-end or step by step?
2. How do you normalize the optical flow data?
3. Are the input images at their original size, or cropped to a smaller size?

I think your network is learning something, which means the input/output formats are good.

However, the network structure seems problematic. Each sub-net should output an optical flow, which you need to both resize and double in magnitude.
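
For illustration, a sketch of that coarse-to-fine chaining under my assumptions; the one-layer `subnet` is a hypothetical stand-in for a real refinement module:

```python
import tensorflow as tf

def subnet(frame1, frame2, flow_in):
    # Stand-in for one refinement sub-net: concatenate the frames and the
    # incoming flow, predict a 2-channel flow at the same resolution.
    x = tf.concat([frame1, frame2, flow_in], axis=3)
    return tf.layers.conv2d(x, filters=2, kernel_size=3, padding='same')

def upsample_flow(flow):
    # Double the spatial resolution AND the flow magnitude: displacements
    # are measured in pixels, so they must scale with the image size.
    h, w = int(flow.shape[1]), int(flow.shape[2])
    return tf.image.resize_images(flow, (2 * h, 2 * w)) * 2.0

# Frame pyramids, coarsest (1/8 scale) first; shapes assume a 432x1024 input.
shapes = [(54, 128), (108, 256), (216, 512), (432, 1024)]
frames1 = [tf.placeholder(tf.float32, [16, h, w, 3]) for h, w in shapes]
frames2 = [tf.placeholder(tf.float32, [16, h, w, 3]) for h, w in shapes]

flow = subnet(frames1[0], frames2[0], tf.zeros([16, 54, 128, 2]))
for k in range(1, len(shapes)):
    flow = upsample_flow(flow)                   # resize and double magnitude
    flow = subnet(frames1[k], frames2[k], flow)  # refine at the finer scale
```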

For your 3 questions:

  1. First step by step, and then fine-tune end-to-end. Step by step alone should already deliver a very good result.
  2. You don't normalize the optical flow data.
  3. The input images are cropped to the network input size (if they're not the same already).
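
A minimal sketch of what point 1 could look like in TF1, assuming each sub-net's variables live under their own scope (the scope names and toy layers here are hypothetical):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [16, 54, 128, 8])   # frames + incoming flow
gt = tf.placeholder(tf.float32, [16, 54, 128, 2])  # ground-truth flow

# Two toy sub-nets, each under its own variable scope.
with tf.variable_scope('subnet_1'):
    flow1 = tf.layers.conv2d(x, 2, 3, padding='same')
with tf.variable_scope('subnet_2'):
    flow2 = tf.layers.conv2d(tf.concat([x[..., :6], flow1], axis=3), 2, 3, padding='same')

loss = tf.reduce_mean(tf.abs(flow2 - gt))  # L1 loss at the current level

# Step by step: optimize only the newest sub-net; earlier levels stay frozen.
vars2 = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='subnet_2')
step_op = tf.train.AdamOptimizer(0.0001).minimize(loss, var_list=vars2)

# End-to-end fine-tuning afterwards: same loss, all trainable variables.
finetune_op = tf.train.AdamOptimizer(0.0001).minimize(loss)
```

This only shows the variable-freezing mechanics; in a real step-by-step schedule each level would be supervised against ground truth resized to that level's resolution.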

commented

Hi, @anchen1011. Actually, I do resize and double each sub-net's output flow at the same time: `flow2 = tf.image.resize_images(flow1, (flow1.shape[1] * 2, flow1.shape[2] * 2)) * 2`. Then I trained each sub-net one by one, but it failed again. I also checked that the input images and target flows are matched. Would you give me any suggestions? Thank you a lot!