anchen1011 / toflow

TOFlow: Video Enhancement with Task-Oriented Flow

Home Page: http://toflow.csail.mit.edu


Pre-training the flow estimation network

YaoooLiang opened this issue · comments

commented

Hi, @anchen1011. I pre-trained the flownet on the Sintel dataset, but it does not converge. The batch size is 16 and the learning rate is 0.0001; the loss is the L1 difference between the last sub-net's output and the ground-truth flow. Can you share the details of how you pre-trained the flownet?
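
For reference, the setup described in this comment might look like the following TF1 sketch. The one-layer `flow_network` and the choice of Adam are my assumptions, not the repo's actual code:

```python
import tensorflow as tf

def flow_network(frame1, frame2):
    # Hypothetical stand-in for the stacked sub-nets: a single conv layer
    # mapping the concatenated frames to a 2-channel flow field.
    x = tf.concat([frame1, frame2], axis=3)
    return tf.layers.conv2d(x, filters=2, kernel_size=3, padding='same')

frame1 = tf.placeholder(tf.float32, [16, 432, 1024, 3])   # first frame, batch of 16
frame2 = tf.placeholder(tf.float32, [16, 432, 1024, 3])   # second frame
gt_flow = tf.placeholder(tf.float32, [16, 432, 1024, 2])  # ground-truth flow

pred_flow = flow_network(frame1, frame2)  # last sub-net's output

# L1 difference between the last sub-net's output and the ground truth;
# the optimizer choice (Adam) is an assumption.
loss = tf.reduce_mean(tf.abs(pred_flow - gt_flow))
train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)
```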

I think there should be no problem with the batch size 16 / learning rate 0.0001 setup.
Would you like to share visualizations of the input/output/target/flow so that I can get a sense of what's preventing the network from converging?

commented

@anchen1011, thank you for your reply. Images are normalized between 0 and 1 (`image = image.astype(np.float32) / 255`, `image = image[0:432, :, :]`), while the flows are left unpreprocessed. The shape of the images is [16, 432, 1024, 3] and the shape of the flow is [16, 432, 1024, 2]. The 8x-downsampled images and a zero initial flow `flow0 = tf.constant(np.zeros((16, 54, 128, 2)), np.float32)` are concatenated via `tf.concat([frame1, frame2, flow0], axis=3)` as the first sub-net's input, and the remaining sub-nets take inputs constructed in the same way.
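
Spelled out as a self-contained sketch (the random array just stands in for a real Sintel frame; Sintel frames are 436x1024, so cropping the height to 432 makes both dimensions divisible by 8):

```python
import numpy as np
import tensorflow as tf

# A random array stands in for a real Sintel frame (436x1024).
image = np.random.randint(0, 256, (436, 1024, 3)).astype(np.uint8)
image = image.astype(np.float32) / 255   # normalize to [0, 1]
image = image[0:432, :, :]               # crop height 436 -> 432 (divisible by 8)

# First sub-net input at 1/8 resolution: two frames plus a zero initial flow.
frame1 = tf.placeholder(tf.float32, [16, 54, 128, 3])
frame2 = tf.placeholder(tf.float32, [16, 54, 128, 3])
flow0 = tf.constant(np.zeros((16, 54, 128, 2)), np.float32)
subnet1_input = tf.concat([frame1, frame2, flow0], axis=3)  # shape [16, 54, 128, 8]
```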

commented

The loss drops when training on only one batch, but the training loss oscillates up and down on the entire dataset.

It seems like you are implementing the pre-training pipeline in TF, which could introduce many issues that are unknown to me.

I think, in general, to figure out why it doesn't converge, you need to:

  1. Visualize the network architecture (with TensorBoard)
  2. Visualize a few groups of input/output/target/flow images

I would be happy to help if you attach these images so that I can take a look.
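
As a starting point for both items in the list above, here is a minimal TF1 summary sketch; the tensor names are placeholders for whatever your graph actually defines:

```python
import tensorflow as tf

# Hypothetical tensors standing in for the graph's real inputs and outputs.
frame1 = tf.placeholder(tf.float32, [None, 432, 1024, 3])
pred_flow = tf.placeholder(tf.float32, [None, 432, 1024, 2])
gt_flow = tf.placeholder(tf.float32, [None, 432, 1024, 2])

# Log a few examples per step; each flow component is written as a
# grayscale image (tf.summary.image rescales float inputs automatically).
tf.summary.image('input_frame1', frame1, max_outputs=4)
tf.summary.image('pred_flow_u', pred_flow[..., 0:1], max_outputs=4)
tf.summary.image('pred_flow_v', pred_flow[..., 1:2], max_outputs=4)
tf.summary.image('target_flow_u', gt_flow[..., 0:1], max_outputs=4)
merged = tf.summary.merge_all()

# Passing the graph to FileWriter also gives you the architecture view in TensorBoard.
writer = tf.summary.FileWriter('./logs', tf.get_default_graph())
```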

Also, your preprocessing of the images is quite different from ours. We use defaultTrainTransform from this module.

commented

@anchen1011, hi, sorry for the late reply. I visualized a few groups of the images in images.zip. After a long period of training, the training L1 loss stabilized around 0.1 and the validation L1 loss around 0.15, but the model still performed badly on both the training and validation sets. Can you share the details about:

1. Which way do you choose for training: end-to-end or step by step?
2. How do you normalize the optical flow data?
3. Are the input images at their original size, or cropped to a smaller size?

I think your network is learning something, which means the input/output formats are good.

However, the network structure seems problematic. Each sub-net should output an optical flow, which you need to both resize and double in magnitude.
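
For illustration, a sketch of that coarse-to-fine chaining under my assumptions; the one-layer `subnet` is a hypothetical stand-in for a real refinement module:

```python
import tensorflow as tf

def subnet(frame1, frame2, flow_in):
    # Stand-in for one refinement sub-net: concatenate the frames and the
    # incoming flow, predict a 2-channel flow at the same resolution.
    x = tf.concat([frame1, frame2, flow_in], axis=3)
    return tf.layers.conv2d(x, filters=2, kernel_size=3, padding='same')

def upsample_flow(flow):
    # Double the spatial resolution AND the flow magnitude: displacements
    # are measured in pixels, so they must scale with the image size.
    h, w = int(flow.shape[1]), int(flow.shape[2])
    return tf.image.resize_images(flow, (2 * h, 2 * w)) * 2.0

# Frame pyramids, coarsest (1/8 scale) first; shapes assume a 432x1024 input.
shapes = [(54, 128), (108, 256), (216, 512), (432, 1024)]
frames1 = [tf.placeholder(tf.float32, [16, h, w, 3]) for h, w in shapes]
frames2 = [tf.placeholder(tf.float32, [16, h, w, 3]) for h, w in shapes]

flow = subnet(frames1[0], frames2[0], tf.zeros([16, 54, 128, 2]))
for k in range(1, len(shapes)):
    flow = upsample_flow(flow)                   # resize and double magnitude
    flow = subnet(frames1[k], frames2[k], flow)  # refine at the finer scale
```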

For your 3 questions:

  1. First step by step, and then fine-tune end-to-end. Step by step alone should already deliver a very good result.
  2. You don't normalize the optical flow data.
  3. The input images are cropped to the network input size (if they're not the same already).
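
A minimal sketch of what point 1 could look like in TF1, assuming each sub-net's variables live under their own scope (the scope names and toy layers here are hypothetical):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [16, 54, 128, 8])   # frames + incoming flow
gt = tf.placeholder(tf.float32, [16, 54, 128, 2])  # ground-truth flow

# Two toy sub-nets, each under its own variable scope.
with tf.variable_scope('subnet_1'):
    flow1 = tf.layers.conv2d(x, 2, 3, padding='same')
with tf.variable_scope('subnet_2'):
    flow2 = tf.layers.conv2d(tf.concat([x[..., :6], flow1], axis=3), 2, 3, padding='same')

loss = tf.reduce_mean(tf.abs(flow2 - gt))  # L1 loss at the current level

# Step by step: optimize only the newest sub-net; earlier levels stay frozen.
vars2 = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='subnet_2')
step_op = tf.train.AdamOptimizer(0.0001).minimize(loss, var_list=vars2)

# End-to-end fine-tuning afterwards: same loss, all trainable variables.
finetune_op = tf.train.AdamOptimizer(0.0001).minimize(loss)
```

This only shows the variable-freezing mechanics; in a real step-by-step schedule each level would be supervised against ground truth resized to that level's resolution.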

commented

Hi, @anchen1011. Actually, I do resize and double each sub-net's output flow at the same time: `flow2 = tf.image.resize_images(flow1, (flow1.shape[1] * 2, flow1.shape[2] * 2)) * 2`. Then I trained each sub-net one by one, but it failed again. I also checked that the input images and target flows are matched. Would you give me any suggestions? Thank you a lot!