alievk / npbg

Neural Point-Based Graphics

How to train from scratch or fine-tune with multiple scene datasets?

xdiyer opened this issue · comments

I want to train the model on my own dataset from scratch, and I also want to fine-tune the model on 2 different scene datasets, as you did in the paper for scene editing. Could you tell me how to modify train.py, or release the code for training from scratch? Looking forward to your reply. Thank you.

Hi @xdiyer, for training from scratch you can just follow the guidelines in the Fitting descriptors section of the readme. You will only need to modify the configs for your scene (train_example.yaml, paths_example.yaml, scene.yaml). Similarly, to train the network on 2 scenes, you can specify both of them in the paths config under the top-level datasets section. An example:

datasets:
    "my_scene_1":
        scene_path: scene_1.yaml
        target_path: images_undistorted_1
        target_name_func: "lambda i: f'{i}.png'"
    "my_scene_2":
        scene_path: scene_2.yaml
        target_path: images_undistorted_2
        target_name_func: "lambda i: f'{i}.png'"
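For reference, target_name_func is stored as a string and presumably evaluated into a Python callable that maps a frame index to the target image filename. A minimal sketch of that interpretation (illustrative only, not the repo's actual loader code):

# Hypothetical illustration of how the target_name_func string from the
# config could be turned into a callable (assumption, not npbg's loader code).
name_func_str = "lambda i: f'{i}.png'"
target_name_func = eval(name_func_str)   # callable mapping frame index -> filename

print(target_name_func(0))    # '0.png'
print(target_name_func(15))   # '15.png'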

You can also look at other issues where training on new scenes is described (e.g., there is a nice example here).

Thank you @seva100. When fine-tuning the network on 2 scenes, is it necessary to make the image resolution of the two scenes the same? The two videos come from different cameras.

We can train a single scene at 1920x1080 resolution on a single 1080i card with batch size 4. But when training two scenes at the same resolution, the batch size can only be set to 1. What is the reason, and how can we enlarge the batch size?

I think the image resolution for the two scenes can be different.

What is the error you're getting when setting the larger batch size for two scenes?

A CUDA out of memory error.
With a single scene, we can train with dataloader_workers=4 and a max batch size of 1, or dataloader_workers=1 and a max batch size of 4.
Did the multi-scene models you trained, such as the 41 people in the paper, have any memory problems?

The CUDA out of memory error makes sense here, since with more scenes you have a larger memory footprint (you need to store the descriptors of several scenes). So decreasing the batch size and using dataloader_workers=1 should help if the amount of memory available on your card is insufficient.

Perhaps you can also try reducing --crop_size, which will reduce the amount of GPU memory used.
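As a rough illustration of why the footprint grows with the number of scenes, here is a back-of-the-envelope sketch of the memory taken by the per-point descriptors alone (the 8-channel descriptor size follows the paper; the point counts below are made-up placeholders):

# Back-of-the-envelope estimate of GPU memory used by neural descriptors.
# Point counts are hypothetical; optimizer state and activations
# (which batch size and --crop_size control) come on top of this.
def descriptor_memory_mb(num_points, descriptor_dim=8, bytes_per_value=4):
    return num_points * descriptor_dim * bytes_per_value / 1024 ** 2

scene_points = {"my_scene_1": 5_000_000, "my_scene_2": 7_000_000}
total_mb = sum(descriptor_memory_mb(n) for n in scene_points.values())
print(f"descriptors alone: ~{total_mb:.0f} MB, before gradients and Adam state")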

@alievk can you please also comment in case I'm missing something?

--crop_size works, thanks.
High-resolution images (1080p or higher) must be used when building the point cloud (low-resolution images lead to a very low-quality point cloud). Considering GPU memory and training speed, I would like to reduce the image resolution when training the model, but I find that if the image resolution used before and after is inconsistent, the model fails. Is there any suggestion about this?

I'm not sure I understood in which way you'd like to reduce the resolution. Do you want to reduce the resolution of the ground-truth images during network training? In that case, the result will also be more blurry, though I think --crop_size would help you here as well. Did I get it right?

Yes, reducing the resolution of the original ground-truth images during training not only blurs the result but also makes the model completely unusable. Training with a smaller crop size works now, but it takes too long. I will study this issue in detail later. Thank you again for your suggestions.

You can try increasing the batch size while training with a lower --crop_size, or use several GPUs if you have any. The code should support distributed training.
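For the multi-GPU route, a minimal generic PyTorch DistributedDataParallel skeleton is sketched below; it only shows the standard pattern and is not the actual wiring inside npbg's train.py (the model here is a stand-in):

# Generic PyTorch DDP skeleton (illustrative; npbg's train.py may set this up
# differently). Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Conv2d(8, 3, kernel_size=1).cuda()  # stand-in for the rendering net
    model = DDP(model, device_ids=[local_rank])
    # ... build a DataLoader with a DistributedSampler and run the usual training loop ...
    dist.destroy_process_group()

if __name__ == "__main__":
    main()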

@seva100, I have trained a model with two scenes, both of which can be rendered using the network. To reproduce the scene editing, I manually aligned the two point clouds in MeshLab and saved them as a new point cloud. How do I calculate the coordinate transformation matrix of the point cloud so that the point descriptors stay aligned, and how do I adjust the two sets of camera parameters to render the two scenes at the same time (as demonstrated in the scene editing)? I don't know much about point clouds and 3D, so I hope you can give some detailed guidance or sample code. Thanks again.

You can use the descriptors trained for the second point cloud together with its new vertex positions that you've manually aligned. The camera positions can then be aligned with the first point cloud as well. In this case, you don't need to retrain anything; you can just use the model trained on the two scenes.
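To make this concrete, here is a small sketch (assumed helper code, not part of npbg) of how a 4x4 rigid transform T, e.g. the alignment matrix MeshLab reports after manual alignment, can be applied to the second scene's points, and how the corresponding view matrices can be composed with the inverse of T so the cameras still see the transformed points consistently:

# Sketch of re-expressing scene 2 in the coordinate frame of scene 1.
# T is the 4x4 homogeneous transform used for the manual alignment
# (MeshLab can show/export this matrix after alignment); array names and
# shapes here are assumptions for illustration.
import numpy as np

def transform_points(points, T):
    # points: (N, 3) vertex positions of scene 2; returns them in scene-1 coordinates.
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]

def transform_view_matrix(view_matrix, T):
    # view_matrix: (4, 4) world-to-camera matrix of a scene-2 camera.
    # If points move as x' = T x, composing with the inverse of T keeps the
    # camera-space coordinates unchanged: V' x' = V T^-1 T x = V x.
    return view_matrix @ np.linalg.inv(T)

# The per-point descriptors are tied to point indices rather than coordinates,
# so they can be reused unchanged for the transformed point cloud.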