jolibrain / joliGEN

Generative AI Image Toolset with GANs and Diffusion for Real-World Applications

Home Page: https://www.joligen.com

A question about training on weather conditions

hsleiman1 opened this issue

Hello,

I ran a clear-to-snowy training on BDD100K. After epoch 15, the loss is almost stable. Is this normal?

[screenshot: training loss curves]

Hi @hsleiman1, in order for us to help you, could you please provide the full command line you used and any relevant information to reproduce your problem (Python version, OS and OS version, PyTorch version, GPU type, ...)?
Thank you!

Also try without multimodal first.
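Concretely, this means rerunning the exact same command with the --model_multimodal option removed and all other options unchanged, e.g. (abridged, the ellipsis standing in for the rest of your options):

python train.py --dataroot datasets/clear2snowy --name clear2snowy ... (same options as above, minus --model_multimodal)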

Hello,

The training command is as follows:

python train.py --dataroot datasets/clear2snowy --checkpoints_dir checkpoints --name clear2snowy --output_display_env clear2snowy --output_display_freq 50 --output_print_freq 50 --train_G_lr 0.0002 --train_D_lr 0.0001 --data_crop_size 512 --data_load_size 512 --data_dataset_mode unaligned_labeled_mask_online --model_type cut --train_batch_size 3 --train_iter_size 4 --model_input_nc 3 --model_output_nc 3 --f_s_net segformer --f_s_config_segformer models/configs/segformer/segformer_config_b0.py --train_mask_f_s_B --f_s_semantic_nclasses 11 --G_netG segformer_attn_conv --G_config_segformer models/configs/segformer/segformer_config_b0.json --data_online_creation_crop_size_A 512 --data_online_creation_crop_delta_A 64 --data_online_creation_mask_delta_A 64 --data_online_creation_crop_size_B 512 --data_online_creation_crop_delta_B 64 --dataaug_D_noise 0.01 --data_online_creation_mask_delta_B 64 --alg_cut_nce_idt --train_sem_use_label_B --D_netDs projected_d basic vision_aided --D_proj_interp 512 --D_proj_network_type vitsmall --train_G_ema --G_padding_type reflect --train_optim adam --dataaug_no_rotate --train_sem_idt --model_multimodal --train_mm_nz 16 --G_netE resnet_512 --f_s_class_weights 1 10 10 1 5 5 10 10 30 50 50 --output_display_aim_server 127.0.0.1 --output_display_visdom_port 8501 --gpu_id 0,1,2,3 

I am using 4 NVIDIA L4 GPUs and torch==2.0.1.

Is this information sufficient?

Also try without multimodal first.

Could you please give more details on this, or a link?

Thanks!

Hi, I have tried removing the --model_multimodal option; these are the current results. Is this better in your opinion? Should I continue the training? Thanks!

[screenshot: training results without --model_multimodal]

@hsleiman1 you are missing the --train_semantic_mask option, so the semantic network is not trained. You can see this in Visdom, since there is no f_s loss.

Additionally, it is --f_s_config_segformer models/configs/segformer/segformer_config_b0.json and not .py.
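In other words, the f_s-related options should read as follows (this excerpt matches the corrected command below):

--f_s_net segformer --f_s_config_segformer models/configs/segformer/segformer_config_b0.json --train_semantic_mask --train_mask_f_s_B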

Thank you, I will run with the following configuration and check:

python train.py --dataroot datasets/clear2snowy --checkpoints_dir checkpoints2 --name clear2snowy2 --output_display_env clear2snowy2 --output_display_freq 50 --output_print_freq 50 --train_G_lr 0.0002 --train_D_lr 0.0001 --data_crop_size 512 --data_load_size 512 --data_dataset_mode unaligned_labeled_mask_online --model_type cut --train_batch_size 3 --train_iter_size 4 --model_input_nc 3 --model_output_nc 3 --f_s_net segformer --f_s_config_segformer models/configs/segformer/segformer_config_b0.json --train_semantic_mask --train_mask_f_s_B --f_s_semantic_nclasses 11 --G_netG segformer_attn_conv --G_config_segformer models/configs/segformer/segformer_config_b0.json --data_online_creation_crop_size_A 512 --data_online_creation_crop_delta_A 64 --data_online_creation_mask_delta_A 64 --data_online_creation_crop_size_B 512 --data_online_creation_crop_delta_B 64 --dataaug_D_noise 0.01 --data_online_creation_mask_delta_B 64 --alg_cut_nce_idt --train_sem_use_label_B --D_netDs projected_d basic vision_aided --D_proj_interp 512 --D_proj_network_type vitsmall --train_G_ema --G_padding_type reflect --train_optim adam --dataaug_no_rotate --train_sem_idt --train_mm_nz 16 --G_netE resnet_512 --f_s_class_weights 1 10 10 1 5 5 10 10 30 50 50 --output_display_aim_server 127.0.0.1 --output_display_visdom_port 8501 --gpu_id 0,1,2,3

@hsleiman1 FYI, I've tested 3 configurations over 3 runs and they all work for me, i.e. clear2snowy proceeds as expected, at least from a visual-inspection standpoint.
The tested configurations include using the SAM discriminator in addition to all the others.

Hello, the results after 65 epochs are as follows; please give me your feedback:

[screenshot: results after 65 epochs]

We can see in the following examples that the results are worse than at the beginning of the training process:

[images: conf1, conf2, conf3]

Thank you!

This is not enough information to understand what is happening. You need to look at mask conservation, every D loss, etc. The last image seems almost impossible: G moving to clear weather to satisfy the discriminator, whereas it would be much easier to do so while remaining in night mode. This may point to a dataset issue, overfitting, or something else. I have never seen this on BDD100K.

I've put my recent run here: https://www.joligen.com/stuff/bdd100k/test_clear2snowy_0723.tar
You can compare it to yours: options, model inferences, etc. It looks fine after 12 epochs.

Hello,

Thank you for your help. The training looks better now. Here is an example after 37 epochs.

[screenshot: example result after 37 epochs]

The results look better. What algorithm did you use to enhance the resolution of the generated images?

You can use the generator for inference on the full-size images directly. The generator is either fully convolutional (resnet, mobilenet, unet) or directly integrates an upsampling step (segformer).
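To illustrate the fully convolutional case, here is a minimal PyTorch sketch (not joliGEN's actual API; the tiny generator below is a hypothetical stand-in): since no layer depends on a fixed spatial size, a model trained on 512x512 crops can be applied to a full-resolution frame as-is.

```python
# Minimal sketch with a hypothetical stand-in generator (not joliGEN's API):
# a fully convolutional network has no size-dependent layers, so a model
# trained on 512x512 crops runs unchanged on full-resolution frames.
import torch
import torch.nn as nn

netG = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, kernel_size=3, padding=1), nn.Tanh(),
)

frame = torch.rand(1, 3, 720, 1280)  # a full-size BDD100K frame, NCHW
with torch.no_grad():
    out = netG(frame)  # output keeps the 720x1280 resolution
print(out.shape)  # torch.Size([1, 3, 720, 1280])
```

The segformer generator, by contrast, reduces resolution internally and recovers it through its final upsampling step, so it too returns an output at the input resolution.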