miccunifi / ladi-vton

This is the official repository for the paper "LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On". ACM Multimedia 2023


Issue with training VTO & Inversion Adapter

bertinma opened this issue · comments

Hi,

I'm trying to train all the models with 1024x768 images. I managed to train TPS & EMASC at this resolution with some code modifications, and training works well according to the metrics and visual results.

However, it doesn't work at all for the inversion adapter and VTO. Neither training shows any loss reduction (the curve is close to constant with hard smoothing on wandb and oscillates heavily without smoothing).
(Screenshot: wandb loss curves, 2023-09-18)
I also tested with the 512x384 resolution and it gives the same results.
Is this expected?

I'm using the default parameters except batch_size = 8 for VTO and batch_size = 1 for the inversion adapter, on a single A100 GPU. I assume a batch size greater than 1 could prevent this training issue, but my hardware doesn't allow a bigger one 😞
I also tried reducing the learning rate, but it leads to the same issue.

Commands used to train the inversion adapter and VTO:

  • python src/train_inversion_adapter.py --dataset vitonhd --vitonhd_dataroot data/viton-hd/ --output_dir checkpoints/inverter_1024 --gradient_checkpointing --enable_xformers_memory_efficient_attention --use_clip_cloth_features --allow_tf32 --pretrained_model_name_or_path pretrained_models/stable-diffusion-2-inpainting/ --height 1024 --width 768 --train_batch_size 1 --test_batch_size 1

  • python src/train_vto.py --dataset vitonhd --vitonhd_dataroot data/viton-hd/ --output_dir checkpoints/vto_1024 --inversion_adapter_dir checkpoints/inverter_1024/ --gradient_checkpointing --enable_xformers_memory_efficient_attention --use_clip_cloth_features --height 1024 --width 768 --train_batch_size 8 --test_batch_size 8 --allow_tf32

Could you please help me resolve this problem?
Thanks for your clean work, by the way :)

Hi @bertinma,

Thank you for your interest in our work!

Regarding experiments at a resolution of 1024x768, I must admit that we did not test the training procedure at that specific resolution, so I may not be able to provide precise guidance there.

At a resolution of 512x384, although the loss behavior appeared to be similar, we did notice improvements in the metrics as training progressed. Did you observe the same improvements in the metrics during training?

Regarding the batch size, did you try setting the --gradient_accumulation_steps parameter so that train_batch_size * gradient_accumulation_steps equals the desired effective batch size?
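
For example (please double-check the exact argument name in src/train_inversion_adapter.py; this is just a sketch based on your command above), keeping --train_batch_size 1 and adding 8 accumulation steps would give an effective batch size of 8 without increasing per-step memory:

  • python src/train_inversion_adapter.py --dataset vitonhd --vitonhd_dataroot data/viton-hd/ --output_dir checkpoints/inverter_1024 --gradient_checkpointing --enable_xformers_memory_efficient_attention --use_clip_cloth_features --allow_tf32 --pretrained_model_name_or_path pretrained_models/stable-diffusion-2-inpainting/ --height 1024 --width 768 --train_batch_size 1 --test_batch_size 1 --gradient_accumulation_steps 8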

Another point: if you want to achieve optimal performance, you need to pass the --train_inversion_adapter flag during VTO training.
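
Starting from your VTO command above, that would look like this (keeping all your other arguments unchanged):

  • python src/train_vto.py --dataset vitonhd --vitonhd_dataroot data/viton-hd/ --output_dir checkpoints/vto_1024 --inversion_adapter_dir checkpoints/inverter_1024/ --gradient_checkpointing --enable_xformers_memory_efficient_attention --use_clip_cloth_features --height 1024 --width 768 --train_batch_size 8 --test_batch_size 8 --allow_tf32 --train_inversion_adapter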

If you have any more questions or need further insights, please feel free to ask.

Best regards,
Alberto