miccunifi / ladi-vton

This is the official repository for the paper "LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On". ACM Multimedia 2023


VAE with intermediate features takes up more GPU memory than original VAE

Ganzhi-e opened this issue · comments

commented

Hi, your work is wonderful! I have some questions.

I noticed that declaring val_pipe in the training code as an instance of StableDiffusionTryOnePipeline occupies a very large amount of GPU memory, and inference.py itself also uses a lot of GPU memory when running. Memory usage would be much lower if the VAE with intermediate features were replaced by the original VAE. Have you noticed this? May I ask which GPU you run inference.py on?

Thank you!
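(For reference, a hedged sketch of what swapping in the original VAE could look like, assuming the try-on pipeline follows a diffusers-style interface. The checkpoint name, the `vae` attribute, and the commented pipeline calls are assumptions, not the repository's exact API.)

```python
import torch
from diffusers import AutoencoderKL

# Illustrative only: load a plain Stable Diffusion VAE instead of the
# EMASC-enhanced one that keeps intermediate encoder features around.
vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
)

# If StableDiffusionTryOnePipeline inherits from diffusers' DiffusionPipeline,
# the usual memory savers should also apply (untested assumptions):
# val_pipe.vae = vae
# val_pipe.enable_attention_slicing()
# val_pipe.to("cuda")
```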

facing same issue... any suggestions how to solve it?

commented

> facing same issue... any suggestions how to solve it?

To address the loss of detail on faces and hands, you can use the repaint (stitching) method from other works, i.e.
x_result = x_generation * mask_tensor + (1.0 - mask_tensor) * x_source, where x_generation is the generated try-on result, x_source is the original person image, and mask_tensor is the inpainting mask.
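A minimal PyTorch sketch of this stitching step (tensor names and shapes are illustrative assumptions):

```python
import torch

def stitch(x_generation: torch.Tensor,
           x_source: torch.Tensor,
           mask_tensor: torch.Tensor) -> torch.Tensor:
    """Paste the generated try-on result back onto the original person image.

    x_generation: generated image, shape (B, 3, H, W)
    x_source:     original person image, shape (B, 3, H, W)
    mask_tensor:  inpainting mask, shape (B, 1, H, W), 1 inside the try-on region
    """
    return x_generation * mask_tensor + (1.0 - mask_tensor) * x_source
```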

In addition, directly running the inference code with EMASC occupies about 70 GB of GPU memory on an NVIDIA A100.

Hope this information is helpful to you.

@Ganzhi-e Thank you for the suggestion.
We actually tried the stitching method you propose in the previous comment.

The main problem with the stitching you propose is that if the keypoint constraint doesn't work as expected (even if the final pose differs from the initial one by a few pixels), the inpainted region will look copy-pasted onto the final image.

This is the reason why we came up with the EMASC module.
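(For context, a rough sketch of the idea behind an EMASC-style mask-aware skip connection. This is an illustrative reconstruction, not the repository's implementation; the layer structure and masking convention are assumptions.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EMASCSketch(nn.Module):
    """Illustrative mask-aware skip connection.

    Refines an intermediate VAE encoder feature map with a small learned
    convolutional block, keeps only the region outside the inpainting mask,
    and adds it to the corresponding decoder feature map. Keeping these
    intermediate features is what increases GPU memory over the plain VAE.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, enc_feat, dec_feat, inpaint_mask):
        # Resize the mask to the feature resolution; 1 marks the try-on region.
        mask = F.interpolate(inpaint_mask, size=enc_feat.shape[-2:], mode="nearest")
        # Propagate encoder detail only where the image is NOT regenerated.
        return dec_feat + self.conv(enc_feat) * (1.0 - mask)
```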

I think the stitching choice is a trade-off between image quality and computational constraints.
It also depends on the data you want to apply the method to.

All the training has been performed on an A100.