miccunifi / ladi-vton

This is the official repository for the paper "LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On". ACM Multimedia 2023


VAE with intermediate features takes up more GPU memory than original VAE

Ganzhi-e opened this issue · comments

commented

Hi, your work is wonderful! I have some questions.

I noticed that declaring val_pipe in the training code as an instance of StableDiffusionTryOnePipeline occupies a very large amount of GPU memory, and inference.py itself also uses a lot of GPU memory when running. Memory usage would be much lower if the VAE with intermediate features were replaced by the original VAE. Have you noticed this? May I ask which GPU you run inference.py on?

Thank you!
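(For reference, a hedged sketch of what swapping in the original VAE could look like, assuming the try-on pipeline follows a diffusers-style interface. The checkpoint name, the `vae` attribute, and the commented pipeline calls are assumptions, not the repository's exact API.)

```python
import torch
from diffusers import AutoencoderKL

# Illustrative only: load a plain Stable Diffusion VAE instead of the
# EMASC-enhanced one that keeps intermediate encoder features around.
vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
)

# If StableDiffusionTryOnePipeline inherits from diffusers' DiffusionPipeline,
# the usual memory savers should also apply (untested assumptions):
# val_pipe.vae = vae
# val_pipe.enable_attention_slicing()
# val_pipe.to("cuda")
```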

facing same issue... any suggestions how to solve it?

commented

> facing same issue... any suggestions how to solve it?

To address the loss of detail on faces and hands, you can use the repaint (stitching) method from other works, i.e.
x_result = x_generation * mask_tensor + (1.0 - mask_tensor) * x_source, where x_generation is the generated try-on result, x_source is the original person image, and mask_tensor is the inpainting mask.
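A minimal PyTorch sketch of this stitching step (tensor names and shapes are illustrative assumptions):

```python
import torch

def stitch(x_generation: torch.Tensor,
           x_source: torch.Tensor,
           mask_tensor: torch.Tensor) -> torch.Tensor:
    """Paste the generated try-on result back onto the original person image.

    x_generation: generated image, shape (B, 3, H, W)
    x_source:     original person image, shape (B, 3, H, W)
    mask_tensor:  inpainting mask, shape (B, 1, H, W), 1 inside the try-on region
    """
    return x_generation * mask_tensor + (1.0 - mask_tensor) * x_source
```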

In addition, directly running the inference code with EMASC occupies about 70 GB of GPU memory on an NVIDIA A100.

Hope this information is helpful to you.

@Ganzhi-e Thank you for the suggestion.
We actually tried the stitching method you propose in the previous comment.

The main problem with the stitching you propose is that if the keypoint constraint doesn't work as expected (even if the final pose differs from the initial one by a few pixels), the inpainted region will look copy-pasted onto the final image.

This is the reason why we came up with the EMASC module.
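(For context, a rough sketch of the idea behind an EMASC-style mask-aware skip connection. This is an illustrative reconstruction, not the repository's implementation; the layer structure and masking convention are assumptions.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EMASCSketch(nn.Module):
    """Illustrative mask-aware skip connection.

    Refines an intermediate VAE encoder feature map with a small learned
    convolutional block, keeps only the region outside the inpainting mask,
    and adds it to the corresponding decoder feature map. Keeping these
    intermediate features is what increases GPU memory over the plain VAE.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, enc_feat, dec_feat, inpaint_mask):
        # Resize the mask to the feature resolution; 1 marks the try-on region.
        mask = F.interpolate(inpaint_mask, size=enc_feat.shape[-2:], mode="nearest")
        # Propagate encoder detail only where the image is NOT regenerated.
        return dec_feat + self.conv(enc_feat) * (1.0 - mask)
```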

I think the stitching choice is a trade-off between image quality and computational constraints.
It also depends on the data you want to apply the method to.

All the training has been performed on an A100.