Reconstruction Error

Question

zwx8981 opened this issue a year ago

Very nice work! I have a question: EDICT is invertible by design, so why is the reconstruction error in Table 1 not zero?

bram-w · Answer 1 · Tue Aug 15 2023 01:42:54 GMT+0800 (China Standard Time)

Thank you! The reconstruction error in Table 1 is measured at the pixel level, so the VAE encoding/decoding process introduces some error (which is why the LDM VAE and EDICT columns are all equal). EDICT is exact at the latent level (up to floating-point precision, which is too small to register in the table), but since the pixel measurement involves the VAE, it inherits that error. Does that make sense?
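
A toy sketch of this point, in plain Python. It is not the EDICT code and does not use the real LDM VAE: `encode`/`decode` are a hypothetical lossy stand-in for the VAE (pair averaging and repetition), and `forward`/`inverse` are a generic invertible coupling step standing in for an exactly invertible latent process. All names and constants here are assumptions made for illustration.

```python
import math

def encode(pixels):
    """Toy lossy 'encoder': average each adjacent pair of pixel values."""
    return [(pixels[i] + pixels[i + 1]) / 2 for i in range(0, len(pixels), 2)]

def decode(latent):
    """Toy 'decoder': repeat each latent value twice."""
    out = []
    for z in latent:
        out.extend([z, z])
    return out

def forward(x, y, a=0.9, b=0.1):
    """Toy invertible coupling step (a stand-in, not EDICT's actual update)."""
    x_new = [a * xi + b * math.tanh(yi) for xi, yi in zip(x, y)]
    y_new = [a * yi + b * math.tanh(xi) for yi, xi in zip(y, x_new)]
    return x_new, y_new

def inverse(x_new, y_new, a=0.9, b=0.1):
    """Algebraic inverse of `forward`, undoing the two updates in reverse order."""
    y = [(yi - b * math.tanh(xi)) / a for yi, xi in zip(y_new, x_new)]
    x = [(xi - b * math.tanh(yi)) / a for xi, yi in zip(x_new, y)]
    return x, y

def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)

pixels = [0.1, 0.9, 0.4, 0.2, 0.7, 0.3, 0.5, 0.8]
latent = encode(pixels)

# Run the invertible process forward, then invert it, entirely in latent space.
x, y = list(latent), list(latent)
for _ in range(10):
    x, y = forward(x, y)
for _ in range(10):
    x, y = inverse(x, y)

latent_error = mse(latent, x)                 # floating-point scale
vae_only_error = mse(pixels, decode(latent))  # error of the lossy round trip alone
full_pipeline_error = mse(pixels, decode(x))  # encode -> forward -> inverse -> decode

print(latent_error, vae_only_error, full_pipeline_error)
```

In this sketch the latent round trip is exact up to floating-point rounding, while the pixel-level error of the full pipeline matches the error of the encode/decode round trip alone, which mirrors why the two columns in the table agree.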

Weixia Zhang · Answer 2 · Tue Aug 15 2023 10:45:01 GMT+0800 (China Standard Time)

Thanks for the reply, that makes sense! Is it possible to modify the VAE to fix the error?

spiderman · Answer 3 · Sun Dec 24 2023 18:14:03 GMT+0800 (China Standard Time)

I also want to know how to reduce the error further by modifying the VAE. Can anyone provide some suggestions?

tvaranka · Answer 4 · Tue Feb 06 2024 15:27:33 GMT+0800 (China Standard Time)

The VAE is trained separately from the diffusion model. If you were to modify the VAE, its latents would become inconsistent with the ones the diffusion model was trained on.

So, to improve the reconstruction error of the VAE, you would have to train both the VAE and the diffusion model from scratch.

This is exactly what was done in the latest Stable Diffusion versions, which use an improved VAE; see the table below from the SDXL paper.