salesforce / EDICT

Reconstruction Error

zwx8981 opened this issue · comments

Very nice work! I have a question: since EDICT is invertible by design, why is the reconstruction error in Table 1 not zero?

commented

Thank you! The reconstruction error in Table 1 is measured at the pixel level, so the VAE encoding/decoding process introduces some error (which is why the LDM VAE and EDICT columns are all equal). EDICT is exact at the latent level (up to floating-point precision, which is too small to register in the table), but because the pixel measurement goes through the VAE, it inherits that error. Does that make sense?
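To make the distinction concrete, here is a minimal toy sketch in plain Python. It is not the EDICT code or the LDM VAE: `encode`/`decode` are a hypothetical lossy stand-in for the VAE (pair averaging), `f` is a hypothetical stand-in for the noise predictor, and `step`/`step_inverse` are an illustrative coupled affine update with made-up coefficients `a`, `b`, `p`. It only shows that an exactly invertible latent process still inherits the autoencoder's pixel-level error.

```python
import math

def encode(pixels):
    """Toy lossy 'VAE' encoder (assumption): average each pair of pixels."""
    return [(pixels[i] + pixels[i + 1]) / 2 for i in range(0, len(pixels), 2)]

def decode(latent):
    """Toy decoder (assumption): repeat each latent value twice."""
    return [v for v in latent for _ in range(2)]

def f(z):
    """Stand-in for the noise predictor; any deterministic function works."""
    return [math.tanh(v) for v in z]

def step(x, y, a=0.9, b=0.1, p=0.93):
    """One coupled update: each half is updated using only the other half."""
    x1 = [a * u + b * e for u, e in zip(x, f(y))]
    y1 = [a * v + b * e for v, e in zip(y, f(x1))]
    x2 = [p * u + (1 - p) * v for u, v in zip(x1, y1)]
    y2 = [p * v + (1 - p) * u for v, u in zip(y1, x2)]
    return x2, y2

def step_inverse(x2, y2, a=0.9, b=0.1, p=0.93):
    """Undo `step` exactly by reversing each sub-step in the opposite order."""
    y1 = [(v - (1 - p) * u) / p for v, u in zip(y2, x2)]
    x1 = [(u - (1 - p) * v) / p for u, v in zip(x2, y1)]
    y = [(v - b * e) / a for v, e in zip(y1, f(x1))]
    x = [(u - b * e) / a for u, e in zip(x1, f(y))]
    return x, y

def mse(u, v):
    return sum((s - t) ** 2 for s, t in zip(u, v)) / len(u)

pixels = [0.0, 1.0, 0.2, 0.4, 0.5, 0.5, 0.9, 0.7]
z = encode(pixels)

# Run the invertible process forward and then back again in latent space.
x, y = list(z), list(z)
for _ in range(10):
    x, y = step(x, y)
for _ in range(10):
    x, y = step_inverse(x, y)

latent_error = mse(x, z)                    # floating-point noise only
pixel_error = mse(decode(x), pixels)        # dominated by the lossy autoencoder
vae_only_error = mse(decode(z), pixels)     # autoencoder round trip alone

print("latent error:  ", latent_error)
print("pixel error:   ", pixel_error)
print("VAE-only error:", vae_only_error)
```

In this toy setup the pixel error after the full round trip matches the autoencoder-only error, which mirrors why the two kinds of column in the table agree.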

Thanks for the reply, that makes sense! Is it possible to modify the VAE to fix the error?

I also want to know how to reduce the error further by modifying the VAE. Can anyone provide some suggestions?

The VAE is trained separately from the diffusion model, and the diffusion model operates on that VAE's latents. If you modified the VAE, its latents would no longer be consistent with what the diffusion model expects.

So, to reduce the VAE's reconstruction error, you would have to retrain both the VAE and the diffusion model from scratch.

This is exactly what was done in the latest Stable Diffusion versions, which use an improved VAE; see the table below from the SDXL paper.

[Image: VAE comparison table from the SDXL paper]