The Impact of MLP Depth
Robertwyq opened this issue
Hello, thanks for your wonderful work. I think it is very valuable that good results can be achieved using only a small denoising decoder. I noticed that the MLP depth in the provided training code is 3, and the paper also mentions 3, but the checkpoint used for testing employs a deeper MLP. I'm curious about the impact of depth: what effect does increasing the MLP depth have on the model's performance during training?
Thanks for your interest! Increasing MLP depth has a very similar effect to increasing MLP width (both increase the number of parameters in the MLP). We use the default setting for all ablations, and use a larger MLP only for comparison with the state of the art. Beyond depth=3 and width=1024, the gain from a larger MLP becomes quite marginal (0.1 or 0.2 FID).
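To make the "both increase the number of parameters" point concrete, here is a minimal sketch that counts the weights and biases of a plain fully connected MLP as depth and width vary. It is an illustration only and not the repository's actual denoising MLP, which may use a different block structure and extra conditioning layers. The function name `mlp_param_count` and the input/output size of 16 are hypothetical choices made for this example.

```python
def mlp_param_count(in_dim: int, width: int, depth: int, out_dim: int) -> int:
    """Count weights and biases of a plain MLP with `depth` hidden layers.

    Assumed layout (not taken from the repository):
    in_dim -> width, then (depth - 1) layers of width -> width, then width -> out_dim.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    first = in_dim * width + width                   # input projection
    hidden = (depth - 1) * (width * width + width)   # hidden-to-hidden layers
    last = width * out_dim + out_dim                 # output projection
    return first + hidden + last


# Hypothetical input/output size, chosen only for illustration.
TOKEN_DIM = 16

baseline = mlp_param_count(TOKEN_DIM, width=1024, depth=3, out_dim=TOKEN_DIM)
deeper = mlp_param_count(TOKEN_DIM, width=1024, depth=6, out_dim=TOKEN_DIM)
wider = mlp_param_count(TOKEN_DIM, width=2048, depth=3, out_dim=TOKEN_DIM)

print(f"depth=3, width=1024: {baseline:,} parameters")
print(f"depth=6, width=1024: {deeper:,} parameters")
print(f"depth=3, width=2048: {wider:,} parameters")
```

In this simplified layout the parameter count grows roughly linearly with depth and roughly quadratically with width, so either knob enlarges the MLP. The sketch says nothing about FID; the marginal gains quoted above come from the authors' experiments.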
Thanks for your reply