The Impact of MLP Depth

Question

Robertwyq opened this issue 3 months ago

Hello, thanks for your wonderful work. I think it is very valuable that good results can be achieved using only a small denoising decoder. I noticed that the MLP depth provided in the training code is 3, and the article also mentions 3, but the checkpoint used for testing employs a deeper MLP. I'm curious about the impact of depth: what effect does increasing the MLP depth have on the model's performance?

Tianhong Li · Answer 1 · Wed Jul 31 2024 17:56:56 GMT+0800 (China Standard Time)

Thanks for your interest! Increasing MLP depth has an effect very similar to increasing MLP width (both increase the number of parameters in the MLP). We use the default setting (depth=3, width=1024) for all ablations, and use a larger MLP only for comparison with the state of the art. Beyond depth=3 and width=1024, the gain from a larger MLP becomes quite marginal (0.1 or 0.2 FID).
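To make the "both increase the number of parameters" point concrete, here is a minimal sketch that counts parameters for a plain fully connected MLP as depth or width grows. It is only an illustration: the helper `mlp_param_count`, the input/output dimension of 16, and the plain Linear-layer structure are assumptions for this example, not the repository's actual denoising MLP, which may include extra components such as conditioning or normalization layers.

```python
def mlp_param_count(in_dim: int, width: int, depth: int, out_dim: int) -> int:
    """Count weights and biases of a plain MLP with `depth` hidden layers.

    Hypothetical layout (an assumption, not the repo's architecture):
    in_dim -> width, then (depth - 1) layers of width -> width, then width -> out_dim.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    first = in_dim * width + width                   # input projection
    hidden = (depth - 1) * (width * width + width)   # hidden-to-hidden layers
    last = width * out_dim + out_dim                 # output projection
    return first + hidden + last


# Default setting from the thread (depth=3, width=1024); the dimension 16 is assumed.
base = mlp_param_count(in_dim=16, width=1024, depth=3, out_dim=16)
deeper = mlp_param_count(in_dim=16, width=1024, depth=6, out_dim=16)
wider = mlp_param_count(in_dim=16, width=2048, depth=3, out_dim=16)

for name, count in [("base", base), ("deeper", deeper), ("wider", wider)]:
    print(f"{name:>6}: {count / 1e6:.2f}M parameters")
```

In this simplified model, parameters grow roughly linearly with depth and quadratically with width, so either knob enlarges the MLP. The thread's observation is that past the default setting this extra capacity buys only about 0.1 to 0.2 FID.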

王宇琪 · Answer 2 · Wed Jul 31 2024 22:55:50 GMT+0800 (China Standard Time)

Thanks for your reply