mobaidoctor / med-ddpm


optimal number of time steps required for sampling and training

jeanRassaire opened this issue · comments

Hello,

I would like to know what impact the number of timesteps has on the training process, the sampling process, and the performance of the model.

In addition, I have seen that in the sampling process the model is set to evaluation mode, i.e. model.eval(). Is there a reason for this?

Hi @jeanRassaire, thank you for your inquiry. Apologies for the delayed response. Due to an ongoing intensive project, we are fully occupied with daily tasks and unable to address other matters at this time. This situation may continue until the end of June. We will do our best to respond promptly here, but please allow 3-5 days for replies to future inquiries. Thank you for your understanding.

  • Regarding your question about timesteps: they play a crucial role in how diverse the synthesized samples are. More timesteps typically yield more diverse samples, but they also slow down the sampling process, making it costlier. It is therefore essential to balance the number of timesteps against the total number of images in your training dataset. In our experiments, which focused on semantic image synthesis, we aimed for cost-effectiveness: starting with 250 timesteps yielded acceptable results. We also tested 500 and 1000 timesteps and noted slight differences in overall image quality; more timesteps led to more diverse sampling. Although all results were acceptable, we chose 250 timesteps for efficiency, but recommend 1000 for optimal diversity. We did not include these detailed experiments and results in our manuscript due to space limitations, but you can find more information on the effect of timesteps in the original denoising diffusion probabilistic models paper: https://arxiv.org/pdf/2006.11239.pdf.
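To make the timestep trade-off concrete, here is a minimal sketch (not code from this repository) of the linear beta schedule from the DDPM paper linked above. The helper name `linear_beta_schedule` and the default `beta_start`/`beta_end` values are illustrative assumptions; the point is that more timesteps means finer noise increments per step but a proportionally longer sampling loop.

```python
import numpy as np

def linear_beta_schedule(timesteps, beta_start=1e-4, beta_end=0.02):
    # Hypothetical helper mirroring the DDPM-style linear schedule:
    # each timestep adds a small amount of Gaussian noise, so a larger
    # number of timesteps gives smaller per-step increments at the cost
    # of more denoising iterations during sampling.
    return np.linspace(beta_start, beta_end, timesteps)

for T in (250, 500, 1000):
    betas = linear_beta_schedule(T)
    alphas_cumprod = np.cumprod(1.0 - betas)
    # alphas_cumprod[-1] indicates how fully the data has been noised by step T.
    print(T, betas[0], alphas_cumprod[-1])
```

Sampling cost scales linearly with `T`, which is why 250 steps is the cheaper default here while 1000 gives the most diverse samples.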

  • For your second question regarding model.eval(): it is not strictly necessary in our current setup, as the dropout rate is set to 0. However, our model code includes a dropout layer, giving users the option to activate it when experimenting with their own datasets. During training, dropout layers randomly deactivate a fraction of input units to prevent overfitting, simulating a reduced network and encouraging the active units to learn independent features. During inference or testing, by contrast, you typically want to utilize all units in the network. Calling model.eval() deactivates the dropout effect, ensuring all neurons contribute to the computation. If you omit model.eval() during inference, dropout remains active and the model behaves as though it were still training, which can lead to inconsistent and inaccurate results; that is unsuitable for testing on new data, where consistent performance is crucial. Thus, while not required in our default setup, model.eval() is essential if any form of dropout is used during experimentation.
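A toy demonstration of the train/eval behavior described above, using a standalone `torch.nn.Dropout` layer rather than the med-ddpm network itself:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)  # illustrative rate; the repo's default is 0
x = torch.ones(1, 8)

layer.train()       # training mode: each unit is zeroed with probability p,
y_train = layer(x)  # and surviving units are scaled by 1/(1-p), here 2.0

layer.eval()        # evaluation mode: dropout becomes a no-op
y_eval = layer(x)   # output is identical to the input

print(y_train)
print(torch.equal(y_eval, x))
```

With dropout probability 0 (the default here), train and eval outputs coincide, which is why model.eval() is not strictly required in the current setup.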

Thank you @mobaidoctor for your reply. Yes, this solves my problem.