HKUNLP / reparam-discrete-diffusion

Reparameterized Discrete Diffusion Models for Text Generation


label smoothing mistake

youngsheen opened this issue · comments

When computing the label smoothing loss, logit_loss is only multiplied by weight_t and misses the 1/(t+1) factor.

Hi, thanks for your interest!

Technically, both 1/(t+1) and weight_t arise from the diffusion ELBO objective, not from the label smoothing loss. It is therefore reasonable to apply an arbitrary weighting to the label smoothing loss (which serves as an auxiliary regularization objective) to scale its effect. In our preliminary experiments we ran various ablations and found that multiplying the label smoothing loss by weight_t alone yields the best performance on translation tasks.
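A minimal sketch of the weighting described above (all names here are hypothetical, not the repo's actual code): the ELBO term carries both the 1/(t+1) factor and weight_t, while the auxiliary label smoothing term is scaled by weight_t only.

```python
def combined_loss(diffusion_nll, label_smooth_loss, t, weight_t, ls_coeff=0.1):
    """Hypothetical illustration of the loss weighting scheme.

    diffusion_nll     -- per-token negative log-likelihood from the ELBO
    label_smooth_loss -- auxiliary label smoothing loss
    t                 -- diffusion timestep (0-indexed)
    weight_t          -- timestep-dependent ELBO weight
    ls_coeff          -- coefficient on the auxiliary term (assumed value)
    """
    # The ELBO term gets the full weighting: weight_t / (t + 1).
    elbo_term = weight_t / (t + 1) * diffusion_nll
    # The label smoothing term is scaled by weight_t only,
    # since 1/(t+1) belongs to the ELBO derivation, not this term.
    ls_term = weight_t * label_smooth_loss
    return elbo_term + ls_coeff * ls_term
```

With t = 1, weight_t = 0.5, diffusion_nll = 2.0, and label_smooth_loss = 1.0, this gives 0.5/2 * 2.0 + 0.1 * 0.5 * 1.0 = 0.55.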

That said, this choice may not be optimal in all cases, and carefully tuning the weighting in a task-specific manner may lead to better performance.

Hope this clears things up xD