label smoothing mistake
youngsheen opened this issue · comments
When computing the label smoothing loss, the logit_loss is only multiplied by weight_t and is missing the 1/(t+1) factor.
Hi, thanks for your interest!
Technically, both 1/(t+1) and weight_t are associated only with the diffusion ELBO objective, not with the label smoothing loss. It is therefore reasonable to scale the label smoothing loss (which is often used as an auxiliary objective for regularization) with an arbitrary weighting. We conducted various ablations in our preliminary experiments and found that multiplying the label smoothing loss by weight_t alone yields the best performance for translation tasks.
However, it could be true that this choice may not be optimal in all cases and that carefully tuning the weighting in a task-specific manner may lead to better performance.
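To make the distinction concrete, here is a minimal per-token sketch of the two weightings being discussed. The function name, argument names, and the uniform-prior form of the smoothing term are illustrative assumptions, not the repo's actual API:

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def smoothed_diffusion_loss(logits, target, weight_t, t, eps=0.1):
    """Hypothetical sketch: the ELBO cross-entropy term is scaled by
    weight_t * 1/(t+1), while the label-smoothing regularizer is scaled
    by weight_t only, as described above."""
    lp = log_softmax(logits)
    nll = -lp[target]                      # cross-entropy with the gold token
    uniform = -sum(lp) / len(lp)           # label-smoothing (uniform) term
    elbo_term = weight_t / (t + 1) * nll   # full ELBO weighting
    ls_term = weight_t * uniform           # weight_t only
    return (1 - eps) * elbo_term + eps * ls_term
```

As t grows, the ELBO term is damped by 1/(t+1) while the smoothing term keeps its weight_t scale, which is exactly why the two weightings can diverge.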
Hope this clears things up xD