Question about the loss of ES-DFM
tzjjq279 opened this issue
tzjjq279 commented
Thanks for your work and the detailed code.
I have two questions about the loss function of ES-DFM.
- How should I understand stable_log1pex(x) in loss.py? It seems to be used to compute p(x), but it returns log 1 + log(1 + e^{-x}) when x >= 0 and log(e^{-x}) + log(1 + e^{x}) when x < 0, which does not seem consistent with the paper.
- I thought the IS loss function should carry a negative sign, but I don't see one in the related implementation (neither Eq. 17 in your paper nor delay_tn_importance_weight_loss in your source code).
Is there a problem here, or am I misunderstanding something?
tzjjq279 commented
Well, I finally figured out the issue with stable_log1pex(x), but shouldn't the IS loss (Eq. 17) carry a negative sign?
Thyrix commented
Hi, @tzjjq279
Thank you for your interest in our work.
- The log1pex function implements log1pex(x) = log(1 + exp(-x)), which is trivial when x >= 0. When x < 0, -tf.minimum(x, 0) + tf.math.log(1 + tf.math.exp(-tf.abs(x))) = -x + log(1 + exp(x)); notice that x = log(exp(x)), so -x + log(1 + exp(x)) = -log(exp(x)) + log(1 + exp(x)) = log((1 + exp(x))/exp(x)) = log(1 + exp(-x)). The motivation for this implementation is that when x << 0, exp(-x) will overflow, which may break the numerical calculation of log(1 + exp(-x)); see the first sketch below. For a detailed introduction to this trick, I suggest referring to this article: https://cran.r-project.org/web/packages/Rmpfr/vignettes/log1mexp-note.pdf
- Thanks for pointing out this typo, we will update the arXiv paper. The implementation is the negated log-sigmoid likelihood; for example, pos_loss = stable_log1pex(x) = log(1 + exp(-x)) = -log(1/(1 + exp(-x))) = -log(sigmoid(x)); see the second sketch below.
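To make the overflow argument from the first point concrete, here is a minimal sketch (assuming TensorFlow 2.x; `naive_log1pex` is not part of the repo and is only added here to contrast with the stable form quoted above):

```python
import tensorflow as tf

def stable_log1pex(x):
    # Numerically stable log(1 + exp(-x)):
    # for x >= 0 this is just log(1 + exp(-x));
    # for x < 0 it is rewritten as -x + log(1 + exp(x)),
    # so exp() is only ever evaluated on a non-positive argument.
    return -tf.minimum(x, 0.0) + tf.math.log(1.0 + tf.math.exp(-tf.abs(x)))

def naive_log1pex(x):
    # Direct formula; exp(-x) overflows for strongly negative x.
    return tf.math.log(1.0 + tf.math.exp(-x))

x = tf.constant([-200.0, -5.0, 0.0, 5.0, 200.0])
print(stable_log1pex(x).numpy())  # finite for every input
print(naive_log1pex(x).numpy())   # inf at x = -200, since exp(200) overflows float32
```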
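And a quick numerical check of the negated log-sigmoid identity from the second point, under the same assumptions:

```python
import tensorflow as tf

def stable_log1pex(x):
    # Same stable log(1 + exp(-x)) as above.
    return -tf.minimum(x, 0.0) + tf.math.log(1.0 + tf.math.exp(-tf.abs(x)))

x = tf.constant([-3.0, -0.5, 0.0, 2.0, 7.0])

pos_loss = stable_log1pex(x)               # log(1 + exp(-x))
neg_log_sigmoid = -tf.math.log_sigmoid(x)  # -log(sigmoid(x))

# The two agree elementwise up to floating-point error,
# i.e. stable_log1pex(x) is the negated log-sigmoid likelihood.
print(tf.reduce_max(tf.abs(pos_loss - neg_log_sigmoid)).numpy())  # ~0
```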