Question about the loss of ES-DFM
tzjjq279 opened this issue
tzjjq279 commented
Thanks for your work and the detailed code.
I have two questions about the loss function of ES-DFM.
- How should I understand stable_log1pex(x) in loss.py? It seems to be used to compute p(x), but it returns log 1 + log(1 + e^{-x}) when x >= 0 and log(e^{-x}) + log(1 + e^{x}) when x < 0, which does not seem consistent with the paper.
- I thought the IS loss function should carry a negative sign, but I don't see one in the related implementation (neither Eq. 17 in your paper nor delay_tn_importance_weight_loss in your source code).
Is there a problem here, or am I misunderstanding something?
tzjjq279 commented
Well, I finally figured out the issue with stable_log1pex(x), but shouldn't the IS loss (Eq. 17) carry a negative sign?
Thyrix commented
Hi, @tzjjq279
Thank you for your interest in our work.
- The log1pex function implements log1pex(x) = log(1 + exp(-x)), which is trivial when x >= 0. When x < 0, -tf.minimum(x, 0) + tf.math.log(1 + tf.math.exp(-tf.abs(x))) = -x + log(1 + exp(x)); notice that x = log(exp(x)), so -x + log(1 + exp(x)) = -log(exp(x)) + log(1 + exp(x)) = log((1 + exp(x))/exp(x)) = log(1 + exp(-x)). The motivation for this implementation is that when x << 0, exp(-x) will overflow, which may break the numerical calculation of log(1 + exp(-x)); see the first sketch below. For a detailed introduction to this trick, I suggest referring to this article: https://cran.r-project.org/web/packages/Rmpfr/vignettes/log1mexp-note.pdf
- Thanks for pointing out this typo, we will update the arXiv paper. The implementation is the negated log-sigmoid likelihood; for example, pos_loss = stable_log1pex(x) = log(1 + exp(-x)) = -log(1/(1 + exp(-x))) = -log(sigmoid(x)); see the second sketch below.
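To make the overflow argument from the first point concrete, here is a minimal sketch (assuming TensorFlow 2.x; `naive_log1pex` is not part of the repo and is only added here to contrast with the stable form quoted above):

```python
import tensorflow as tf

def stable_log1pex(x):
    # Numerically stable log(1 + exp(-x)):
    # for x >= 0 this is just log(1 + exp(-x));
    # for x < 0 it is rewritten as -x + log(1 + exp(x)),
    # so exp() is only ever evaluated on a non-positive argument.
    return -tf.minimum(x, 0.0) + tf.math.log(1.0 + tf.math.exp(-tf.abs(x)))

def naive_log1pex(x):
    # Direct formula; exp(-x) overflows for strongly negative x.
    return tf.math.log(1.0 + tf.math.exp(-x))

x = tf.constant([-200.0, -5.0, 0.0, 5.0, 200.0])
print(stable_log1pex(x).numpy())  # finite for every input
print(naive_log1pex(x).numpy())   # inf at x = -200, since exp(200) overflows float32
```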
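And a quick numerical check of the negated log-sigmoid identity from the second point, under the same assumptions:

```python
import tensorflow as tf

def stable_log1pex(x):
    # Same stable log(1 + exp(-x)) as above.
    return -tf.minimum(x, 0.0) + tf.math.log(1.0 + tf.math.exp(-tf.abs(x)))

x = tf.constant([-3.0, -0.5, 0.0, 2.0, 7.0])

pos_loss = stable_log1pex(x)               # log(1 + exp(-x))
neg_log_sigmoid = -tf.math.log_sigmoid(x)  # -log(sigmoid(x))

# The two agree elementwise up to floating-point error,
# i.e. stable_log1pex(x) is the negated log-sigmoid likelihood.
print(tf.reduce_max(tf.abs(pos_loss - neg_log_sigmoid)).numpy())  # ~0
```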