ThyrixYang / es_dfm

Implementation and experimental comparison of ES-DFM (Yang et al. 2021), Delayed Feedback Model (DFM, Chapelle 2014), Feedback Shift Importance Weighting (FSIW) (Yasui et al. 2020), Fake Negative Weighted (FNW) (Ktena et al. 2019), and Fake Negative Calibration (FNC) (Ktena et al. 2019)


Question about the loss of ES-DFM

tzjjq279 opened this issue · comments

Thanks for your work and the detailed code.

I have two questions about the loss function of ES-DFM.

  • How should I understand `stable_log1pex(x)` in `loss.py`? It seems to be used to compute p(x), but it returns log(1 + e^{-x}) when x >= 0 and log(e^{-x}) + log(1 + e^{x}) when x < 0, which does not seem consistent with the paper.
  • I thought the importance sampling (IS) loss function should carry a negative sign, but I haven't seen one anywhere (neither in Eq. 17 of your paper nor in `delay_tn_importance_weight_loss` in your source code).

Are these actual problems, or am I missing something?

Well, I finally figured out the issue with `stable_log1pex(x)`, but shouldn't the IS loss (Eq. 17) carry a negative sign?

Hi, @tzjjq279

Thank you for your interest in our work.

  1. The `stable_log1pex` function implements log1pex(x) = log(1 + exp(-x)), which is straightforward when x >= 0. When x < 0, -tf.minimum(x, 0) + tf.math.log(1 + tf.math.exp(-tf.abs(x))) = -x + log(1 + exp(x)); noting that x = log(exp(x)), we get -x + log(1 + exp(x)) = -log(exp(x)) + log(1 + exp(x)) = log((1 + exp(x))/exp(x)) = log(1 + exp(-x)). The motivation for this implementation is that when x << 0, exp(-x) overflows, which breaks the numerical computation of log(1 + exp(-x)); see the sketch after this list. For a detailed introduction to this trick, I suggest referring to this article: https://cran.r-project.org/web/packages/Rmpfr/vignettes/log1mexp-note.pdf
  2. Thanks for pointing out this typo; we will update the arXiv paper. The implementation is the negated log-sigmoid likelihood, so the minus sign is already absorbed into the loss: pos_loss = stable_log1pex(x) = log(1 + exp(-x)) = -log(1/(1 + exp(-x))) = -log(sigmoid(x)). See the check below.
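For reference, here is a minimal NumPy sketch of the same expression (the repo's `loss.py` uses the TensorFlow equivalent; this NumPy port is just for illustration), showing why the rewritten form stays finite where the naive log(1 + exp(-x)) overflows:

```python
import numpy as np

def stable_log1pex(x):
    # Numerically stable log(1 + exp(-x)), mirroring
    # -tf.minimum(x, 0) + tf.math.log(1 + tf.math.exp(-tf.abs(x))):
    #   x >= 0: log(1 + exp(-x)) directly (exp(-x) <= 1, cannot overflow)
    #   x <  0: -x + log(1 + exp(x)), which equals log(1 + exp(-x))
    return -np.minimum(x, 0) + np.log1p(np.exp(-np.abs(x)))

x = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])

naive = np.log1p(np.exp(-x))  # exp(1000) overflows to inf -> loss becomes inf
stable = stable_log1pex(x)    # finite everywhere; ~= -x for x << 0

print(naive)   # [   inf  1.3133  0.6931  0.3133  0.    ]
print(stable)  # [1000.   1.3133  0.6931  0.3133  0.    ]
```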
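And a quick check (again on the NumPy port, with moderate x so the naive form does not overflow) that the negative sign is already folded in, i.e. stable_log1pex(x) = -log(sigmoid(x)):

```python
import numpy as np

def stable_log1pex(x):
    # Stable log(1 + exp(-x)), as above.
    return -np.minimum(x, 0) + np.log1p(np.exp(-np.abs(x)))

x = np.linspace(-10.0, 10.0, 101)
pos_loss = stable_log1pex(x)                         # loss used for positives
neg_log_sigmoid = -np.log(1.0 / (1.0 + np.exp(-x)))  # -log(sigmoid(x))

print(np.allclose(pos_loss, neg_log_sigmoid))  # True
```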