Expression for KL divergence
jayantj opened this issue · comments
I was just going through the paper and trying to reproduce the results; the code has been very helpful so far, thanks a lot for publishing it.
I'm curious about equation (23) from the paper for the KL divergence and its implementation in the code:
def kl_div(self, x, y):
    """
    Compute the sum of D(x_i || y_i) for each corresponding element
    along the 3rd dimension (the embedding dimension) of x and y.

    This function takes care not to compute logarithms of values close
    to 0, since NaNs could result from log(sigmoid(x)) when x is very
    negative. It simply uses the identity log(sigmoid(x)) = -log(1 + e^-x).
    """
    sig_x = T.nnet.sigmoid(x)
    exp_x = T.exp(-x)
    exp_y = T.exp(-y)
    one_p_exp_x = exp_x + 1
    one_p_exp_y = exp_y + 1
    return (sig_x * (T.log(one_p_exp_y) - T.log(one_p_exp_x)) + (1 - sig_x) * (T.log(exp_y) - T.log(exp_x))).mean()
I'm a little unsure about some of the terms, especially `T.log(exp_y) - T.log(exp_x)` (which seems to simplify to just `x - y`). Am I missing a possible simplification, or was a modified version of the equation used in the final version?
Thanks!
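For reference, the suspicious term can be checked numerically. Since `exp_x = T.exp(-x)` and `exp_y = T.exp(-y)` in the posted code, the logarithms simply undo the exponentials, and the term collapses to `x - y`, which has no place in a Bernoulli KL divergence. A quick plain-Python sketch (using `math` rather than Theano) of that collapse:

```python
import math

# In the posted code: exp_x = exp(-x), exp_y = exp(-y),
# so log(exp_y) - log(exp_x) = (-y) - (-x) = x - y.
x, y = 0.7, -1.3
exp_x = math.exp(-x)
exp_y = math.exp(-y)
suspicious_term = math.log(exp_y) - math.log(exp_x)

# The "term" is just the difference of the raw inputs.
assert abs(suspicious_term - (x - y)) < 1e-12
```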
Hi, yes, I see the issue. This is something I was playing with after the paper was submitted (I was trying different squashing functions to see if their saturation properties had any impact), and I messed up when I went back to put the original sigmoid KL divergence in for the release. I apologize for the careless mistake; I'll push a fix in the next few minutes. The following should be correct:
sig_x = T.nnet.sigmoid(x)
exp_x = T.exp(x)
exp_neg_x = T.exp(-x)
exp_y = T.exp(y)
exp_neg_y = T.exp(-y)
return (sig_x * (T.log1p(exp_neg_y) - T.log1p(exp_neg_x))
        + (1 - sig_x) * (T.log1p(exp_y) - T.log1p(exp_x))).mean()
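As a sanity check (a standalone plain-Python sketch, not the Theano code itself): if x and y are interpreted as logits of Bernoulli parameters p = sigmoid(x) and q = sigmoid(y), the corrected expression should agree with the KL divergence computed directly as p*log(p/q) + (1-p)*log((1-p)/(1-q)), using the identities log(sigmoid(z)) = -log1p(exp(-z)) and log(1 - sigmoid(z)) = -log1p(exp(z)):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def kl_direct(x, y):
    # KL(Bernoulli(sigmoid(x)) || Bernoulli(sigmoid(y))), computed naively.
    p, q = sigmoid(x), sigmoid(y)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_stable(x, y):
    # The corrected log1p-based expression: avoids taking log of a
    # sigmoid that has underflowed to 0 for large negative arguments.
    sig_x = sigmoid(x)
    return (sig_x * (math.log1p(math.exp(-y)) - math.log1p(math.exp(-x)))
            + (1 - sig_x) * (math.log1p(math.exp(y)) - math.log1p(math.exp(x))))

# The two forms agree to numerical precision on a few test logits.
for x, y in [(0.5, -1.0), (2.0, 3.0), (-4.0, 1.5)]:
    assert abs(kl_direct(x, y) - kl_stable(x, y)) < 1e-10
```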
Sure, no problem at all, thanks a lot for the response and fix!