bow loss
raphael-sch opened this issue · comments
Why do you calculate the BOW loss like a sequence loss (tiling the BOW logits to max_out_len and comparing them to the labels) instead of forming a single multi-hot vector out of the labels and comparing it to the predicted BOW vector?
```python
bow_target = tf.one_hot(target, vocab_size)
bow_target = tf.minimum(tf.reduce_sum(bow_target, 1), 1)
bow_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=bow_target, logits=bow_logits)
bow_loss = tf.reduce_mean(tf.reduce_sum(bow_loss, -1))
```
This should be a lot faster and serves the same purpose.
You can do that too. But the problem is that the BOW loss is then not on the same scale as the RNN loss.
Also, in the BOW setting, log P(w1, w2, ..., wT | z) = log P(w1|z) + log P(w2|z) + ... + log P(wT|z), so summing the per-token terms over the T tokens of the response matches how the reconstruction likelihood factorizes.
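A small NumPy sketch of the scale difference (this is not the repository's code; the shapes, seed, and variable names are made up for illustration). The tiled formulation sums -log P(w_t|z) over the T tokens of the response, so it grows with sequence length like the RNN reconstruction loss, while the sigmoid multi-label formulation sums a cross-entropy term over every word in the vocabulary:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_out_len = 1000, 10          # hypothetical sizes

bow_logits = rng.normal(size=vocab_size)           # one logit per vocab word
target = rng.integers(0, vocab_size, max_out_len)  # token ids of the response

# Tiled formulation: softmax cross-entropy at every timestep against the
# same BOW logits, summed over the T tokens -> sum_t -log P(w_t | z).
log_probs = bow_logits - np.log(np.sum(np.exp(bow_logits)))  # log-softmax
tiled_loss = -np.sum(log_probs[target])

# Multi-label formulation (the suggestion above): multi-hot target vector,
# sigmoid cross-entropy summed over the whole vocabulary of V words.
bow_target = np.zeros(vocab_size)
bow_target[np.unique(target)] = 1.0
sig_loss = np.sum(np.log1p(np.exp(bow_logits)) - bow_target * bow_logits)

print(tiled_loss)  # scales with T, like the per-token RNN loss
print(sig_loss)    # scales with V, a different magnitude altogether
```

With V much larger than T, the two losses land at very different magnitudes, so swapping one for the other changes how strongly the BOW term weighs against the other loss terms unless it is re-weighted.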
Hi, I don't understand what you mean by "But the problem is then BOW loss is not in the scale of RNN loss". The code supplied by raphael-sch gives the same result as your code. Can you explain it in detail?