Gumbel Distribution and Derivability
mm1212345 opened this issue
mm1212345 commented
Hey there!
I am currently working my way through how an action is sampled from a categorical variable. As I understand it, Gumbel noise is added to the logits so that the sampled action follows the probabilities implied by the logits. This is the reason for the double log. Correct?
But still, the action is chosen with tf.argmax(self.logits - tf.log(-tf.log(u)), axis=-1). Isn't it the case that the argmax operation still makes the whole sampling process non-differentiable?
What else do I not understand?
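
For reference, here is a minimal sketch of the sampling step being asked about, written in plain Python rather than TensorFlow so it runs on its own. The names gumbel_max_sample and softmax, the example logits, and the sample count are illustrative assumptions, not taken from the codebase. It only checks empirically that argmax over logits plus Gumbel noise picks each index with roughly its softmax probability.

```python
import math
import random
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_max_sample(logits, rng):
    """Return argmax_i(logits[i] - log(-log(u_i))) with u_i ~ Uniform(0, 1).

    -log(-log(u)) is a standard Gumbel sample, which is where the
    double log in the original expression comes from.
    """
    noisy = []
    for x in logits:
        u = rng.random()
        while u == 0.0:  # random() can return 0.0; avoid log(0)
            u = rng.random()
        noisy.append(x - math.log(-math.log(u)))
    # argmax: index of the largest perturbed logit
    return max(range(len(noisy)), key=noisy.__getitem__)


rng = random.Random(0)          # fixed seed, chosen arbitrarily
logits = [1.0, 0.0, -1.0]       # hypothetical example logits
n = 100_000

counts = Counter(gumbel_max_sample(logits, rng) for _ in range(n))
freqs = [counts[i] / n for i in range(len(logits))]
probs = softmax(logits)

for i, (f, p) in enumerate(zip(freqs, probs)):
    print(f"index {i}: empirical {f:.3f}, softmax {p:.3f}")
```

Note that the function returns an integer index. For a fixed noise draw, small changes to the logits leave that index unchanged, so its gradient with respect to the logits is zero almost everywhere, which is the concern raised above.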