Value loss function selection
SarunasSS opened this issue
Sarunas Simaitis commented
Regarding your a3c_agent.py:
```python
# Compute losses, more details in https://arxiv.org/abs/1602.01783
# Policy loss and value loss
action_log_prob = self.valid_spatial_action * spatial_action_log_prob + non_spatial_action_log_prob
advantage = tf.stop_gradient(self.value_target - self.value)
policy_loss = - tf.reduce_mean(action_log_prob * advantage)
value_loss = - tf.reduce_mean(self.value * advantage)
```
Shouldn't the value loss just be MSE? I.e.,

```python
value_loss = tf.reduce_sum(tf.square(self.value_target - self.value))
```
Also, why do you use the mean instead of the sum?
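For reference, here is a minimal sketch of the MSE-style critic loss as it commonly appears in actor-critic implementations, written against the TF 1.x API used in the snippet above. The placeholder setup is purely illustrative (only the names `value_target` and `value` are taken from the quoted code); the comments also note how the two formulations relate:

```python
import tensorflow as tf  # TF 1.x API (or tf.compat.v1), matching the snippet above

# Illustrative stand-ins for the tensors in the quoted code:
# value_target holds the n-step returns, value is the critic output V(s).
value_target = tf.placeholder(tf.float32, [None], name='value_target')
value = tf.placeholder(tf.float32, [None], name='value')

# MSE-style critic loss: regress V(s) toward the n-step return.
# reduce_mean keeps the loss magnitude independent of batch size
# (trajectory length), so the learning rate need not be retuned when
# the batch grows; reduce_sum would scale the gradient with the number
# of samples instead.
mse_value_loss = tf.reduce_mean(tf.square(value_target - value))

# The quoted code uses -mean(value * stop_gradient(target - value)).
# Its gradient w.r.t. value is -(target - value), which (up to the
# factor of 2) is the same gradient the MSE loss above produces, so
# the two forms drive the critic the same way.
surrogate_value_loss = - tf.reduce_mean(
    value * tf.stop_gradient(value_target - value))
```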