xiaowei-hu / pysc2-agents

This is a simple implementation of DeepMind's PySC2 RL agents.

Home Page: https://zhuanlan.zhihu.com/p/29246185?group_id=890682069733232640


Value loss function selection

SarunasSS opened this issue

Regarding your a3c_agent.py:

# Compute losses, more details in https://arxiv.org/abs/1602.01783
# Policy loss and value loss
action_log_prob = self.valid_spatial_action * spatial_action_log_prob + non_spatial_action_log_prob
advantage = tf.stop_gradient(self.value_target - self.value)
policy_loss = - tf.reduce_mean(action_log_prob * advantage)
value_loss = - tf.reduce_mean(self.value * advantage)

Shouldn't the value loss just be MSE? i.e.
value_loss = tf.reduce_sum( tf.square( self.value_target - self.value ) )
Also, why do you use the mean instead of the sum?
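
A minimal sketch comparing the gradients of the two formulations (assuming TensorFlow 2.x eager mode and hypothetical scalar values; the repo itself uses TF1 graph mode):

# Minimal sketch (hypothetical scalar values, TensorFlow 2.x eager mode)
# comparing the gradient of the repo-style value loss with a half-MSE loss
# with respect to the value estimate.
import tensorflow as tf

value_target = tf.constant(2.0)
value = tf.Variable(1.0)

with tf.GradientTape(persistent=True) as tape:
    advantage = tf.stop_gradient(value_target - value)
    repo_value_loss = -tf.reduce_mean(value * advantage)                    # as in a3c_agent.py
    mse_value_loss = 0.5 * tf.reduce_mean(tf.square(value_target - value))  # proposed MSE form

# Both gradients equal -(value_target - value) = -1.0, because the advantage is
# treated as a constant via stop_gradient, so the two losses move the value
# estimate in the same direction.
print(tape.gradient(repo_value_loss, value).numpy())  # -1.0
print(tape.gradient(mse_value_loss, value).numpy())   # -1.0

On the mean-vs-sum point, reduce_mean and reduce_sum differ only by a factor of the batch size, so switching between them effectively just rescales the learning rate on that loss term.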