pemami4911 / deep-rl

Collection of Deep Reinforcement Learning algorithms


In tf.gradients, why is -self.action_gradient needed?

GoingMyWay opened this issue

Hi, in the code of ActorNetwork:

        self.unnormalized_actor_gradients = tf.gradients(
            self.scaled_out, self.network_params, -self.action_gradient)

Why is -self.action_gradient needed here? grad_ys is -self.action_gradient, but you return self.unnormalized_actor_gradients.
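
For reference, a minimal, self-contained sketch of what the grad_ys argument does (a toy example, not the repo's code; TF 1.x-style graph mode assumed): tf.gradients(ys, xs, grad_ys) returns the vector-Jacobian product, i.e. grad_ys weights the gradient of ys w.r.t. xs.

    # Toy illustration of grad_ys (assumed TF 1.x-style graph mode, made-up values)
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    x = tf.Variable([1.0, 2.0])
    y = 3.0 * x                          # dy/dx = 3, elementwise
    w = tf.constant([10.0, -1.0])        # plays the role of grad_ys

    g = tf.gradients(y, [x], grad_ys=w)[0]   # = w * dy/dx = [30., -3.]

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(g))               # [30. -3.]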

My understanding is that, in the paper, the actor policy update is

[image: the actor policy-gradient update equation from the DDPG paper]

and since J is the expected return and our goal is to maximize J, in

        self.unnormalized_actor_gradients = tf.gradients(
            self.scaled_out, self.network_params, -self.action_gradient)
        self.actor_gradients = list(map(lambda x: tf.div(x, self.batch_size), self.unnormalized_actor_gradients))

negating self.action_gradient is a good trick: the optimizer minimizes, so applying the negated gradient amounts to minimizing -J, i.e. maximizing J.
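
Here is a tiny end-to-end sketch of that trick with a toy actor and critic (the names theta, a, q are mine, not the repo's; TF 1.x-style graph mode assumed): feeding the negated action gradient into a minimizing optimizer performs gradient ascent on Q.

    # Toy illustration (not the repo's code): a 1-D "actor" a = 2*theta and a
    # toy "critic" q = -(a - 3)^2, which is maximized at a = 3, i.e. theta = 1.5.
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    theta = tf.Variable(0.5)                 # actor parameter
    a = 2.0 * theta                          # actor output, like scaled_out
    q = -(a - 3.0) ** 2                      # toy critic value

    action_grad = tf.gradients(q, [a])[0]    # dQ/da, like self.action_gradient
    # Chain rule with a NEGATED weight: d(-Q)/dtheta = -(dQ/da) * (da/dtheta)
    actor_grad = tf.gradients(a, [theta], grad_ys=-action_grad)

    # Adam minimizes, so descending on -Q ascends on Q.
    train_op = tf.train.AdamOptimizer(0.1).apply_gradients(list(zip(actor_grad, [theta])))

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(500):
            sess.run(train_op)
        print(sess.run([theta, q]))          # theta -> ~1.5, q -> ~0 (the maximum)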

In

        self.unnormalized_actor_gradients = tf.gradients(
            self.scaled_out, self.network_params, -self.action_gradient)

-self.action_gradient is the weighting term (grad_ys), as in the paper

[image: the same update equation, annotated with parts A and B]

Part A is tf.gradients(self.scaled_out, self.network_params) and part B is -self.action_gradient.
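
To spell out the correspondence in the usual DDPG notation (my reading of the code, so treat the mapping as an assumption rather than the author's statement):

$$
\nabla_{\theta^\mu} J \;\approx\; \frac{1}{N}\sum_{i=1}^{N}
\underbrace{\nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}}_{\text{part B (self.action\_gradient)}}
\;\underbrace{\nabla_{\theta^\mu}\, \mu(s \mid \theta^{\mu})\big|_{s_i}}_{\text{part A (Jacobian of self.scaled\_out w.r.t. self.network\_params)}}
$$

tf.gradients accumulates this weighted product over the batch dimension, so dividing by self.batch_size supplies the 1/N average, and the extra minus sign only turns the ascent direction on J into a descent direction for the minimizing optimizer; the paper's formula itself has no minus sign.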