yandexdataschool / Practical_RL

A course in reinforcement learning in the wild

Equation of state-action value function in seminar_vi week 02

AI-Ahmed opened this issue

Hello there,
First, thank you so much for providing us with such an amazing curriculum. Second, I want to leave a note here about something that took me some time to research. To start, I want to clarify a few points regarding value functions.

  • Value Function: how good a specific action or a specific state is for your agent. (Deeplizard: Reinforcement Learning - Developing Intelligent Agents; Prof. Steve Brunton: Model-Based Reinforcement Learning)

  • The value (utility) of a state $s$: $V^*(s)$ = expected utility starting in $s$ and acting optimally, while the value (utility) of a q-state $(s, a)$: $Q^*(s, a)$ = expected utility starting out having taken action $a$ from state $s$ and (thereafter) acting optimally (Prof. Pieter Abbeel, lecture 08, CS188 Artificial Intelligence, UC Berkeley, Spring 2013, slide 22).

  • When we talk about Value Iteration, there are two value functions involved (Deeplizard: Reinforcement Learning - Developing Intelligent Agents; John Schulman: Markov Decision Processes and Solving Finite Problems, slide 11; Prof. Pieter Abbeel, lecture 08, CS188 Artificial Intelligence, UC Berkeley, Spring 2013, slide 50):

    1. state-value function:
      $v_\pi(s) = E[G_t \mid S_t = s]$ <---- It gives the value of a "state" under $\pi$.
    2. state-action value function:
      $Q_\pi(s, a) = E[G_t \mid S_t = s, A_t = a]$ <---- How good it is for an agent to take a given action ($a$) from a given state ($s$) while following the policy ($\pi$).
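
Both are defined on the same return $G_t$, and they are tied together through the policy. For reference, the textbook identity (quoted here, nothing specific to this course) is:

$$V_\pi(s) = \sum_a \pi(a | s) \, Q_\pi(s, a)$$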

Therefore, we have two value iteration formulas (state-value and state-action value). Honestly, I had never seen them mixed together the way they are in seminar_vi.ipynb. How would you expect me to compute get_action_value without having a 2D table of states and actions, by plugging $V(s')$ into the equation?

So, if I want to write Value Iteration in the setting of seminar_vi.ipynb, the equation should look like this:
$V_i(s) = \sum_{s'} P(s' | s,a) \cdot [r_{i+1} + \gamma V_i(s')]$ instead of $Q_i(s, a) = \sum_{s'} P(s' | s,a) \cdot [ r(s,a,s') + \gamma V_{i}(s')]$

and if I have a table of state-action values (a.k.a. a Q-table), I can calculate it like this:
$$Q_i(s, a) = \sum_{s', a'} P(s' | s,a) \cdot [ r(s,a,s') + \gamma Q_{i}(s', a')]$$

Please correct me if I'm wrong; I would be more than happy to hear your thoughts.

For posterity: the equations you posted were invalid because they did not specify where you got the action $a$ from.

  1. $V_i(s) = \sum_{s'} P(s' | s,a) \cdot [r_{i+1} + \gamma V_i(s')]$: here the $a$ in $P(s' | s,a)$ is undefined, and it has to come from somewhere.
  2. $Q_i(s, a) = \sum_{s', a'} P(s' | s,a) \cdot [ r(s,a,s') + \gamma Q_{i}(s', a')]$: similar problem here, but $a'$ is unspecified.
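
To make either update well-defined, the action has to come from somewhere: either $a$ stays a free argument of the left-hand side, or you aggregate over it (a policy-weighted sum for the on-policy quantities, a $\max$ for the optimal ones). For reference, the standard optimal backups look like this (textbook Bellman-optimality form, quoted here rather than anything notebook-specific):

$$V_{i+1}(s) = \max_a \sum_{s'} P(s' | s, a) \cdot [ r(s,a,s') + \gamma V_i(s')]$$

$$Q_{i+1}(s, a) = \sum_{s'} P(s' | s, a) \cdot [ r(s,a,s') + \gamma \max_{a'} Q_i(s', a')]$$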

These are actually substantial issues: once you fix them, by saying that $a$ is a parameter or something you sum over, you arrive at the mixed usage of $V$ and $Q$ that we advertise in seminar_vi.ipynb.
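
For concreteness, here is a minimal sketch of what the mixed update computes in code. The MDP method names below (get_next_states, get_reward, get_possible_actions, get_all_states) follow my recollection of the seminar's MDP class, so treat the exact interface as an assumption:

```python
def get_action_value(mdp, state_values, state, action, gamma):
    """Q_i(s, a) = sum over s' of P(s'|s,a) * [r(s,a,s') + gamma * V_i(s')]."""
    q = 0.0
    # assumed: get_next_states(s, a) returns {next_state: probability}
    for next_state, prob in mdp.get_next_states(state, action).items():
        reward = mdp.get_reward(state, action, next_state)  # assumed: r(s, a, s')
        q += prob * (reward + gamma * state_values[next_state])
    return q


def value_iteration_sweep(mdp, state_values, gamma):
    """One sweep of V_{i+1}(s) = max_a Q_i(s, a), reusing get_action_value above."""
    new_values = {}
    for state in mdp.get_all_states():
        actions = mdp.get_possible_actions(state)
        # terminal states have no actions; keep their value at zero
        if not actions:
            new_values[state] = 0.0
            continue
        new_values[state] = max(
            get_action_value(mdp, state_values, state, a, gamma) for a in actions
        )
    return new_values
```

The point is that $Q_i(s, a)$ here is only an intermediate quantity computed on the fly from the current $V_i$; nothing forces you to store a full $|S| \times |A|$ table.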

Hello @dniku, this was an old question from when I was first learning RL. Don't worry about it; as you can see, I have already closed it!