yandexdataschool / Practical_RL

A course in reinforcement learning in the wild

Equation of state-action value function in seminar_vi week 02

AI-Ahmed opened this issue

Hello there,
First, thank you so much for providing us with such an amazing curriculum. Second, I want to leave a note here about something that took me some time to research. To start, I want to clarify a few points regarding value functions.

  • Value Function: how good a specific action or a specific state is for your agent. (Deeplizard: Reinforcement Learning - Developing Intelligent Agents; Prof. Steve Brunton: Model-Based Reinforcement Learning)

  • The value (utility) of a state $s$: $V^*(s)$ = expected utility starting in $s$ and acting optimally, while the value (utility) of a q-state $(s, a)$: $Q^*(s, a)$ = expected utility starting out having taken action $a$ from state $s$ and (thereafter) acting optimally (Prof. Pieter Abbeel, lecture 08, CS188 Artificial Intelligence, UC Berkeley, Spring 2013, slide 22).

  • When we talk about Value Iteration, there are two value functions involved (Deeplizard: Reinforcement Learning - Developing Intelligent Agents; John Schulman: Markov Decision Processes and Solving Finite Problems, slide 11; Prof. Pieter Abbeel, lecture 08, CS188 Artificial Intelligence, UC Berkeley, Spring 2013, slide 50):

    1. state-value function:
      $v_\pi(s) = E[G_t \mid S_t = s]$ <---- It gives the value of a "state" under $\pi$.
    2. state-action value function:
      $Q_\pi(s, a) = E[G_t \mid S_t = s, A_t = a]$ <---- How good it is for an agent to take a given action ($a$) from a given state ($s$) while following the policy ($\pi$).
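
Both are defined on the same return $G_t$, and they are tied together through the policy. For reference, the textbook identity (quoted here, nothing specific to this course) is:

$$V_\pi(s) = \sum_a \pi(a | s) \, Q_\pi(s, a)$$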

Therefore, we have two value iteration formulas (state-value and state-action value). Honestly, I had never seen them mixed together the way they are in seminar_vi.ipynb. How would you expect me to compute get_action_value without having a 2D table of states and actions, by plugging $V(s')$ into the equation?

So, if I want to write Value Iteration in the setting of seminar_vi.ipynb, the equation should look like this:
$V_i(s) = \sum_{s'} P(s' | s,a) \cdot [r_{i+1} + \gamma V_i(s')]$ instead of $Q_i(s, a) = \sum_{s'} P(s' | s,a) \cdot [ r(s,a,s') + \gamma V_{i}(s')]$

and if I have a table of state-action values (a.k.a. a Q-table), I can calculate it like this:
$$Q_i(s, a) = \sum_{s', a'} P(s' | s,a) \cdot [ r(s,a,s') + \gamma Q_{i}(s', a')]$$

Please correct me if I'm wrong; I would be more than happy to hear your thoughts.

For posterity: the equations you posted were invalid because they did not specify where you got the action $a$ from.

  1. $V_i(s) = \sum_{s'} P(s' | s,a) \cdot [r_{i+1} + \gamma V_i(s')]$: here the $a$ in $P(s' | s,a)$ is undefined, and it has to come from somewhere.
  2. $Q_i(s, a) = \sum_{s', a'} P(s' | s,a) \cdot [ r(s,a,s') + \gamma Q_{i}(s', a')]$: similar problem here, but $a'$ is unspecified.
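
To make either update well-defined, the action has to come from somewhere: either $a$ stays a free argument of the left-hand side, or you aggregate over it (a policy-weighted sum for the on-policy quantities, a $\max$ for the optimal ones). For reference, the standard optimal backups look like this (textbook Bellman-optimality form, quoted here rather than anything notebook-specific):

$$V_{i+1}(s) = \max_a \sum_{s'} P(s' | s, a) \cdot [ r(s,a,s') + \gamma V_i(s')]$$

$$Q_{i+1}(s, a) = \sum_{s'} P(s' | s, a) \cdot [ r(s,a,s') + \gamma \max_{a'} Q_i(s', a')]$$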

These are actually substantial issues: once you fix them, by saying that $a$ is a parameter or something you sum over, you arrive at the mixed usage of $V$ and $Q$ that we advertise in seminar_vi.ipynb.
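
For concreteness, here is a minimal sketch of what the mixed update computes in code. The MDP method names below (get_next_states, get_reward, get_possible_actions, get_all_states) follow my recollection of the seminar's MDP class, so treat the exact interface as an assumption:

```python
def get_action_value(mdp, state_values, state, action, gamma):
    """Q_i(s, a) = sum over s' of P(s'|s,a) * [r(s,a,s') + gamma * V_i(s')]."""
    q = 0.0
    # assumed: get_next_states(s, a) returns {next_state: probability}
    for next_state, prob in mdp.get_next_states(state, action).items():
        reward = mdp.get_reward(state, action, next_state)  # assumed: r(s, a, s')
        q += prob * (reward + gamma * state_values[next_state])
    return q


def value_iteration_sweep(mdp, state_values, gamma):
    """One sweep of V_{i+1}(s) = max_a Q_i(s, a), reusing get_action_value above."""
    new_values = {}
    for state in mdp.get_all_states():
        actions = mdp.get_possible_actions(state)
        # terminal states have no actions; keep their value at zero
        if not actions:
            new_values[state] = 0.0
            continue
        new_values[state] = max(
            get_action_value(mdp, state_values, state, a, gamma) for a in actions
        )
    return new_values
```

The point is that $Q_i(s, a)$ here is only an intermediate quantity computed on the fly from the current $V_i$; nothing forces you to store a full $|S| \times |A|$ table.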

Hello @dniku, this was an old question from when I was first learning RL. Don't worry about it; as you can see, I have already closed it!