yandexdataschool / Practical_RL

A course in reinforcement learning in the wild

Pseudocode for "better" policy evaluation in CEM

dniku opened this issue

The end of the notebook suggests evaluating the policy in a "theoretically better" way: for each initial state, sample the first action uniformly at random, then play with the current policy until the end of the episode. A user on the Coursera forum reports that pseudocode would make the idea clearer.
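
One possible reading of that suggestion, as a rough sketch rather than the notebook's reference solution: only the first action of each session is drawn uniformly, and every later action is drawn from the current policy. Everything named here is an assumption made for illustration: the toy `ChainEnv` environment, the function name `generate_session_uniform_start`, a tabular `policy` array of shape `[n_states, n_actions]`, and the older Gym-style interface where `reset()` returns a state and `step()` returns a 4-tuple.

```python
import numpy as np


class ChainEnv:
    """Hypothetical toy environment (not from the notebook).

    States 0..n_states-1 lie on a line. Action 1 moves right, action 0
    moves left (clamped at 0). Reaching the last state ends the episode
    with reward 1; every other step gives reward 0.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}


def generate_session_uniform_start(env, policy, t_max=1000, rng=None):
    """Play one session: uniform first action, current policy afterwards.

    Assumes policy[s] is a probability vector over actions for state s.
    Returns (states, actions, total_reward).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_actions = policy.shape[1]
    states, actions, total_reward = [], [], 0.0

    s = env.reset()
    for t in range(t_max):
        if t == 0:
            # First step only: ignore the policy and sample uniformly.
            a = int(rng.integers(n_actions))
        else:
            # Every later step: sample from the current policy.
            a = int(rng.choice(n_actions, p=policy[s]))

        new_s, r, done, _ = env.step(a)
        states.append(s)
        actions.append(a)
        total_reward += r
        s = new_s
        if done:
            break

    return states, actions, total_reward


# Usage: a policy that always moves right in every state.
always_right = np.tile([0.0, 1.0], (5, 1))
states, actions, total_reward = generate_session_uniform_start(
    ChainEnv(), always_right, rng=np.random.default_rng(0)
)
```

With sessions generated this way, the total reward of a session can be read as a sample of how good its first (state, action) pair is under the current policy, since the first action no longer depends on the policy itself. How those samples are then used to pick elites is left open here.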