Max value of the episode rewards in the Orso book training visualization is always >= 0
christianhidber opened this issue
Yes, strangely enough, it does not train at all anymore. Also, the number of steps per episode is stuck at 50, and Orso oscillates between two states. The environment has not changed, and PPO still works with Line world. Weird, no idea what is going on.
It seems we were just too impatient: when you wait for a few thousand episodes, the reward does become positive and the number of steps per episode drops below 20, just as expected.
Ah, now I understand. This looks fishy, indeed.
Solved in v1.