christianhidber / easyagents

Reinforcement Learning for Practitioners.

Max value of the episode rewards in the Orso book training visualization is always >= 0

christianhidber opened this issue · comments

The max value of the episode rewards in the Orso book training visualization always seems to be >= 0. Is this correct?

image

Yes, strangely enough it does not train at all anymore. Also, the number of steps per episode is stuck at 50, and Orso oscillates between two states. The environment has not changed, and PPO still works with Line world. Weird, no idea what is going on.

It seems we were just too impatient: if you wait for a few thousand episodes, the reward does become positive and the steps per episode drop below 20, just as expected.

That's right. I was more concerned about the start:
image
It seems quite unlikely that the evaluations at episodes 0, 50 and 100 consistently reach a maximum of exactly 0 (the min and average look OK). I would expect the maximum to be < 0 at some point.
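
To illustrate why a maximum that never drops below 0 looks suspicious: the thread does not say what the actual cause was, so the following is only a sketch of one hypothetical way such a pattern can arise, namely a running maximum that is seeded with 0 instead of with the first observed reward. The function names and sample rewards are made up for illustration and are not taken from easyagents.

```python
def summarize_seeded_with_zero(rewards):
    """Hypothetical pitfall: the running maximum starts at 0.0,
    so it can never be reported as negative."""
    max_reward = 0.0  # assumption for illustration, not easyagents code
    for reward in rewards:
        max_reward = max(max_reward, reward)
    return min(rewards), sum(rewards) / len(rewards), max_reward


def summarize(rewards):
    """Min, mean and max taken directly from the observed rewards."""
    return min(rewards), sum(rewards) / len(rewards), max(rewards)


# Made-up evaluation rewards from an untrained policy: all negative.
rewards = [-12.0, -7.5, -3.0]

print(summarize_seeded_with_zero(rewards))  # max is reported as 0.0
print(summarize(rewards))                   # max is reported as -3.0
```

In this sketch the min and average are identical in both versions and only the max differs, which would match the symptom described above. Whether this is what actually happened here is not confirmed.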

Ah, now I understand. This looks fishy, indeed.

Solved in v1.