Max value of the episode rewards in the orso book training visualization is always >= 0

Question

christianhidber opened this issue 5 years ago

The max value of the episode rewards in the orso book training visualization always seems to be >= 0. Is this correct?

Oliver Zeigermann · Answer 1 · Wed Aug 21 2019 19:12:35 GMT+0800 (China Standard Time)

Yes, strangely enough it does not train at all anymore. Also, the number of steps per episode is stuck at 50, and Orso oscillates between two states. The environment has not changed and PPO still works with Line world. Weird, no idea what is going on.

Oliver Zeigermann · Answer 2 · Wed Aug 21 2019 19:22:31 GMT+0800 (China Standard Time)

It seems we were just too impatient: if you wait for a few thousand episodes, the reward does indeed become positive and the steps drop below 20, just as expected.

Oliver Zeigermann · Answer 3 · Wed Aug 21 2019 19:33:32 GMT+0800 (China Standard Time)

Christian Hidber · Answer 4 · Wed Aug 21 2019 19:40:55 GMT+0800 (China Standard Time)

That's right. I was more concerned about the start:

It feels quite unlikely that the evaluations at episodes 0, 50 and 100 always reach a maximum of 0 (the min and average look OK), and consistently so. I would expect the maximum to be < 0 at some point.
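
A minimal sketch of that expectation, assuming each evaluation reports the min, average and max over the summed rewards of a handful of episodes. The function names and reward values are made up for illustration and are not taken from the project. The second function shows one hypothetical way a maximum could end up pinned at 0; it is a guess at the kind of bug, not the confirmed cause:

```python
from statistics import mean


def summarize(episode_rewards):
    """Return (min, mean, max) of the summed rewards of the evaluation episodes."""
    if not episode_rewards:
        raise ValueError("need at least one evaluation episode")
    return min(episode_rewards), mean(episode_rewards), max(episode_rewards)


def summarize_zero_seeded(episode_rewards):
    """Hypothetical faulty variant: the running maximum starts at 0, not -inf."""
    if not episode_rewards:
        raise ValueError("need at least one evaluation episode")
    running_max = 0.0  # assumption for illustration: a 0 seed hides all-negative runs
    for reward in episode_rewards:
        running_max = max(running_max, reward)
    return min(episode_rewards), mean(episode_rewards), running_max


# Made-up rewards for an untrained policy: every episode ends with a negative sum.
rewards = [-12.5, -7.0, -3.5]

print(summarize(rewards))              # max is the least negative reward
print(summarize_zero_seeded(rewards))  # max is reported as 0.0
```

With all-negative rewards the first summary gives a maximum below 0, while the zero-seeded one can never report anything lower than 0, which would match the pattern in the plot.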

Oliver Zeigermann · Answer 5 · Wed Aug 21 2019 19:47:20 GMT+0800 (China Standard Time)

Ah, now I understand. This looks fishy, indeed.

Christian Hidber · Answer 6 · Sun Sep 08 2019 19:39:49 GMT+0800 (China Standard Time)

Solved in v1.