Anyone reproduced the MountainCarContinuous-v0 results?

Question

QiXuanWang opened this issue 5 years ago · comments

I tried with the same settings, and the final stable average reward is close to 0 instead of 100. Has anyone who tried this implementation gotten the expected values?

QiXuanWang · Answer 1 · Tue May 07 2019 15:04:06 GMT+0800 (China Standard Time)

I do have some better results now after some reruns and tuning, but I still can't get the same results as the author, which are quite stable. My run begins to diverge after being stable for a while.

QiXuanWang · Answer 2 · Fri May 10 2019 16:33:16 GMT+0800 (China Standard Time)

Got consistently good results with ou_sigma=0.52, validate_episodes=200.
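
For readers unsure what ou_sigma controls: it is presumably the noise scale of the Ornstein-Uhlenbeck exploration process added to the actor's actions. Below is a minimal sketch of such a process, not the repository's actual code. The class name `OUNoise` and the values `theta=0.15` and `mu=0.0` are assumptions chosen for illustration; only the name ou_sigma and the value 0.52 come from this thread.

```python
import random


class OUNoise:
    """Discrete-time Ornstein-Uhlenbeck noise (illustrative sketch).

    theta and mu are assumed values, not taken from the thread.
    """

    def __init__(self, sigma=0.52, theta=0.15, mu=0.0, seed=None):
        self.sigma = sigma    # noise scale (the ou_sigma discussed here)
        self.theta = theta    # pull-back strength toward mu (assumed)
        self.mu = mu          # long-run mean of the noise (assumed)
        self.rng = random.Random(seed)
        self.state = mu

    def reset(self):
        """Restart the process at its mean, e.g. at episode start."""
        self.state = self.mu

    def sample(self):
        """Advance one step and return the new noise value."""
        drift = self.theta * (self.mu - self.state)
        diffusion = self.sigma * self.rng.gauss(0.0, 1.0)
        self.state += drift + diffusion
        return self.state


if __name__ == "__main__":
    noise = OUNoise(sigma=0.52, seed=0)
    print([round(noise.sample(), 3) for _ in range(5)])
```

Because the noise is correlated over time, a larger sigma pushes the car's actions further from the policy output for several consecutive steps, which is one plausible reason results are so sensitive to this value.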

Zhenting (Alex) Zhao · Answer 3 · Tue Sep 15 2020 20:02:46 GMT+0800 (China Standard Time)

I found that if you set ou_sigma too large, the reward converges to -100. So I think a good approach is to start from ou_sigma = 0.50 and increase it; if you see rewards converge to -100, decrease it, and repeat (like doing a binary search). Using this method, I was able to find an ou_sigma that gives stable results for any given seed.
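
The search described above could be sketched roughly as follows. This is one possible reading of the procedure, not code from the repository: `search_ou_sigma`, the `evaluate` callback (which would train with a given ou_sigma and return the final average reward), the step size, the tolerance, and the `collapse_below` threshold are all hypothetical names and values.

```python
def search_ou_sigma(evaluate, start=0.50, step=0.10, tol=0.01,
                    collapse_below=-95.0, max_trials=20):
    """Return the largest tested ou_sigma that did not collapse, or None.

    evaluate(sigma) is a user-supplied (hypothetical) function that runs
    training with that sigma and returns the final average reward.
    """
    good = None          # best non-collapsing sigma seen so far
    lo, hi = 0.0, None   # hi = smallest sigma known to collapse
    sigma = start
    for _ in range(max_trials):
        if evaluate(sigma) <= collapse_below:
            hi = sigma               # reward collapsed: sigma too large
        else:
            good = lo = sigma        # still fine: try something larger
        if hi is None:
            sigma = lo + step        # no collapse seen yet, keep increasing
        elif hi - lo <= tol:
            break                    # bracket is narrow enough
        else:
            sigma = (lo + hi) / 2    # bisect between good and collapsed
    return good


if __name__ == "__main__":
    # Stand-in for a real training run: pretends rewards collapse
    # to -100 once sigma exceeds an arbitrary made-up threshold.
    def fake_evaluate(sigma):
        return -100.0 if sigma > 0.73 else 90.0

    print(search_ou_sigma(fake_evaluate))
```

Each call to `evaluate` stands for a full training run, so in practice `max_trials` and `tol` would be set with the compute budget in mind, and the search would be repeated per seed.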