ghliu / pytorch-ddpg

Implementation of the Deep Deterministic Policy Gradient (DDPG) using PyTorch

Anyone reproduced the MountainCarContinuous-v0 results?

QiXuanWang opened this issue

I tried the same settings, and the final stable average reward is close to 0 instead of 100. Has anyone tried this implementation and gotten the expected values?

I do have somewhat better results now after some reruns and tuning, but I still can't match the author's results, which are quite stable. My runs start to diverge after being stable for a while.

Got consistently good results with ou_sigma=0.52, validate_episodes=200
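
For context on what this setting changes: ou_sigma is presumably the noise scale of the Ornstein-Uhlenbeck process that DDPG commonly uses for exploration. Below is a minimal standalone sketch of such a process, not the repository's actual code. The function name `ou_noise` and the defaults `theta=0.15`, `mu=0.0`, `dt=1.0` are assumptions chosen for illustration.

```python
import math
import random


def ou_noise(n_steps, sigma, theta=0.15, mu=0.0, dt=1.0, x0=0.0, seed=None):
    """Sample a scalar Ornstein-Uhlenbeck path (illustrative sketch).

    Update rule: x += theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
    theta, mu, dt and x0 are assumed defaults, not values from the repo.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        # Mean-reverting drift toward mu plus Gaussian noise scaled by sigma.
        x += theta * (mu - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


if __name__ == "__main__":
    for sigma in (0.2, 0.52, 1.0):
        path = ou_noise(1000, sigma, seed=0)
        print(sigma, max(abs(v) for v in path))
```

With `mu=0` and `x0=0`, the sampled path scales linearly with sigma, so a larger ou_sigma directly means larger perturbations added to the actions.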

I found that if ou_sigma is set too large, the reward converges to -100. So a good approach is to start from ou_sigma = 0.50 and increase it; if the rewards converge to -100, decrease it, and repeat (much like a binary search). Using this method, I was able to find an ou_sigma that gives stable results for any given seed.
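
The search described above could be sketched roughly as follows. This is only an illustration: `collapses` is a hypothetical callback that you would implement yourself (train with the given ou_sigma and seed, then return True if the validation reward settled near -100), and `step`, `max_sigma`, and `tol` are assumed values, not settings from the repository.

```python
from typing import Callable


def search_ou_sigma(collapses: Callable[[float], bool],
                    start: float = 0.50,
                    step: float = 0.10,
                    max_sigma: float = 2.0,
                    tol: float = 0.01) -> float:
    """Return the largest tested sigma for which training did not collapse.

    `collapses(sigma)` is a user-supplied (hypothetical) check that runs
    training and reports whether the reward converged to about -100.
    """
    if collapses(start):
        raise ValueError("starting sigma already collapses; try a smaller start")

    good, bad = start, None

    # Phase 1: increase sigma until the reward collapses (or we hit the cap).
    sigma = start + step
    while sigma <= max_sigma:
        if collapses(sigma):
            bad = sigma
            break
        good = sigma
        sigma += step

    if bad is None:
        return good  # never collapsed within the tested range

    # Phase 2: bisect between the last good and the first bad value.
    while bad - good > tol:
        mid = (good + bad) / 2
        if collapses(mid):
            bad = mid
        else:
            good = mid
    return good
```

Each call to `collapses` is a full training run, so a coarse `tol` keeps the number of runs small. The sketch also assumes that collapse is monotone in ou_sigma for a fixed seed, which may not hold exactly in practice.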