ljpzzz / machinelearning

My blogs and code for machine learning. http://cnblogs.com/pinard

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

policy_gradient没有效果

yaoxunji opened this issue · comments

https://github.com/ljpzzz/machinelearning/blob/master/reinforcement-learning/policy_gradient.py

直接复制运行
episode: 0 Evaluation Average Reward: 15.0
episode: 100 Evaluation Average Reward: 10.2
episode: 200 Evaluation Average Reward: 9.2
episode: 300 Evaluation Average Reward: 9.3
episode: 400 Evaluation Average Reward: 9.4
episode: 500 Evaluation Average Reward: 9.0
episode: 600 Evaluation Average Reward: 9.6
episode: 700 Evaluation Average Reward: 9.4
episode: 800 Evaluation Average Reward: 9.7
episode: 900 Evaluation Average Reward: 9.5
episode: 1000 Evaluation Average Reward: 9.6
episode: 1100 Evaluation Average Reward: 9.7
episode: 1200 Evaluation Average Reward: 9.1
episode: 1300 Evaluation Average Reward: 9.2
episode: 1400 Evaluation Average Reward: 9.3
episode: 1500 Evaluation Average Reward: 9.3
episode: 1600 Evaluation Average Reward: 9.4
episode: 1700 Evaluation Average Reward: 9.3
episode: 1800 Evaluation Average Reward: 9.4
episode: 1900 Evaluation Average Reward: 8.8
episode: 2000 Evaluation Average Reward: 9.3
episode: 2100 Evaluation Average Reward: 9.4
episode: 2200 Evaluation Average Reward: 9.6
episode: 2300 Evaluation Average Reward: 9.6
episode: 2400 Evaluation Average Reward: 9.3
episode: 2500 Evaluation Average Reward: 9.3
episode: 2600 Evaluation Average Reward: 9.4
episode: 2700 Evaluation Average Reward: 9.7
episode: 2800 Evaluation Average Reward: 9.6
episode: 2900 Evaluation Average Reward: 9.8