udacity / deep-reinforcement-learning

Repo for the Deep Reinforcement Learning Nanodegree program

Home Page: https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893

Description of Reacher environment

ZeratuuLL opened this issue

I collected the rewards at every timestep for a random agent and found that most non-zero rewards are 0.04. I also got some rewards of 0.03, 0.02, and 0.01, but far fewer of them than 0.04. However, the description says the reward at any timestep should be either 0 or 0.1. Are there more details? Thanks!
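For anyone who wants to reproduce the count described above, here is a minimal sketch of the tally in Python. It is not code from the repo: `tally_rewards` and `fake_step` are hypothetical names, and `fake_step` is only a stand-in so the snippet runs without the Unity environment. To use it on Reacher, replace `fake_step` with a function that takes one random action and returns that timestep's reward.

```python
from collections import Counter
import random


def tally_rewards(step, n_steps, decimals=2):
    """Count how often each per-timestep reward value occurs.

    `step` is any zero-argument callable that advances the environment
    by one timestep and returns the scalar reward for that timestep.
    """
    counts = Counter()
    for _ in range(n_steps):
        # Round so float noise (e.g. 0.03999999910593033) lands in one bucket.
        counts[round(float(step()), decimals)] += 1
    return counts


# Hypothetical stand-in for the real environment (an assumption, not Reacher).
# The default-argument RNG is deliberate: it keeps one seeded generator
# across calls so the run is repeatable.
def fake_step(rng=random.Random(0)):
    return rng.choice([0.0, 0.0, 0.0, 0.04, 0.04, 0.01])


if __name__ == "__main__":
    counts = tally_rewards(fake_step, 1000)
    for value, n in sorted(counts.items()):
        print(f"{value:.2f}: {n}")
```

Rounding before counting matters here because the rewards come back as floats, so values that print as 0.04 may not compare equal without it.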

@ZeratuuLL, I can confirm that. I saw the same rewards with both Version 1 and Version 2 for Windows (64-bit). Solving the environment is still possible, though.