Training (hopefully) safe agents in gridworlds
timorl opened this issue 5 years ago
Rewards should be normalized at some point for most agents (although for some of them it does not matter much). For the PPO agent, we should probably normalize the advantages.
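For reference, a minimal sketch of what per-batch advantage normalization could look like. This is not code from this repo: the function name `normalize_advantages` and the `eps` parameter are made up for illustration, and it assumes the advantages for one batch are available as a plain list of floats.

```python
from statistics import fmean, pstdev


def normalize_advantages(advantages, eps=1e-8):
    """Shift a batch of advantages to zero mean and scale to roughly unit std.

    Hypothetical helper, not part of this repo. `eps` guards against
    division by zero when every advantage in the batch is identical.
    """
    mean = fmean(advantages)
    std = pstdev(advantages, mu=mean)  # population std of the batch
    return [(a - mean) / (std + eps) for a in advantages]


# Usage: normalize one batch before computing the PPO surrogate loss.
batch = [1.0, 2.0, 3.0, 4.0]
normalized = normalize_advantages(batch)
```

Normalizing per batch keeps the scale of the policy gradient roughly constant regardless of the environment's reward magnitude, which is likely why it matters more for PPO than for some of the other agents.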
Closed by #47.