openai / procgen

Procgen Benchmark: Procedurally Generated Game-Like Gym Environments

Home Page: https://openai.com/blog/procgen-benchmark/


Suggestion: Include the possible environment score ranges in the main descriptions

nhansendev opened this issue

I found the table of possible score ranges in the appendix of the paper, and I think it would be a useful reference to include somewhere more visible, such as the main GitHub page alongside the environment descriptions:

"
C. Normalization Constants
Rmin is computed by training a policy with masked out observations. This demonstrates what score is trivially achievable in each environment. Rmax is computed in several different ways. For CoinRun, Dodgeball, Miner, Jumper, Leaper, Maze, BigFish, Heist, Plunder, Ninja, and Bossfight, the maximal theoretical and practical reward is trivial to compute.

For CaveFlyer, Chaser, and Climber, we empirically determine Rmax by generating many levels and computing the average max achievable reward.

For StarPilot and FruitBot, the max practical reward is not obvious, even though it is easy to establish a theoretical bound. We choose to define Rmax in these environments as the score PPO achieves after 8 billion timesteps when trained at an 8x larger batch size than our default hyperparameters. On observing these policies, we find them very close to optimal.
"
[screenshot: table of per-environment R_min and R_max normalization constants from the paper appendix]
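For context, the paper uses these constants to min-max normalize raw episode returns. A minimal sketch in Python of how the constants would be applied; the values in the dictionary are illustrative placeholders, not the paper's actual numbers (those are in the table above):

```python
# Hypothetical per-environment normalization constants: env_name -> (R_min, R_max).
# Placeholder values for illustration only; see the appendix table for real ones.
R_CONSTANTS = {
    "coinrun": (5.0, 10.0),
    "starpilot": (2.5, 64.0),
}

def normalized_score(env_name: str, raw_return: float) -> float:
    """Min-max normalize a raw return so trivial play maps to ~0 and optimal play to ~1."""
    r_min, r_max = R_CONSTANTS[env_name]
    return (raw_return - r_min) / (r_max - r_min)

print(normalized_score("coinrun", 8.0))  # 0.6 with the placeholder constants above
```

Having the (R_min, R_max) pairs on the main page would let users compute this normalization directly, without digging through the appendix.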

What do you think?