jvmncs / safe-grid-agents

Training (hopefully) safe agents in gridworlds

Spiky CRMDP Roadmap

jvmncs opened this issue

Main road

  • Create toy environments
  • Refactor for use with Gym API (#32)
    • Modify ai_safety_gridworlds_gym to fit our needs (@david-lindner's fork)
    • Improve dependency management #31
    • Switch all code referencing envs to use Gym env
  • Improved tooling for hyperparameter tuning (e.g. Ray)
  • Estimate compute costs and finalize logistics
    • First guess for an upper bound: 1 agent x 4 environments x 3 experiments = 12 sets of hyperparameters to tune; 12 sets x ~30 training runs each = ~360 runs; ~360 runs x 2 hours each = ~720 compute hours
  • Do experiments (start January 11)
    • Check whether hparams tuned on Solver generalize to Cheater (and vice versa, though that direction is less important and less rigorous)
  • Investigate corrupt versions of harder environments
    • Maybe bigger / more realistic boat race
    • Maybe a modified Atari env
    • Maybe a modified MuJoCo env
    • Maybe modified BipedalWalker env
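
The first-guess compute bound in the list above is simple arithmetic, sketched below so it is easy to rerun if any figure changes. The numbers come straight from the roadmap; the variable names are illustrative, and reading "2 hours" as wall-clock time per training run is an assumption.

```python
# Figures from the roadmap's first-guess upper bound.
AGENTS = 1
ENVIRONMENTS = 4
EXPERIMENTS_PER_ENV = 3
RUNS_PER_HPARAM_SET = 30  # "~30 training runs": approximate
HOURS_PER_RUN = 2         # assumed to mean hours per training run

# One hyperparameter set per (agent, environment, experiment) combination.
hparam_sets = AGENTS * ENVIRONMENTS * EXPERIMENTS_PER_ENV
total_runs = hparam_sets * RUNS_PER_HPARAM_SET
total_hours = total_runs * HOURS_PER_RUN

print(f"{hparam_sets} hparam sets, {total_runs} runs, {total_hours} compute hours")
```

Under these assumptions the bound is roughly 720 compute hours, before any parallelism across runs.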

Finish experiments February 15

Deadline February 22

Environments:

  • TomatoWateringCRMDP
  • TransitionBoatRaceCRMDP
  • Toy environments
    • corrupt corners (satisfies our assumptions for guaranteed learnability)
    • corrupt path to goal (does not satisfy assumptions for guaranteed learnability)

Experiments per env

  • Baseline (learns corrupt reward)
  • Cheater (learns with access to true reward)
  • Solver (learns intended behavior from corrupt reward)
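
As a rough sketch, the environment and experiment lists above can be crossed to enumerate every configuration that needs its own hyperparameter search. The two toy environment identifiers and the lowercase experiment labels are hypothetical placeholders, not names from the repository.

```python
from itertools import product

ENVIRONMENTS = [
    "TomatoWateringCRMDP",
    "TransitionBoatRaceCRMDP",
    "ToyCorruptCorners",  # hypothetical name for the "corrupt corners" toy env
    "ToyCorruptPath",     # hypothetical name for the "corrupt path to goal" toy env
]
EXPERIMENTS = ["baseline", "cheater", "solver"]

# One entry per (environment, experiment) pair.
grid = [
    {"env": env, "experiment": experiment}
    for env, experiment in product(ENVIRONMENTS, EXPERIMENTS)
]

for config in grid:
    print(config["env"], config["experiment"])
```

This yields the 12 hyperparameter sets counted in the compute estimate.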

Optional

  • Generalize PPO #17
  • Improve test coverage #29