

ggrl

Reinforcement learning experiments

UDRL

Upside-down reinforcement learning (UDRL) using a Neural Process. Key idea: instead of conditioning the action predictor on a target return, condition it on a context set of (state, action, return) examples with lower returns.
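
A minimal PyTorch sketch of that conditioning scheme (not the repo's actual implementation; all module names and dimensions are illustrative). Each context triple is encoded independently and mean-pooled into a permutation-invariant embedding, which conditions the action predictor alongside the current state:

```python
import torch
import torch.nn as nn


class ContextConditionedPolicy(nn.Module):
    """Neural-Process-style UDRL sketch: the policy is conditioned on a set of
    (state, action, return) examples rather than on a scalar target return."""

    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        # Encodes each (state, one-hot action, return) triple independently.
        self.triple_encoder = nn.Sequential(
            nn.Linear(state_dim + n_actions + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        # Predicts action logits from (current state, aggregated context).
        self.action_head = nn.Sequential(
            nn.Linear(state_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )
        self.n_actions = n_actions

    def forward(self, state, ctx_states, ctx_actions, ctx_returns):
        # state:       (batch, state_dim)
        # ctx_states:  (batch, n_ctx, state_dim)  examples with lower returns
        # ctx_actions: (batch, n_ctx)             discrete action indices
        # ctx_returns: (batch, n_ctx)             episode returns
        one_hot = torch.nn.functional.one_hot(ctx_actions, self.n_actions).float()
        triples = torch.cat([ctx_states, one_hot, ctx_returns.unsqueeze(-1)], dim=-1)
        # Permutation-invariant aggregation over the context set (mean-pool).
        ctx_embedding = self.triple_encoder(triples).mean(dim=1)
        return self.action_head(torch.cat([state, ctx_embedding], dim=-1))


# Toy usage: batch of 2 states, each conditioned on 5 lower-return examples.
policy = ContextConditionedPolicy(state_dim=4, n_actions=3)
logits = policy(
    torch.randn(2, 4),
    torch.randn(2, 5, 4),
    torch.randint(0, 3, (2, 5)),
    torch.randn(2, 5),
)
print(logits.shape)  # torch.Size([2, 3])
```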

PVN

Policy evaluation networks (PVN) with a few improvements (see the code sketch after this list):

  • Instead of learning probing states, use sampled states
  • Instead of concatenating a fixed number of state-action embeddings together to form a fingerprint, mean-pool an arbitrary number of embeddings
  • To prevent the PVN from learning to predict return from the state distribution alone while ignoring actions, shuffle actions in each batch and set the return prediction target to the minimum return for the environment
  • Improve the fingerprint representation by using it to reconstruct actions, in addition to predicting returns
  • Regularize policy optimization by minimizing reconstruction error, in addition to maximizing predicted return
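
A hedged PyTorch sketch of the training side of these changes, assuming continuous states and actions; class and argument names (`PVN`, `min_return`, and so on) are illustrative, not taken from the repo. It mean-pools per-(state, action) embeddings over sampled probing states into a fingerprint, predicts the return and reconstructs actions from it, and adds a shuffled-action negative whose target is the environment's minimum return:

```python
import torch
import torch.nn as nn


class PVN(nn.Module):
    """Fingerprint a policy from its actions at sampled probing states,
    then predict the policy's return and reconstruct its actions."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        # Per-(state, action) embedding; mean-pooling below allows an
        # arbitrary number of probing states.
        self.embed = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        self.return_head = nn.Linear(hidden, 1)   # fingerprint -> return
        self.recon_head = nn.Sequential(          # fingerprint + state -> action
            nn.Linear(hidden + state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def fingerprint(self, probe_states, probe_actions):
        # probe_states: (n_probe, state_dim), probe_actions: (n_probe, action_dim)
        pairs = torch.cat([probe_states, probe_actions], dim=-1)
        return self.embed(pairs).mean(dim=0)

    def forward(self, probe_states, probe_actions):
        z = self.fingerprint(probe_states, probe_actions)
        pred_return = self.return_head(z)
        recon = self.recon_head(
            torch.cat([z.expand(probe_states.shape[0], -1), probe_states], dim=-1))
        return pred_return, recon


def pvn_loss(pvn, probe_states, probe_actions, true_return, min_return):
    """Supervised PVN training loss for one policy's probing data."""
    pred, recon = pvn(probe_states, probe_actions)
    loss = (pred - true_return) ** 2                      # return prediction
    loss = loss + ((recon - probe_actions) ** 2).mean()   # action reconstruction
    # Negative example: shuffled actions must map to the environment's
    # minimum return, so the fingerprint cannot ignore the actions.
    shuffled = probe_actions[torch.randperm(probe_actions.shape[0])]
    pred_neg, _ = pvn(probe_states, shuffled)
    loss = loss + (pred_neg - min_return) ** 2
    return loss.squeeze()


# Toy usage: 32 sampled probing states, 4-dim states, 2-dim actions.
pvn = PVN(state_dim=4, action_dim=2)
loss = pvn_loss(pvn, torch.randn(32, 4), torch.randn(32, 2),
                true_return=1.7, min_return=0.0)
loss.backward()
```

Policy optimization (the last two bullets) would then treat the trained PVN as a differentiable critic: ascend the predicted return of the fingerprint built from the policy's own actions while also penalizing the action reconstruction error.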


License: MIT License

