lewisKit / rllabsharp

TRPO with Stein Control Variates

Home Page: https://arxiv.org/abs/1710.11198

rllabsharp

rllabsharp contains the code for the TRPO experiments in our paper. It is built on rllab++.

The code is experimental and may require tuning or modification to reach the best reported performance.

Installation

Please follow the basic installation instructions in the rllab documentation.

Dependencies

  • Python 3.5
  • Anaconda
  • rllab
  • MuJoCo

Running Experiments

We provide three Phi structures (linear, quadratic, mlp) and two Phi optimization methods (FitQ and MinVar) in the repository. Optional flags are defined in launcher_utils.py; here are some example commands:

cd sandbox/rocky/tf/launchers
# Hopper-v1 with linear Phi and FitQ optimization
python algo_gym_stub.py --env_name Hopper-v1 --algo_name cfpo --pf_cls linear --use_gradient_vr False --pf_learning_rate 1e-4 --pf_iters 400

# Hopper-v1 with linear Phi and MinVar optimization
python algo_gym_stub.py --env_name Hopper-v1 --algo_name cfpo --pf_cls linear --use_gradient_vr True --pf_learning_rate 1e-3 --pf_iters 800

# Hopper-v1 baseline qprop
python algo_gym_stub.py --env_name Hopper-v1 --algo_name qprop --qprop_eta_option=adapt1
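
To compare every Phi structure against both optimization methods, the commands above can be generated rather than typed by hand. The following is a minimal sketch, not part of the repository: the helpers `build_command` and `sweep` are hypothetical, and it assumes that `quadratic` and `mlp` are accepted as `--pf_cls` values (only `linear` appears in the examples above) and that FitQ and MinVar correspond to `--use_gradient_vr False` and `True`, as the example comments suggest. It only uses flags shown above; check launcher_utils.py before relying on it.

```python
import itertools
import shlex

# Phi structures named in this README. Only "linear" appears verbatim in the
# example commands, so "quadratic" and "mlp" as flag values are assumptions.
PHI_STRUCTURES = ["linear", "quadratic", "mlp"]

# Assumed mapping, inferred from the example comments:
# FitQ -> --use_gradient_vr False, MinVar -> --use_gradient_vr True.
PHI_OPTIMIZERS = {"FitQ": "False", "MinVar": "True"}


def build_command(env_name, pf_cls, optimizer, pf_learning_rate, pf_iters):
    """Return the argument list for one cfpo run (hypothetical helper)."""
    return [
        "python", "algo_gym_stub.py",
        "--env_name", env_name,
        "--algo_name", "cfpo",
        "--pf_cls", pf_cls,
        "--use_gradient_vr", PHI_OPTIMIZERS[optimizer],
        "--pf_learning_rate", str(pf_learning_rate),
        "--pf_iters", str(pf_iters),
    ]


def sweep(env_name="Hopper-v1", pf_learning_rate="1e-4", pf_iters=400):
    """Build one command per (Phi structure, optimization method) pair."""
    return [
        build_command(env_name, structure, optimizer, pf_learning_rate, pf_iters)
        for structure, optimizer in itertools.product(PHI_STRUCTURES, PHI_OPTIMIZERS)
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run
    # from sandbox/rocky/tf/launchers.
    for cmd in sweep():
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

The script prints the commands instead of executing them, so the hyperparameters can be adjusted per configuration first; the examples above use a different learning rate and iteration count for FitQ and MinVar, which a single shared default does not capture.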

Citations

If you find this repository helpful, please cite the following papers:

Feedback

The code is currently a little messy; we will clean it up and make it easier to test soon. If you have any questions about the code or the paper, please feel free to contact Yihao Feng or Hao Liu.

License: Other