Rllabsharp contains the code for the TRPO experiments in our paper, built on top of rllab++. The code is experimental and may require tuning or modification to reproduce the best reported performance.
Please follow the basic installation instructions in the rllab documentation. You will need:

- Python 3.5
- Anaconda
- rllab
- MuJoCo
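For reference, here is a minimal setup sketch assuming rllab's standard conda-based installation; defer to the rllab documentation for the authoritative, platform-specific steps:

```bash
# Sketch only -- see the rllab documentation for authoritative instructions.
git clone https://github.com/rll/rllab.git
cd rllab
conda env create -f environment.yml   # creates the rllab3 conda environment
source activate rllab3
# MuJoCo additionally requires its binaries and a license key installed
# where rllab expects them; see the rllab docs for details.
```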
The repository provides three Phi structures (linear, quadratic, mlp) and two Phi optimization methods (FitQ and MinVar). The available flags are defined in launcher_utils.py. Some running examples:
```bash
cd sandbox/rocky/tf/launchers

# Hopper-v1 with linear Phi and FitQ optimization
python algo_gym_stub.py --env_name Hopper-v1 --algo_name cfpo --pf_cls linear --use_gradient_vr False --pf_learning_rate 1e-4 --pf_iters 400

# Hopper-v1 with linear Phi and MinVar optimization
python algo_gym_stub.py --env_name Hopper-v1 --algo_name cfpo --pf_cls linear --use_gradient_vr True --pf_learning_rate 1e-3 --pf_iters 800

# Hopper-v1 baseline Q-Prop
python algo_gym_stub.py --env_name Hopper-v1 --algo_name qprop --qprop_eta_option=adapt1
```
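The quadratic and mlp Phi structures follow the same pattern via `--pf_cls`. The hyperparameters below are illustrative starting points copied from the linear examples, not tuned values:

```bash
# Hopper-v1 with quadratic Phi and FitQ optimization (illustrative hyperparameters;
# tune pf_learning_rate and pf_iters per environment)
python algo_gym_stub.py --env_name Hopper-v1 --algo_name cfpo --pf_cls quadratic --use_gradient_vr False --pf_learning_rate 1e-4 --pf_iters 400

# Hopper-v1 with mlp Phi and MinVar optimization (illustrative hyperparameters)
python algo_gym_stub.py --env_name Hopper-v1 --algo_name cfpo --pf_cls mlp --use_gradient_vr True --pf_learning_rate 1e-3 --pf_iters 800
```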
If you find this repository helpful, please cite the following papers:
- Hao Liu*, Yihao Feng*, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu. "Sample-efficient Policy Optimization with Stein Control Variate." arXiv:1710.11198. (*: equal contribution)
- Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine. "Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic." Proceedings of the International Conference on Learning Representations (ICLR), 2017.
- Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel. "Benchmarking Deep Reinforcement Learning for Continuous Control." Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.
The code is currently a little messy; we will clean it up and make it easier to test soon. If you have any questions about the code or the paper, please feel free to contact Yihao Feng or Hao Liu.