Reinforcement Learning Algorithms Zoo

RLzoo is a collection of the most practical reinforcement learning algorithms, frameworks and applications. It is implemented with TensorFlow 2.0 and the neural network layer API of TensorLayer 2, to provide a hands-on, fast-to-develop approach to reinforcement learning practice. It supports basic toy tests such as OpenAI Gym and DeepMind Control Suite with very simple configurations. Moreover, RLzoo supports a large-scale distributed training framework for more realistic scenarios with Unity 3D, MuJoCo, Bullet Physics, and robotic learning tasks with V-REP/PyRep, etc.

Contents
- Algorithms
- Applications
- Prerequisites
- Usage
- Troubleshooting
- Citing

Please note that this repository implements RL algorithms with a high-level API. If you want to get familiar with each algorithm more quickly, please look at our RL tutorials, where each algorithm is implemented individually in a more straightforward manner.

Status: Work-in-Progress:

The repository is still in development, and some environments may be incompatible with our algorithms. If you find any problems or have any suggestions, feel free to contact us!

Algorithms	Action Space	Tutorial Env	Papers
value-based
Q-learning	Discrete	FrozenLake	Technical note: Q-learning. Watkins and Dayan 1992.
Deep Q-Network (DQN)	Discrete	FrozenLake	Human-level control through deep reinforcement learning. Mnih et al. 2015.
Prioritized Experience Replay	Discrete	Pong, CartPole	Prioritized experience replay. Schaul et al. 2015.
Dueling DQN	Discrete	Pong, CartPole	Dueling network architectures for deep reinforcement learning. Wang et al. 2015.
Double DQN	Discrete	Pong, CartPole	Deep reinforcement learning with double Q-learning. van Hasselt et al. 2016.
Retrace	Discrete	Pong, CartPole	Safe and efficient off-policy reinforcement learning. Munos et al. 2016.
Noisy DQN	Discrete	Pong, CartPole	Noisy networks for exploration. Fortunato et al. 2017.
Distributional DQN (C51)	Discrete	Pong, CartPole	A distributional perspective on reinforcement learning. Bellemare et al. 2017.
policy-based
REINFORCE (PG)	Discrete/Continuous	CartPole	Reinforcement learning: An introduction. Sutton and Barto 2011.
Trust Region Policy Optimization (TRPO)	Discrete/Continuous	Pendulum	Trust region policy optimization. Schulman et al. 2015.
Proximal Policy Optimization (PPO)	Discrete/Continuous	Pendulum	Proximal policy optimization algorithms. Schulman et al. 2017.
Distributed Proximal Policy Optimization (DPPO)	Discrete/Continuous	Pendulum	Emergence of locomotion behaviours in rich environments. Heess et al. 2017.
actor-critic
Actor-Critic (AC)	Discrete/Continuous	CartPole	Actor-critic algorithms. Konda et al. 2000.
Asynchronous Advantage Actor-Critic (A3C)	Discrete/Continuous	BipedalWalker	Asynchronous methods for deep reinforcement learning. Mnih et al. 2016.
DDPG	Discrete/Continuous	Pendulum	Continuous control with deep reinforcement learning. Lillicrap et al. 2016.
TD3	Discrete/Continuous	Pendulum	Addressing function approximation error in actor-critic methods. Fujimoto et al. 2018.
Soft Actor-Critic (SAC)	Discrete/Continuous	Pendulum	Soft actor-critic algorithms and applications. Haarnoja et al. 2018.
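
To make the first row of the table concrete, here is a minimal sketch of the tabular Q-learning update rule in plain Python. It is an illustration only and is not RLzoo's implementation; the function name `q_learning_update`, the list-of-lists table layout, and the default values of `alpha` and `gamma` are assumptions made for this example.

```python
def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the TD error.

    q_table is a list of rows, one row per state, one entry per action.
    (Hypothetical helper for illustration; not part of RLzoo.)
    """
    # Bootstrapped target: reward plus the discounted best next-state value.
    # Terminal transitions do not bootstrap.
    best_next = 0.0 if done else max(q_table[next_state])
    td_error = reward + gamma * best_next - q_table[state][action]
    # Move the current estimate a fraction alpha toward the target.
    q_table[state][action] += alpha * td_error
    return td_error


# Two states, two actions, all values initialised to zero.
q = [[0.0, 0.0], [0.0, 0.0]]
q_learning_update(q, state=0, action=1, reward=1.0, next_state=1,
                  done=True, alpha=0.5)
print(q[0][1])  # 0.5
```

The deep variants listed in the table replace the table lookup with a neural network, but the target they regress toward has the same form.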

Applications:

Prerequisites:

python 3.5
tensorflow >= 2.0.0 or tensorflow-gpu >= 2.0.0a0
tensorlayer >= 2.0.1
tensorflow-probability
tf-nightly-2.0-preview

pip install -r requirements.txt
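
After installation, it can help to confirm which versions actually ended up in the environment. The following is a small sketch that assumes Python 3.8 or newer (for `importlib.metadata`, which is not available on the Python 3.5 listed above); the `REQUIRED` list and the helper name `installed_versions` are made up for this example. If you installed `tensorflow-gpu` instead of `tensorflow`, adjust the list accordingly.

```python
from importlib import metadata  # standard library from Python 3.8 onward

# Distribution names taken from the prerequisites list above (illustrative).
REQUIRED = ["tensorflow", "tensorlayer", "tensorflow-probability"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print("{:<24} {}".format(name, version or "NOT INSTALLED"))
```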

Usage:

python3 main.py --env=Pendulum-v0 --algorithm=td3 --train_episodes=600 --mode=train
python3 main.py --env=BipedalWalker-v2 --algorithm=a3c --train_episodes=600 --mode=train --number_workers=2
python3 main.py --env=CartPole-v0 --algorithm=ac --train_episodes=600 --mode=train
python3 main.py --env=FrozenLake-v0 --algorithm=dqn --train_episodes=6000 --mode=train
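
For readers who want to see how flags like these could be handled, below is a hedged sketch of an `argparse` parser covering the options that appear in the commands above. It is not the repository's actual `main.py`; the default values, the `"test"` choice for `--mode`, and the function name `build_parser` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the flags used in the example commands (hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Sketch of an RLzoo-style launcher (illustrative only)")
    parser.add_argument("--env", type=str, required=True,
                        help="Gym environment id, e.g. Pendulum-v0")
    parser.add_argument("--algorithm", type=str, required=True,
                        help="algorithm name, e.g. td3")
    # Defaults below are assumptions, not values taken from the repository.
    parser.add_argument("--train_episodes", type=int, default=600)
    parser.add_argument("--mode", choices=["train", "test"], default="train")
    parser.add_argument("--number_workers", type=int, default=1,
                        help="worker count for parallel algorithms such as a3c")
    return parser


# Parse the first example command's flags from an explicit list.
args = build_parser().parse_args(
    ["--env=Pendulum-v0", "--algorithm=td3",
     "--train_episodes=600", "--mode=train"])
print(args.env, args.algorithm, args.train_episodes, args.mode)
```

Passing an explicit list to `parse_args` keeps the sketch runnable without a command line; a real entry point would call `parse_args()` with no arguments so that it reads `sys.argv`.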

Troubleshooting:

If you encounter the error AttributeError: module 'tensorflow' has no attribute 'contrib' when running the code after installing tensorflow-probability, try: pip install --upgrade tf-nightly-2.0-preview tfp-nightly
