Deep Reinforcement Learning (DRL) Algorithms with PyTorch

This repository contains PyTorch implementations of deep reinforcement learning algorithms. For a TensorFlow implementation, see tsallis_actor_critic_mujoco.

Algorithms Implemented

Deep Q-Network (DQN) (V. Mnih et al. 2015)
Double DQN (DDQN) (H. Van Hasselt et al. 2015)
Advantage Actor Critic (A2C)
Vanilla Policy Gradient (VPG)
Natural Policy Gradient (NPG) (S. Kakade et al. 2002)
Trust Region Policy Optimization (TRPO) (J. Schulman et al. 2015)
Proximal Policy Optimization (PPO) (J. Schulman et al. 2017)
Deep Deterministic Policy Gradient (DDPG) (T. Lillicrap et al. 2015)
Twin Delayed DDPG (TD3) (S. Fujimoto et al. 2018)
Soft Actor-Critic (SAC) (T. Haarnoja et al. 2018)
Automating entropy adjustment on SAC (ASAC) (T. Haarnoja et al. 2018)
Tsallis Actor-Critic (TAC) (K. Lee et al. 2019)
Automating entropy adjustment on TAC (ATAC)

Environments Implemented

CartPole-v1
Pendulum-v0
MuJoCo environments (HalfCheetah-v2, Ant-v2, Humanoid-v2, etc.)

Results

CartPole-v1

Observation space: 4
Action space: 2

Pendulum-v0

Observation space: 3
Action space: 1

HalfCheetah-v2

Observation space: 17
Action space: 6

Ant-v2

Observation space: 111
Action space: 8

Humanoid-v2

Observation space: 376
Action space: 17

Requirements

Usage

The repository's high-level structure is:

├── agents
│   └── common
├── results
│   ├── data
│   └── graphs
└── tests
    └── save_model

1) To train the agents on the environments

To train all the different agents on MuJoCo environments, follow these steps:

git clone https://github.com/dongminlee94/deep_rl.git
cd deep_rl
python run_mujoco.py

For other environments, replace run_mujoco.py in the last line with run_cartpole.py or run_pendulum.py.

To change the configuration of an agent, pass the settings as command-line arguments, for example:

python run_mujoco.py \
    --env=Humanoid-v2 \
    --algo=atac \
    --seed=0 \
    --iterations=200 \
    --steps_per_iter=5000 \
    --max_step=1000
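
The flags above can be mirrored in a small argparse sketch. This is not the repository's actual parser: the function names build_parser and total_env_steps are hypothetical, the defaults are simply the values from the example command, and the reading of --iterations and --steps_per_iter as "number of iterations" and "environment steps per iteration" is an assumption based on the flag names.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the example command."""
    parser = argparse.ArgumentParser(description="Sketch of the run_mujoco.py flags")
    # Defaults are taken from the example command, not from the repository.
    parser.add_argument("--env", type=str, default="Humanoid-v2")
    parser.add_argument("--algo", type=str, default="atac")
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--iterations", type=int, default=200)
    parser.add_argument("--steps_per_iter", type=int, default=5000)
    parser.add_argument("--max_step", type=int, default=1000)
    return parser


def total_env_steps(args):
    """Training budget, assuming each iteration collects steps_per_iter steps."""
    return args.iterations * args.steps_per_iter


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--env=Ant-v2", "--algo=sac", "--seed=1"])
print(args.env, args.algo, args.seed, total_env_steps(args))
```

Under that assumption, the example configuration (200 iterations of 5000 steps) would correspond to one million environment steps per run.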

2) To watch the learned agents on the above environments

To watch all the learned agents on MuJoCo environments, follow these steps:

cd tests
python mujoco_test.py --load=envname_algoname_...

Replace envname_algoname_... with the name of a saved model: copy the file name from tests/save_model/envname_algoname_... and paste it after --load=. The script then loads that saved model.
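
To avoid copying the name by hand, the saved models for a given environment and algorithm can be listed with a short helper. This is a minimal sketch, not part of the repository: the function find_saved_models is hypothetical, and it assumes only what the text above states, namely that saved model names start with the environment name and the algorithm name separated by underscores.

```python
from pathlib import Path


def find_saved_models(save_dir, env_name, algo_name):
    """Return the sorted names in save_dir that start with 'envname_algoname_'.

    Hypothetical helper: the naming pattern is assumed from the README text.
    """
    directory = Path(save_dir)
    if not directory.is_dir():
        # No models saved yet (or wrong working directory).
        return []
    prefix = f"{env_name}_{algo_name}_"
    return sorted(p.name for p in directory.iterdir() if p.name.startswith(prefix))


# Run from inside the tests directory, where save_model lives.
for name in find_saved_models("save_model", "Humanoid-v2", "atac"):
    print(f"python mujoco_test.py --load={name}")
```

Each printed line is a ready-to-run command for one saved model; if the directory is missing or holds no matching model, nothing is printed.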
