# pytorch-a2c-trpo-ppo-acktr

This repo is a fork of Ilya Kostrikov's repository that adds his old TRPO implementation as well as some bug fixes.
This work was presented as the theoretical final project for CS5100 Introduction to Artificial Intelligence at Northeastern University in Boston, MA.
What is included in this repository:

- Advantage Actor-Critic (A2C), a synchronous, deterministic version of A3C
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
- Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR)
- Generative Adversarial Imitation Learning (GAIL)
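At its core, A2C combines a policy-gradient term weighted by the advantage with a squared-error critic loss and an entropy bonus. The following is a minimal single-transition sketch of that loss; the function name, signature, and coefficients are illustrative assumptions, not this repo's actual API.

```python
def a2c_loss(log_prob, value, ret, entropy,
             value_loss_coef=0.5, entropy_coef=0.01):
    """Sketch of the A2C loss for one transition (illustrative only)."""
    advantage = ret - value                 # A(s, a) ≈ R - V(s)
    policy_loss = -log_prob * advantage     # advantage treated as a constant
    value_loss = advantage ** 2             # squared error for the critic
    # Entropy bonus encourages exploration by penalizing low-entropy policies.
    return policy_loss + value_loss_coef * value_loss - entropy_coef * entropy
```

In the real implementation these terms are averaged over a batch of rollout steps before backpropagation.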
Also see the OpenAI posts on A2C/ACKTR and PPO for more information.

This implementation is inspired by the OpenAI baselines for A2C, ACKTR, and PPO. It uses the same hyperparameters and model, since they were well tuned for Atari games.
Please use this BibTeX entry if you want to cite this repository in your publications:

```bibtex
@misc{pytorchrl,
  author = {Kostrikov, Ilya and Raayai Ardakani, Matin},
  title = {PyTorch Implementations of Reinforcement Learning Algorithms},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/matinraayai/pytorch-a2c-trpo-ppo-acktr-gail}},
}
```
## Supported (and tested) environments (via OpenAI Gym)

- Atari Learning Environment
- MuJoCo
- PyBullet (including Racecar, Minitaur and Kuka)
- DeepMind Control Suite (via dm_control2gym)
I highly recommend PyBullet as a free, open-source alternative to MuJoCo for continuous control tasks. All environments are operated through exactly the same Gym interface; see their documentation for a comprehensive list.
To use the DeepMind Control Suite environments, set the flag `--env-name dm.<domain_name>.<task_name>`, where `domain_name` and `task_name` are the name of a domain (e.g. `hopper`) and a task within that domain (e.g. `stand`) from the DeepMind Control Suite, for example `--env-name dm.hopper.stand`. Refer to their repo and their tech report for a full list of available domains and tasks. Other than setting the task, the API for interacting with the environment is exactly the same as for all the Gym environments, thanks to `dm_control2gym`.
## Requirements

- Python 3
- PyTorch
- OpenAI baselines

To install the requirements:

```bash
# PyTorch
pip install torch torchvision

# Baselines for Atari preprocessing
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .

# Other requirements
pip install -r requirements.txt
```
## Contributions

Please see https://github.com/ikostrikov/pytorch-a2c-trpo-ppo-acktr-gail for more details. I do not plan to actively maintain this repository in the future.
## Disclaimer

It is extremely difficult to reproduce results for reinforcement learning methods; see "Deep Reinforcement Learning that Matters" for more information. I tried to reproduce the OpenAI results as closely as possible. However, major differences in performance can be caused even by minor differences between the TensorFlow and PyTorch libraries.
## TODO

- Improve this README file; rearrange images.
- Improve the performance of KFAC; see kfac.py for more information.
- Run evaluation for all games and algorithms.
## Visualization

To visualize the results, use `visualize.ipynb`.
## Training

### Atari

#### A2C

```bash
python main.py --env-name "PongNoFrameskip-v4"
```
#### PPO

```bash
python main.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --value-loss-coef 0.5 \
    --num-processes 8 --num-steps 128 --num-mini-batch 4 --log-interval 1 --use-linear-lr-decay --entropy-coef 0.01
```
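The `--clip-param` flag above controls PPO's clipped surrogate objective, which limits how far a single update can move the policy. The following is a minimal pure-Python sketch of the per-sample clipped term; the function name and signature are illustrative assumptions, not this repo's API.

```python
def ppo_clip_objective(ratio, advantage, clip_param=0.1):
    """Sketch of PPO's per-sample clipped surrogate (illustrative only).

    ratio = pi_new(a|s) / pi_old(a|s). The objective is the minimum of the
    unclipped and clipped surrogates; the training loss is its negative.
    """
    clipped_ratio = max(min(ratio, 1.0 + clip_param), 1.0 - clip_param)
    return min(ratio * advantage, clipped_ratio * advantage)
```

Taking the minimum removes the incentive to push the probability ratio outside `[1 - clip_param, 1 + clip_param]`, which is what keeps PPO updates conservative.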
#### ACKTR

```bash
python main.py --env-name "PongNoFrameskip-v4" --algo acktr --num-processes 32 --num-steps 20
```
### MuJoCo

Always try to use the `--use-proper-time-limits` flag. It properly handles partial trajectories that were cut off by an environment's time limit (see https://github.com/sfujim/TD3/blob/master/main.py#L123).
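The flag matters because a time-limit cutoff is not a true terminal state: the episode could have continued, so the return should bootstrap from the critic's value estimate instead of stopping at zero. A minimal sketch of return computation with that distinction; all names and the signature are illustrative assumptions, not this repo's API.

```python
def discounted_returns(rewards, next_values, dones, timeouts, gamma=0.99):
    """Sketch of n-step returns with proper time-limit handling.

    dones[t]: the episode ended at step t.
    timeouts[t]: it ended only because the time limit was hit.
    next_values[t]: critic's estimate V(s_{t+1}).
    """
    returns = [0.0] * len(rewards)
    R = next_values[-1]  # bootstrap for an unfinished rollout
    for t in reversed(range(len(rewards))):
        if dones[t] and timeouts[t]:
            R = next_values[t]  # time-limit cutoff: bootstrap from V(s')
        elif dones[t]:
            R = 0.0             # true terminal state: no future reward
        R = rewards[t] + gamma * R
        returns[t] = R
    return returns
```

Without this distinction, the agent is penalized for reaching the time limit as if it had failed, which biases value estimates on tasks like Reacher.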
#### A2C

```bash
python main.py --env-name "Reacher-v2" --num-env-steps 1000000
```
#### PPO

```bash
python main.py --env-name "Reacher-v2" --algo ppo --use-gae --log-interval 1 --num-steps 2048 --num-processes 1 \
    --lr 3e-4 --entropy-coef 0 --value-loss-coef 0.5 --ppo-epoch 10 --num-mini-batch 32 --gamma 0.99 --gae-lambda 0.95 \
    --num-env-steps 1000000 --use-linear-lr-decay --use-proper-time-limits
```
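The `--gamma` and `--gae-lambda` flags above control Generalized Advantage Estimation (enabled by `--use-gae`): an exponentially weighted sum of one-step TD errors that trades bias against variance. A minimal pure-Python sketch; the function name and signature are illustrative assumptions, not this repo's API.

```python
def gae_advantages(rewards, values, next_value, dones, gamma=0.99, lam=0.95):
    """Sketch of Generalized Advantage Estimation (illustrative only)."""
    values = values + [next_value]          # append V(s_T) for bootstrapping
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        mask = 0.0 if dones[t] else 1.0     # stop bootstrapping at terminals
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * mask - values[t]
        # Accumulate exponentially weighted TD errors backward in time.
        running = delta + gamma * lam * mask * running
        advantages[t] = running
    return advantages
```

With `lam=0` this reduces to the one-step TD advantage; with `lam=1` it recovers the full Monte Carlo advantage.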
#### ACKTR

ACKTR requires modifications specific to MuJoCo. For now I want to keep this code as unified as possible, so I am still looking for a better way to integrate those changes into the codebase.
## Enjoy

Load a pretrained model from my Google Drive. Pretrained models for other games are also available on request: send me an email or create an issue, and I will upload them.

Disclaimer: I might have used different hyperparameters to train these models.
### Atari

```bash
python enjoy.py --load-dir trained_models/a2c --env-name "PongNoFrameskip-v4"
```
### MuJoCo

```bash
python enjoy.py --load-dir trained_models/ppo --env-name "Reacher-v2"
```