REINFORCEMENT LEARNING

This is my attempt at providing clean, "abstraction-free" implementations of various gradient-based reinforcement learning algorithms. I have tried to follow a "single-file" implementation strategy for each algorithm to make the code easier to read.

The code does not aim to be flexible across parameter configurations, optimized for hard problems, or able to run on multiple GPUs. Instead, each algorithm is a simplified single-process, single-file implementation that exposes all the relevant details and removes confusing abstractions. It may be useful as a reference if you want to roll your own implementation.

Implementations of the following algorithms can be found here:

Vanilla policy gradient - code, docs
Advantage Actor-Critic - code, docs
Proximal policy optimization - code, docs
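
For orientation, the policy objectives of the three algorithms listed above can be sketched in a few lines. This is a hypothetical, dependency-free illustration and not the code from this repository: the function names (`discounted_returns`, `vpg_loss`, `a2c_policy_loss`, `ppo_clip_loss`) and the defaults `gamma=0.99` and `clip_eps=0.2` are assumptions chosen for the example. It works on plain Python lists, whereas a real implementation would use tensors with automatic differentiation, and it shows only the policy term of Advantage Actor-Critic, leaving out the value-function loss.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards in time."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def vpg_loss(log_probs, returns):
    """Vanilla policy gradient surrogate: -mean(log pi(a_t|s_t) * G_t)."""
    total = sum(lp * g for lp, g in zip(log_probs, returns))
    return -total / len(returns)


def a2c_policy_loss(log_probs, returns, values):
    """Actor-critic policy term: weight by the advantage G_t - V(s_t)."""
    total = sum(lp * (g - v) for lp, g, v in zip(log_probs, returns, values))
    return -total / len(returns)


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate: -mean(min(r * A, clip(r, 1-eps, 1+eps) * A))."""
    total = 0.0
    for nlp, olp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(nlp - olp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

The three losses differ only in how each log-probability is weighted: by the raw return, by an advantage estimate, or by a clipped probability ratio times the advantage. That is why they are easy to compare side by side.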

If you want to read more about policy gradient algorithms, check out a blog post that I wrote.
