elisiojsj / Reinforcement-Learning

Reinforcement learning algorithms on Open-AI Gym environments.

Repository from GitHub: https://github.com/elisiojsj/Reinforcement-Learning

Reinforcement Learning

This is a series of models I worked on while studying Reinforcement Learning.

Study resources:

Value-Based

This model is based on the classic DeepMind paper, known for its very popular video of an agent playing Breakout. That video was my biggest inspiration to study AI.

The model uses a DQN algorithm with two convolutional neural networks, an online network and a separate target network (fixed Q-targets), to approximate the action-value function while learning to play the game. It also uses the Experience Replay technique.

The method used was TD (Temporal-Difference) Learning. A trained agent can be found in the folder agents.

VB_breakout
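To make the Experience Replay idea concrete, here is a minimal sketch of a replay memory in plain Python. It is not the code from the notebooks; the class name `ReplayBuffer`, the `Transition` tuple and the capacity used below are hypothetical choices for illustration. The idea is simply to store past transitions and train on uniformly sampled mini-batches instead of consecutive frames.

```python
import random
from collections import deque, namedtuple

# One stored step of interaction (hypothetical record layout, not the repo's).
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Hypothetical fixed-size FIFO memory of past transitions."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen drops the oldest transition once the buffer is full
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation between
        # consecutive frames, which otherwise destabilizes training.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)

print(len(buffer))                         # 3: only the newest three are kept
print([tr.state for tr in buffer.memory])  # [2, 3, 4]
batch = buffer.sample(2)                   # a random mini-batch for one update
```

In a real DQN the states would be stacks of preprocessed frames and the sampled batch would be fed to the network; the memory logic stays the same.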

Implementation of a Deep Q-Network (DQN) that solves the CartPole environment with a simple neural network, TD learning, a separate Q-target network and Experience Replay.
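The role of the Q-target network can be shown without any deep learning library. The sketch below is an illustration under assumptions, not the notebook's code: it swaps the neural network for a hypothetical linear Q-function (`q_values`), and `GAMMA`, `SYNC_EVERY`, `td_targets` and `maybe_sync` are names and values chosen here. TD targets are computed from a frozen copy of the weights, which is refreshed only every `SYNC_EVERY` steps.

```python
import copy

GAMMA = 0.99        # discount factor (assumed value)
SYNC_EVERY = 1000   # steps between target-network refreshes (assumed value)


def q_values(weights, state):
    """Hypothetical linear Q-function: one weight vector per action."""
    return [sum(w * x for w, x in zip(w_a, state)) for w_a in weights]


def td_targets(target_weights, rewards, next_states, dones, gamma=GAMMA):
    """y = r + gamma * max_a' Q_target(s', a'); terminal steps do not bootstrap."""
    targets = []
    for r, s_next, done in zip(rewards, next_states, dones):
        bootstrap = 0.0 if done else max(q_values(target_weights, s_next))
        targets.append(r + gamma * bootstrap)
    return targets


def maybe_sync(step, online, target):
    """Copy the online weights into the frozen target every SYNC_EVERY steps."""
    if step % SYNC_EVERY == 0:
        target[:] = copy.deepcopy(online)


online = [[0.5, -0.2], [0.1, 0.3]]   # 2 actions x 2 state features
target = copy.deepcopy(online)       # frozen copy, used only to build targets

online[0][0] += 1.0                  # pretend a gradient step changed the online weights
y = td_targets(target, rewards=[1.0, 1.0],
               next_states=[[1.0, 1.0], [1.0, 1.0]], dones=[False, True])
print(y)                             # targets still come from the old weights

maybe_sync(SYNC_EVERY, online, target)
print(target[0][0])                  # 1.5 after the sync
```

Keeping the targets fixed between syncs means the network is not chasing a target that moves with every gradient step, which is the main reason for the second network.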

Q-Learning implementation that solves the CartPole environment with a simple neural network. A trained agent can be found in the folder agents.

Q-learning-cartpole
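The core of Q-Learning with a function approximator is the semi-gradient update toward the TD target. The sketch below is an assumption-laden illustration rather than the notebook's code: it replaces the neural network with a hypothetical linear Q-function so the gradient is just the feature vector, and the helper names `q`, `epsilon_greedy` and `q_learning_step` are made up here. With a real network, the same TD error would drive a loss that autograd differentiates.

```python
import random


def q(w, s, a):
    """Hypothetical linear Q-function: Q(s, a) = w[a] . s"""
    return sum(wi * xi for wi, xi in zip(w[a], s))


def epsilon_greedy(w, s, epsilon, rng):
    """Explore with probability epsilon, otherwise take the highest-valued action."""
    n_actions = len(w)
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q(w, s, a))


def q_learning_step(w, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Semi-gradient Q-learning update; returns the TD error."""
    target = r if done else r + gamma * max(q(w, s_next, b) for b in range(len(w)))
    td_error = target - q(w, s, a)
    # For a linear Q, the gradient with respect to w[a] is the feature vector s itself.
    w[a] = [wi + alpha * td_error * xi for wi, xi in zip(w[a], s)]
    return td_error


rng = random.Random(0)
w = [[0.0, 0.0], [0.0, 0.0]]          # 2 actions x 2 features
err = q_learning_step(w, s=[1.0, 0.0], a=1, r=1.0, s_next=[0.0, 1.0], done=True)
print(err, w[1])                      # 1.0 [0.1, 0.0]
print(epsilon_greedy(w, [1.0, 0.0], epsilon=0.0, rng=rng))  # 1
```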

Q-Learning implementation that solves the MountainCar environment with a simple neural network. A trained agent can be found in the folder agents.

Classical tabular Q-Learning was used to solve the Taxi-v3 environment. Since it is a simpler, discrete game, the classical method solves it easily.
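Tabular Q-Learning keeps one value per (state, action) pair and nudges it toward the TD target. So that the sketch runs without Gym installed, it trains on a hypothetical five-state corridor instead of Taxi-v3; the environment, `step`, `greedy`, `train` and all hyperparameters are assumptions made for illustration. The update rule is the same one a Taxi-v3 table would use, just with 500 states and 6 actions.

```python
import random

N_STATES, GOAL = 5, 4          # hypothetical 1-D corridor, not Taxi-v3
ACTIONS = (0, 1)               # 0 = left, 1 = right


def step(state, action):
    """Deterministic toy dynamics: reward 1 only when the goal is reached."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def greedy(q_row, rng):
    best = max(q_row)
    # Random tie-breaking so the all-zero initial table still explores.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 100:
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q[s], rng)
            s_next, r, done = step(s, a)
            # Q-learning: bootstrap from the best next action (nothing after a terminal step)
            target = r + (0.0 if done else gamma * max(q[s_next]))
            q[s][a] += alpha * (target - q[s][a])
            s, t = s_next, t + 1
    return q


q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)   # [1, 1, 1, 1]: always move right
```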

An interesting approach makes it possible to apply classical tabular Q-Learning to a more complex, continuous environment like MountainCar: the continuous observations are discretized into bins with NumPy's linspace and digitize functions. It also implements two TD control methods, SARSA and Q-Learning.

tabql-cartpole
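A minimal sketch of the discretization trick, plus the one line where SARSA and Q-Learning differ. The bounds are those of Gym's MountainCar-v0 observation space (position in [-1.2, 0.6], velocity in [-0.07, 0.07]); the bin count `N_BINS` and the helper names `discretize`, `sarsa_target` and `q_learning_target` are assumptions for illustration, not taken from the notebook.

```python
import numpy as np

N_BINS = 20                     # assumed resolution per observation dimension
LOW = np.array([-1.2, -0.07])   # MountainCar-v0 bounds: (position, velocity)
HIGH = np.array([0.6, 0.07])

# Interior bin edges per dimension: linspace gives N_BINS + 1 boundaries, and
# dropping the outer two lets digitize return an index in [0, N_BINS - 1].
EDGES = [np.linspace(lo, hi, N_BINS + 1)[1:-1] for lo, hi in zip(LOW, HIGH)]


def discretize(obs):
    """Map a continuous observation to a tuple of bin indices usable as a table key."""
    return tuple(int(np.digitize(x, edges)) for x, edges in zip(obs, EDGES))


def sarsa_target(q, r, s_next, a_next, done, gamma=0.99):
    # On-policy: bootstrap from the action the behaviour policy actually takes next.
    return r if done else r + gamma * q[s_next][a_next]


def q_learning_target(q, r, s_next, done, gamma=0.99):
    # Off-policy: bootstrap from the greedy action in the next state.
    return r if done else r + gamma * np.max(q[s_next])


q = np.zeros((N_BINS, N_BINS, 3))   # MountainCar has 3 discrete actions
print(discretize([-0.5, 0.01]))     # (7, 11)
```

Either target then feeds the same tabular update, `q[s][a] += alpha * (target - q[s][a])`, with `s = discretize(obs)`.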


Policy-Based

Implementation of REINFORCE, a Monte Carlo policy-gradient algorithm, on the LunarLander-v2 environment. A trained agent can be found in the folder agents.

Implementation of REINFORCE, a Monte Carlo policy-gradient algorithm, on the Acrobot-v1 environment.

Acrobot
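REINFORCE waits until an episode ends, computes the discounted return from every step, and weights each action's log-probability by that return. The sketch below shows only those two pieces in plain Python; the names `discounted_returns` and `reinforce_loss` are hypothetical. In a PyTorch version the `log_probs` would be tensors from the policy network (for example from `Categorical(...).log_prob(action)`) and the loss would be backpropagated; many implementations also normalize the returns, which is omitted here.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one finished episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Minimizing this loss performs gradient ascent on sum_t log pi(a_t | s_t) * G_t."""
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))   # [1.75, 1.5, 1.0]
loss = reinforce_loss([math.log(0.5)] * 3, [1.0, 1.0, 1.0], gamma=0.5)
```

Because the return is only known at the end of the episode, this is a Monte Carlo method: the estimate is unbiased but noisy, which is what Actor-Critic tries to improve.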


Actor-Critic

This is an implementation of the Actor-Critic algorithm, which combines the policy-based and value-based approaches: the actor learns the policy that selects actions, while the critic learns a value function whose estimates guide the actor's updates. It was implemented on the LunarLander-v2 environment. A trained agent can be found in the folder agents, and recordings can be found in the folder recordings.

AC_lunarlander
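A one-step Actor-Critic update can be written for a tiny tabular problem to show how the two parts interact. This is a sketch under assumptions, not the LunarLander code: both actor and critic are hypothetical tables (`theta` holds softmax action preferences, `v` holds state values) instead of neural networks, and the step sizes are arbitrary. The critic's TD error serves as the advantage estimate that scales the actor's policy-gradient step.

```python
import math


def softmax(prefs):
    m = max(prefs)                      # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, v, s, a, r, s_next, done,
                      alpha_actor=0.1, alpha_critic=0.1, gamma=0.99):
    """One-step tabular actor-critic update; returns the TD error."""
    # Critic: one-step TD error, also used as the advantage estimate for the actor.
    td_error = r + (0.0 if done else gamma * v[s_next]) - v[s]
    v[s] += alpha_critic * td_error
    # Actor: the gradient of log softmax w.r.t. the preferences is onehot(a) - pi(.|s).
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        theta[s][b] += alpha_actor * td_error * ((1.0 if b == a else 0.0) - probs[b])
    return td_error


theta = [[0.0, 0.0]]   # one state, two actions, uniform policy to start
v = [0.0]
td = actor_critic_step(theta, v, s=0, a=1, r=1.0, s_next=0, done=True)
print(td, v, softmax(theta[0]))   # the rewarded action 1 becomes more likely
```

Unlike REINFORCE, the update happens at every step using the critic's bootstrapped estimate, trading some bias for much lower variance.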


Extra

Playing a Gym Atari game in a Jupyter notebook

Software versions

  • conda 4.8.2
  • python 3.7.4
  • pytorch 1.4.0
  • open-ai gym 0.17.1



Languages: Jupyter Notebook 98.0%, Python 2.0%