andrefmsmith / amsRL_openAIgym

Reinforcement Learning: Solutions and explorations of OpenAI Gym tasks, with walkthroughs for study purposes. Part of my learning for the Udacity Deep Reinforcement Learning Nanodegree.

A 'Code Blog' for Deep Reinforcement Learning using OpenAI Gym

I created this repo as a study aid for Deep Reinforcement Learning, following my enrollment in the corresponding Udacity Nanodegree.

My goal is to write a series of Jupyter Notebooks as if they were blog entries. Each notebook contains code for solving a particular task in the OpenAI Gym environments, a detailed explanation of how the code works, and short 'take-home' notes on the underlying Reinforcement Learning concepts.

Documenting and explaining my code as if others were going to read it has been instrumental in structuring and clarifying my thinking. I have also used the notebooks to briefly recap the underlying RL concepts and to remind myself of the context and motivation for each approach, rather than focusing exclusively on implementation. This exercise has deepened my understanding of RL by forcing me to take a big-picture view of how everything I have learnt so far fits together. If you're enrolled on the same (or a similar) program, I would encourage you to avoid passively consuming the course content, however high its quality, and to adopt this proactive approach instead. It is slower, but in my experience it is ultimately more rewarding because it leads to a deeper and more consolidated understanding.

The table below lists the environments and concepts tackled so far. I will keep it updated as I add notebooks.

| Environment | Concepts | Date |
| --- | --- | --- |
| FrozenLake | TD methods (Q-learning) | 21/02/2020 |
| Blackjack | Monte Carlo Control | 23/02/2020 |
| LunarLander/Box2D in general | Deep Q Networks | 25/02/2020 |
| Unity Banana | Deep Q Networks, with detailed walkthrough | 03/03/2020 |
| Cartpole | Policy Networks, Hill-Climbing, Steepest Ascent | 10/03/2020 |
| MountainCar | 2-layer Policy Networks, Cross-Entropy Method | 13/03/2020 |
| Pong! from pixels | Policy gradients, future rewards, REINFORCE | 24/03/2020 |
| Reacher | Policy gradients, Experience Replay, DDPG | 01/04/2020 |
| Tennis | Multi-agent RL, Policy Gradients, Experience Replay, DDPG | 07/04/2020 |
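
To give a flavour of the first entry in the table, here is a minimal sketch of tabular Q-learning. It is not taken from the notebooks. To stay self-contained it avoids the gym dependency and uses a small hand-written, non-slippery grid modeled on FrozenLake's 4x4 map. The names `GRID`, `step`, `train` and `greedy_rollout`, and all hyperparameter values, are illustrative assumptions.

```python
import random

# Hypothetical stand-in for the environment (not the gym API):
# S = start, F = frozen (safe), H = hole (episode ends), G = goal (reward 1).
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N = 4
# Actions as (d_row, d_col): 0 = left, 1 = down, 2 = right, 3 = up.
ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    # Moving into a wall leaves the agent in place.
    row = min(max(row + d_row, 0), N - 1)
    col = min(max(col + d_col, 0), N - 1)
    cell = GRID[row][col]
    return row * N + col, float(cell == "G"), cell in "HG"


def train(episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Learn a Q-table with epsilon-greedy exploration (assumed hyperparameters)."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                # Greedy choice, breaking ties at random.
                best = max(q[state])
                action = rng.choice(
                    [a for a, value in enumerate(q[state]) if value == best]
                )
            next_state, reward, done = step(state, action)
            # TD target: bootstrap from the best next action unless terminal.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_rollout(q, max_steps=50):
    """Follow the greedy policy from the start state; return the total reward."""
    state, total = 0, 0.0
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, reward, done = step(state, action)
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    q_table = train()
    print("Reward under the greedy policy:", greedy_rollout(q_table))
```

The update inside `train` is the part that makes this a TD method: the estimate for the current state-action pair is nudged towards the observed reward plus the discounted value of the best next action, without waiting for the episode to finish.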

About

License: MIT License

Languages

Jupyter Notebook 99.4%, Python 0.6%