Reinforcement learning on gridworld with Q-learning
Submission to Siraj Raval's Q-learning competition
- Made the code compatible with Python 3
- Changed the main loop to a more traditional episode - step structure
- Added eligibility traces: TD(λ) for the greedy policy and Watkins's Q(λ) for the epsilon-greedy policy
- Changed the bot's policy to epsilon-greedy
- Logged the episode data to a CSV file for later analysis in a Jupyter notebook with matplotlib, pandas, and seaborn
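The core of the changes above is Watkins's Q(λ): standard Q-learning augmented with eligibility traces that are cut whenever the behavior policy takes an exploratory (non-greedy) action. A minimal sketch on a toy 4x4 gridworld — the grid, reward, and all names here are illustrative, not taken from `Learner.py`:

```python
import numpy as np

# Hyperparameters (illustrative values, not the project's tuned ones)
N_STATES, N_ACTIONS = 16, 4            # 4x4 grid; actions: up, down, left, right
ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.9, 0.8, 0.1

rng = np.random.default_rng(0)
Q = np.full((N_STATES, N_ACTIONS), 0.1)  # optimistic init to induce exploration


def env_step(state, action):
    """Toy deterministic transition: move on the grid, reward 1 at state 15."""
    r, c = divmod(state, 4)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
    r, c = min(max(r + dr, 0), 3), min(max(c + dc, 0), 3)
    s2 = r * 4 + c
    return s2, float(s2 == 15), s2 == 15


def select(s):
    """Epsilon-greedy behavior policy."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(Q[s].argmax())


for episode in range(50):
    e = np.zeros_like(Q)               # eligibility traces
    s, a, done = 0, select(0), False
    while not done:
        s2, reward, done = env_step(s, a)
        a2 = select(s2)                # next action from the behavior policy
        a_star = int(Q[s2].argmax())   # greedy action in the next state
        delta = reward + GAMMA * Q[s2, a_star] * (not done) - Q[s, a]
        e[s, a] += 1                   # accumulating trace
        Q += ALPHA * delta * e         # one TD error updates all traced pairs
        # Watkins's cut: decay traces only while the behavior stays greedy
        e *= GAMMA * LAMBDA if Q[s2, a2] == Q[s2, a_star] else 0.0
        s, a = s2, a2
```

Because one TD error is propagated along the whole trace, credit reaches earlier state-action pairs in a single episode instead of one step per episode, which is why the traced variants in the table below solve the task in far fewer episodes.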
| Original (greedy) | Greedy with eligibility traces | Epsilon-greedy with eligibility traces |
|---|---|---|
| Greedy policy; Q values are initialized to 0.1 to induce exploration | Same greedy policy, but eligibility traces make learning considerably faster | Epsilon-greedy policy with eligibility traces; turns out less effective than the greedy policy with traces, though that may be due to my non-optimized hyperparameters |
| 40 episodes to solution | 10 episodes to solution | 15 episodes to solution |
| Sub-optimal solution | Sub-optimal solution | Will converge to the optimal solution with the right hyperparameters |
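The episodes-to-solution figures above come from the per-episode CSV log mentioned earlier. A minimal sketch of that logging pattern, writing one row per episode for later analysis (column names here are illustrative, not necessarily those used by `Learner.py`):

```python
import csv

# Hypothetical per-episode stats; in practice these would be
# accumulated inside the training loop.
rows = [
    {"episode": 0, "steps": 120, "total_reward": 1.0},
    {"episode": 1, "steps": 95, "total_reward": 1.0},
]

with open("episodes.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["episode", "steps", "total_reward"])
    writer.writeheader()
    writer.writerows(rows)

# In the notebook the file can then be loaded with e.g.
# pandas.read_csv("episodes.csv") and plotted with matplotlib/seaborn.
```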
Run `python Learner.py` in a terminal.
- Tkinter
- Matplotlib
- Seaborn