Reinforcement learning on gridworld with Q-learning
Submission to Siraj Raval's Q-learning competition
- Made the code compatible with Python 3
- Changed the main loop to a more traditional episode - step structure
- Added eligibility traces: TD(λ) for the greedy policy and Watkins's Q(λ) for the epsilon-greedy policy
- Changed the bot's policy to epsilon-greedy
- Logged the episode data to a CSV file for later analysis in a Jupyter notebook with matplotlib, pandas, and seaborn
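The core of the changes above is Watkins's Q(λ): standard Q-learning augmented with eligibility traces that are cut whenever the behavior policy takes an exploratory (non-greedy) action. A minimal sketch on a toy 4x4 gridworld — the grid, reward, and all names here are illustrative, not taken from `Learner.py`:

```python
import numpy as np

# Hyperparameters (illustrative values, not the project's tuned ones)
N_STATES, N_ACTIONS = 16, 4            # 4x4 grid; actions: up, down, left, right
ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.9, 0.8, 0.1

rng = np.random.default_rng(0)
Q = np.full((N_STATES, N_ACTIONS), 0.1)  # optimistic init to induce exploration


def env_step(state, action):
    """Toy deterministic transition: move on the grid, reward 1 at state 15."""
    r, c = divmod(state, 4)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
    r, c = min(max(r + dr, 0), 3), min(max(c + dc, 0), 3)
    s2 = r * 4 + c
    return s2, float(s2 == 15), s2 == 15


def select(s):
    """Epsilon-greedy behavior policy."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(Q[s].argmax())


for episode in range(50):
    e = np.zeros_like(Q)               # eligibility traces
    s, a, done = 0, select(0), False
    while not done:
        s2, reward, done = env_step(s, a)
        a2 = select(s2)                # next action from the behavior policy
        a_star = int(Q[s2].argmax())   # greedy action in the next state
        delta = reward + GAMMA * Q[s2, a_star] * (not done) - Q[s, a]
        e[s, a] += 1                   # accumulating trace
        Q += ALPHA * delta * e         # one TD error updates all traced pairs
        # Watkins's cut: decay traces only while the behavior stays greedy
        e *= GAMMA * LAMBDA if Q[s2, a2] == Q[s2, a_star] else 0.0
        s, a = s2, a2
```

Because one TD error is propagated along the whole trace, credit reaches earlier state-action pairs in a single episode instead of one step per episode, which is why the traced variants in the table below solve the task in far fewer episodes.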
| Original (greedy) | Greedy with eligibility traces | Epsilon-greedy with eligibility traces |
|---|---|---|
| Greedy policy; Q values are initialized to 0.1 to induce exploration | Same greedy policy, but eligibility traces make learning considerably faster | Epsilon-greedy policy with eligibility traces; turns out less effective than the greedy policy with traces, though that may be due to my non-optimized hyperparameters |
| 40 episodes to solution | 10 episodes to solution | 15 episodes to solution |
| Sub-optimal solution | Sub-optimal solution | Will converge to the optimal solution with the right hyperparameters |
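The episodes-to-solution figures above come from the per-episode CSV log mentioned earlier. A minimal sketch of that logging pattern, writing one row per episode for later analysis (column names here are illustrative, not necessarily those used by `Learner.py`):

```python
import csv

# Hypothetical per-episode stats; in practice these would be
# accumulated inside the training loop.
rows = [
    {"episode": 0, "steps": 120, "total_reward": 1.0},
    {"episode": 1, "steps": 95, "total_reward": 1.0},
]

with open("episodes.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["episode", "steps", "total_reward"])
    writer.writeheader()
    writer.writerows(rows)

# In the notebook the file can then be loaded with e.g.
# pandas.read_csv("episodes.csv") and plotted with matplotlib/seaborn.
```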
Run `python Learner.py` in a terminal.
- Tkinter
- Matplotlib
- Seaborn