
Reinforcement Learning on Easy21

This repository applies reinforcement learning to Easy21, a simplified blackjack game. It is an assignment from David Silver's Reinforcement Learning course at UCL. The assignment can be found here.

Monte-Carlo Control

python3 monteCarlo.py

The agent played 1 million games (episodes) to obtain the following value function:
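As a rough illustration of how such a run can be set up, here is a minimal sketch of Monte-Carlo control on Easy21. It is not the code in monteCarlo.py. The game rules (cards 1 to 10, black with probability 2/3, dealer sticks at 17 or above), the exploration schedule epsilon = N0 / (N0 + N(s)) with N0 = 100, and the step size 1 / N(s, a) are assumptions taken from the assignment text, and all function names are hypothetical.

```python
import random
from collections import defaultdict

HIT, STICK = 0, 1

def draw_card(rng):
    """Card value 1-10; black (added) with prob. 2/3, red (subtracted) with prob. 1/3."""
    value = rng.randint(1, 10)
    return value if rng.random() < 2 / 3 else -value

def initial_state(rng):
    """Dealer and player each start with one black card: (dealer, player)."""
    return (rng.randint(1, 10), rng.randint(1, 10))

def step(state, action, rng):
    """Apply one player action and return (next_state, reward, done)."""
    dealer, player = state
    if action == HIT:
        player += draw_card(rng)
        if player < 1 or player > 21:      # player goes bust
            return (dealer, player), -1, True
        return (dealer, player), 0, False
    # STICK: the dealer hits until reaching 17 or more, or going bust.
    while 1 <= dealer < 17:
        dealer += draw_card(rng)
    if dealer < 1 or dealer > 21:          # dealer goes bust
        return (dealer, player), 1, True
    reward = (player > dealer) - (player < dealer)
    return (dealer, player), reward, True

def monte_carlo_control(episodes, n0=100, seed=0):
    """Every-visit Monte-Carlo control with a decaying epsilon-greedy policy."""
    rng = random.Random(seed)
    Q = defaultdict(float)     # Q[(state, action)]
    n_sa = defaultdict(int)    # visit counts per state-action pair
    n_s = defaultdict(int)     # visit counts per state
    for _ in range(episodes):
        state, done, visited = initial_state(rng), False, []
        while not done:
            n_s[state] += 1
            epsilon = n0 / (n0 + n_s[state])
            if rng.random() < epsilon:
                action = rng.choice((HIT, STICK))
            else:
                action = max((HIT, STICK), key=lambda a: Q[(state, a)])
            visited.append((state, action))
            state, reward, done = step(state, action, rng)
        # Undiscounted, and only the final reward is non-zero,
        # so the return from every visited pair equals that final reward.
        for sa in visited:
            n_sa[sa] += 1
            Q[sa] += (reward - Q[sa]) / n_sa[sa]
    return Q
```

Calling `monte_carlo_control(1_000_000)` would mirror the episode count quoted above; a smaller count is enough to check that the loop runs.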

Visualized as a heatmap:

The optimal policy, chosen by selecting the action with the highest value in each state:
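Selecting the highest-valued action per state can be sketched as below. The layout of `Q` as a dictionary keyed by `((dealer, player), action)` and the name `greedy_policy` are assumptions made for this example, not the repository's data structures.

```python
HIT, STICK = 0, 1

def greedy_policy(Q, dealer_range=range(1, 11), player_range=range(1, 22)):
    """Return (policy, value) dictionaries keyed by (dealer, player).

    policy[s] is the action with the larger Q-value (ties go to STICK),
    value[s] is max_a Q(s, a). Missing entries are treated as 0.0.
    """
    policy, value = {}, {}
    for dealer in dealer_range:
        for player in player_range:
            state = (dealer, player)
            q_hit = Q.get((state, HIT), 0.0)
            q_stick = Q.get((state, STICK), 0.0)
            policy[state] = HIT if q_hit > q_stick else STICK
            value[state] = max(q_hit, q_stick)
    return policy, value
```

The `value` dictionary corresponds to the state-value surface that a heatmap would display, and `policy` to the hit/stick map.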

TD Learning

python3 temporalDifference.py

The mean squared error (MSE) of Q, the state-action value function, over the course of learning. For each lambda, 10,000 episodes were run and the resulting Q was compared against the state-action value function from the 1 million episode Monte-Carlo run, saved in Q.dill:
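The comparison against the Monte-Carlo reference can be sketched as a mean over all 420 state-action pairs. The dictionary layout keyed by `((dealer, player), action)` and the function name are assumptions for this example; loading the reference from Q.dill is left out because its stored format is not described here.

```python
def mean_squared_error(Q, Q_ref, n_dealer=10, n_player=21, actions=(0, 1)):
    """Mean of (Q(s, a) - Q_ref(s, a))^2 over every state-action pair.

    Pairs missing from either table count as 0.0, so unvisited
    pairs still contribute to the average.
    """
    total, count = 0.0, 0
    for dealer in range(1, n_dealer + 1):
        for player in range(1, n_player + 1):
            for action in actions:
                key = ((dealer, player), action)
                diff = Q.get(key, 0.0) - Q_ref.get(key, 0.0)
                total += diff * diff
                count += 1
    return total / count
```

Evaluating this after every episode gives a learning curve; evaluating it once at the end of a run gives a single point per lambda.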

Mean Squared Error after 1,000 episodes for different lambdas:

The optimal policy as derived from 10,000 episodes of TD(lambda = 0.3):
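For reference, one episode of tabular Sarsa(lambda) might look like the sketch below. It is not the code in temporalDifference.py: accumulating eligibility traces, the step size 1 / N(s, a), and the exploration schedule epsilon = N0 / (N0 + N(s)) are assumptions, and the environment is passed in through the hypothetical callables `reset(rng)` and `step(state, action, rng)`.

```python
import random
from collections import defaultdict

def sarsa_lambda_episode(Q, n_sa, n_s, reset, step, actions, lam, rng,
                         n0=100, gamma=1.0):
    """Run one episode of Sarsa(lambda), updating Q, n_sa and n_s in place."""

    def choose(state):
        # Epsilon-greedy with epsilon decaying in the state's visit count.
        n_s[state] += 1
        epsilon = n0 / (n0 + n_s[state])
        if rng.random() < epsilon:
            return rng.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    E = defaultdict(float)     # eligibility traces, reset every episode
    state = reset(rng)
    action = choose(state)
    done = False
    while not done:
        next_state, reward, done = step(state, action, rng)
        if done:
            next_action, target = None, reward
        else:
            next_action = choose(next_state)
            target = reward + gamma * Q[(next_state, next_action)]
        delta = target - Q[(state, action)]
        n_sa[(state, action)] += 1
        E[(state, action)] += 1.0          # accumulating trace
        for sa in list(E):
            Q[sa] += delta * E[sa] / n_sa[sa]
            E[sa] *= gamma * lam
        state, action = next_state, next_action

# Usage on a toy one-step task: action 1 wins (+1), action 0 loses (-1).
def toy_reset(rng):
    return 0

def toy_step(state, action, rng):
    return 1, (1 if action == 1 else -1), True

toy_Q, toy_n_sa, toy_n_s = defaultdict(float), defaultdict(int), defaultdict(int)
toy_rng = random.Random(0)
for _ in range(200):
    sarsa_lambda_episode(toy_Q, toy_n_sa, toy_n_s, toy_reset, toy_step,
                         (0, 1), lam=0.3, rng=toy_rng)
```

Passing the environment as callables keeps the update rule separate from the game, so the same function could be driven by an Easy21 implementation.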

Linear Function Approximation

python3 lfa.py

The lookup-table approach of the previous models is replaced by a coarse-coding linear function approximator. This reduces the 420 state-action combinations (10 dealer cards x 21 player sums x 2 actions) to 36 features.
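A minimal sketch of such a feature vector follows. The overlapping intervals are assumed from the assignment text (3 dealer intervals x 6 player intervals x 2 actions = 36 binary features) and may differ from what lfa.py uses; the names `features` and `q_value` are hypothetical.

```python
DEALER_INTERVALS = [(1, 4), (4, 7), (7, 10)]
PLAYER_INTERVALS = [(1, 6), (4, 9), (7, 12), (10, 15), (13, 18), (16, 21)]
ACTIONS = (0, 1)  # 0 = hit, 1 = stick

def features(dealer, player, action):
    """Binary coarse-coded feature vector of length 36.

    A feature is 1.0 when the dealer card and player sum fall inside its
    intervals and the action matches. Intervals overlap, so several
    features can be active at once.
    """
    phi = []
    for d_lo, d_hi in DEALER_INTERVALS:
        for p_lo, p_hi in PLAYER_INTERVALS:
            for a in ACTIONS:
                active = (d_lo <= dealer <= d_hi
                          and p_lo <= player <= p_hi
                          and a == action)
                phi.append(1.0 if active else 0.0)
    return phi

def q_value(theta, dealer, player, action):
    """Linear approximation: Q(s, a) = phi(s, a) . theta."""
    return sum(t * f for t, f in zip(theta, features(dealer, player, action)))
```

Because the intervals overlap, a state such as dealer 4 and player 4 activates four features for a given action, which is what lets neighbouring states share value estimates.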
