shin-ka / OpenAI-Monte-Carlo-Control-for-Blackjack

Implementation of constant-alpha Monte Carlo Control Method to construct an optimal policy for playing the game of Blackjack

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

constant-α Monte Carlo Control for the game of Blackjack

This is my implementation of constant-α Monte Carlo Control for the game of Blackjack using Python & OpenAI gym's Blackjack-v0 environment. OpenAI's main code for how the game environment works can be found here.

The pseudocode for constant-α Monte Carlo Control is as follows:

Repository

This repository contains:

Model Outputs

At the end, MC control aims to output the following:

  • 1. Optimal Policy
  • 2. Optimal Q (action-value function table)

List of Hyperparameters used:

  • alpha (content image weight) = 0.02
  • gamma (discount rate) = 0.9
  • number of episodes = 500000

Results

Here are the results I obtained after executing MC Control using the above hyperparameteres:

  • State-value Functions for vs. Current Sum of Player's Cards and Dealer's Face-up Card

Looking at the results, some observations can be made:

  • For both cases, the worst state to be in for the player is to have a sum of around 16.
  • For both cases, state-values increase linearly as the player's sum increases from 16 to 22.
  • If Player has a Usable Ace, Dealer's face-up card seems to not matter much for the state value.
  • If Player does NOT have a Usable Ace, state-values seems to fluctuate depending on Dealer's face-up card. For example, if Player's sum is between 16 and 22, the state-values are higher when Dealer's card ranges from 6 to 9.
  • Generated Optimal Policy (Hit: Draw a new card, Stick: Stop drawing a new card)

The above plots resemble the true optimal policy plots shown in the reinforcement learning textbook as below!

Sources

I referenced Udacity's Deep Reinforcement Learning Nanodegree github repository for building & debugging the final model :

About

Implementation of constant-alpha Monte Carlo Control Method to construct an optimal policy for playing the game of Blackjack


Languages

Language:Python 100.0%