Treating the game of Blackjack as a Markov Decision Process, this research notebook attempts to train an agent to play the game using the Deep Q-Learning algorithm.
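In Deep Q-Learning, the network's estimate Q(s, a) is regressed toward a one-step bootstrapped target: the observed reward plus the discounted best Q-value of the next state, with no bootstrap term once the hand is over. The function below is a minimal, framework-free sketch of that target. The name td_target and the default gamma of 1.0 (a natural choice for short, episodic hands) are illustrative assumptions, not taken from the notebook.

```python
from typing import Sequence


def td_target(reward: float, next_q_values: Sequence[float], done: bool,
              gamma: float = 1.0) -> float:
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    When the episode has ended (done=True) there is no next state to
    bootstrap from, so the target is just the reward.
    gamma=1.0 is an assumed default, not a value from the notebook.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Player hits and the hand continues (reward 0); next-state Q-values
# for [stick, hit] would come from the (target) network.
print(td_target(0.0, [0.2, -0.4], done=False))  # 0.2
# Player sticks and wins: terminal transition, target is the reward.
print(td_target(1.0, [0.2, -0.4], done=True))   # 1.0
```

Dropping the bootstrap term on terminal transitions matters here because every Blackjack reward arrives at the end of a hand.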
The notebook relies on the following Python modules: time, collections, gym, numpy, PIL, tensorflow, pyvirtualdisplay and copy.
We will use OpenAI's Gym library to load the Blackjack environment and attempt to solve it.
The goal of the Blackjack environment is to train an agent to beat the dealer: draw cards whose sum is as close to 21 as possible without exceeding it, while still ending with a higher total than the dealer's hand.
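Because an ace can count as either 1 or 11, summing the cards needs a small rule: count one ace as 11 whenever doing so keeps the total at 21 or below. The sketch below encodes cards as integers 1-10 (1 is an ace, face cards count as 10), matching the dealer-card range the environment reports. It is an illustration of the rules rather than Gym's own source, and the helper names hand_value and is_bust are hypothetical.

```python
from typing import Iterable, Tuple


def hand_value(cards: Iterable[int]) -> Tuple[int, bool]:
    """Return (total, usable_ace) for a Blackjack hand.

    Cards are encoded 1-10: 1 is an ace, face cards count as 10.
    One ace may count as 11 instead of 1 if that does not push the
    total over 21; such an ace is called "usable".
    """
    cards = list(cards)
    total = sum(cards)
    usable_ace = 1 in cards and total + 10 <= 21
    if usable_ace:
        total += 10  # promote one ace from 1 to 11
    return total, usable_ace


def is_bust(cards: Iterable[int]) -> bool:
    """A hand is bust when even its best total exceeds 21."""
    return hand_value(cards)[0] > 21


print(hand_value([1, 6]))     # (17, True): ace counted as 11
print(hand_value([1, 6, 9]))  # (16, False): ace must count as 1
print(is_bust([10, 10, 5]))   # True
```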
Blackjack-v1 Environment
The action space consists of two actions represented by discrete values:
- 0: Stick (stop drawing cards)
- 1: Hit (draw another card)
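With only two discrete actions, a Q-network for this environment needs just two outputs (one Q-value per action), and the agent can balance exploration and exploitation with an epsilon-greedy rule. Epsilon-greedy is a common choice for Deep Q-Learning but is an assumption here, since the notebook's own exploration scheme is not shown; the names STICK, HIT and epsilon_greedy are illustrative.

```python
import random
from typing import Sequence

STICK, HIT = 0, 1  # discrete action ids listed above


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q-value (ties go to the lower index).
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
print(epsilon_greedy([0.1, 0.7], epsilon=0.0, rng=rng) == HIT)  # True
```

In practice epsilon is usually decayed over training so the agent explores early and exploits later.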
The agent's observation is a tuple of 3 variables:
- Player's current sum [int]
- Dealer's face-up card, from 1 to 10 where 1 is an ace [int]
- Whether the player holds a usable ace, i.e. one that can count as 11 without the sum exceeding 21 [bool]
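A Q-network built with tensorflow expects a numeric input vector, so the observation tuple has to be converted before it is fed to the network. The sketch below assumes a simple scaling to roughly [0, 1], dividing by the largest values allowed by Blackjack-v1's observation space, Tuple(Discrete(32), Discrete(11), Discrete(2)). The function name encode_observation and the scaling itself are assumptions; one-hot encoding each component would be an equally reasonable choice.

```python
from typing import List, Tuple

# (player_sum, dealer_card, usable_ace) as described above
Observation = Tuple[int, int, bool]


def encode_observation(obs: Observation) -> List[float]:
    """Turn the Blackjack observation tuple into a float vector.

    The scaling is an assumption, not something the environment requires:
    player_sum is divided by 31 and dealer_card by 10, the largest values
    in the Tuple(Discrete(32), Discrete(11), Discrete(2)) space.
    """
    player_sum, dealer_card, usable_ace = obs
    return [player_sum / 31.0, dealer_card / 10.0, float(usable_ace)]


print(encode_observation((17, 10, True)))
```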
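Rewards are assigned at the end of each hand:
- Win game: +1
- Lose game: -1
- Draw: 0
- Win game with natural Blackjack (an ace plus a ten-valued card as the first two cards): +1.5 if natural=True, else +1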
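The reward list above can be written as a single function of the two final totals. This is a hedged sketch rather than the environment's own code: the function name settle_reward and its parameters are hypothetical, and it assumes that a player who busts loses immediately (-1, even if the dealer would also bust) and that the +1.5 bonus only replaces an ordinary win, so a natural that ties the dealer's 21 is still a draw.

```python
def settle_reward(player_total: int, dealer_total: int,
                  player_natural: bool, natural: bool = False) -> float:
    """Reward for a finished hand, following the reward list above.

    player_total / dealer_total: final hand values.
    player_natural: the player's first two cards were an ace and a 10-value card.
    natural: mirrors the environment's natural=True/False option.
    """
    if player_total > 21:
        return -1.0  # player busts: loses even if the dealer would also bust
    if dealer_total > 21 or player_total > dealer_total:
        if player_natural and natural:
            return 1.5  # natural Blackjack bonus
        return 1.0
    if player_total < dealer_total:
        return -1.0
    return 0.0  # equal totals: draw


print(settle_reward(21, 20, player_natural=True, natural=True))  # 1.5
print(settle_reward(21, 20, player_natural=True))                # 1.0
print(settle_reward(18, 22, player_natural=False))               # 1.0
print(settle_reward(18, 18, player_natural=False))               # 0.0
```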