jkaardal / tictactoe-reinforcement-learning

Using reinforcement learning to teach a computer to play tic-tac-toe.

"Deep" reinforcement learning of Tic-Tac-Toe

This package trains a tic-tac-toe agent whose policy is approximated by a dense neural network of arbitrary depth. Residual connections are added at defined intervals to mitigate exploding and vanishing gradients. Everything is implemented in TensorFlow >= 2.0 using eager execution.
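To make the role of the residual connections concrete, here is a minimal sketch of a forward pass through a dense policy network with a skip connection every few layers. It is written in plain Python rather than TensorFlow so that it runs without dependencies, and it is not the package's actual code: the function names, the `skip_every` parameter, the tanh activation, the -1/0/+1 board encoding, and the choice of giving every hidden layer the same width as the input (so the skip addition is shape-compatible) are all assumptions made for illustration.

```python
import math
import random

def dense(x, weights, bias):
    """Affine layer followed by tanh: tanh(W x + b)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def policy_forward(board, hidden_layers, output_layer, skip_every=2):
    """Return one probability per cell for a 9-cell board.

    Assumes every hidden layer has the same width as the board so that
    the residual addition below is well defined.
    """
    h = list(board)
    skip = h  # activation saved at the start of the current residual block
    for i, (w, b) in enumerate(hidden_layers, start=1):
        h = dense(h, w, b)
        if i % skip_every == 0:
            # Residual connection: add the saved activation back in.
            h = [a + s for a, s in zip(h, skip)]
            skip = h
    w_out, b_out = output_layer
    logits = [sum(w * xi for w, xi in zip(row, h)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(logits)

rng = random.Random(0)
hidden = [init_layer(9, 9, rng) for _ in range(4)]
output = init_layer(9, 9, rng)
board = [1, 0, 0, 0, -1, 0, 0, 0, 0]  # +1 = agent, -1 = opponent, 0 = empty
probs = policy_forward(board, hidden, output)
```

Because each block adds its input back to its output, gradients have a direct path through the additions, which is the usual motivation for residual connections in deep stacks. The sketch does not mask occupied cells; a real player would have to restrict sampling to legal moves.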

Three policy-gradient training methods are provided:

  • REINFORCE,
  • REINFORCE with baseline,
  • and actor-critic.
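As a rough illustration of the first two methods, the sketch below applies one REINFORCE update to a tabular softmax policy, with an optional baseline subtracted from the return. It is a simplified stand-in, not the package's implementation: the package uses a neural network policy, and the names `reinforce_update`, `theta`, and `baseline`, as well as the dictionary layout, are hypothetical. The discount factor defaults to 1, so the per-step discount weighting that some formulations include is omitted.

```python
import math

def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def discounted_returns(rewards, gamma=1.0):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

def reinforce_update(theta, episode, lr=0.1, gamma=1.0, baseline=None):
    """One REINFORCE step on a tabular softmax policy (illustrative only).

    theta:    dict mapping state -> list of action preferences
    episode:  list of (state, action, reward) tuples
    baseline: optional dict state -> value estimate; None gives plain REINFORCE
    """
    states, actions, rewards = zip(*episode)
    for s, a, g in zip(states, actions, discounted_returns(rewards, gamma)):
        b = baseline.get(s, 0.0) if baseline is not None else 0.0
        advantage = g - b
        probs = softmax(theta[s])
        for k in range(len(theta[s])):
            # Gradient of log pi(a|s) with respect to theta[s][k].
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            theta[s][k] += lr * advantage * grad_log
    return theta

# Plain REINFORCE: a rewarded action becomes more likely.
theta = {"s0": [0.0, 0.0]}
reinforce_update(theta, [("s0", 1, 1.0)])

# With a baseline equal to the return, the advantage is zero and nothing moves.
theta_b = {"s0": [0.0, 0.0]}
reinforce_update(theta_b, [("s0", 1, 1.0)], baseline={"s0": 1.0})
```

Subtracting a baseline leaves the expected gradient unchanged but typically reduces its variance, which is why the second variant tends to learn more steadily.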

An approximate state-value function serves as the baseline in the latter two methods. Two types of value function are provided:

  • tabular
  • and quadratic polynomial approximation.

Both are updated using a Monte Carlo algorithm.
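The two value-function types can be sketched as follows, each nudged toward the observed episode return in Monte Carlo fashion. This is an illustration under assumptions, not the package's code: the class names, the learning rates, the -1/0/+1 board encoding, and the exact quadratic feature set (a constant, the nine cell values, and all pairwise products including squares) are hypothetical choices.

```python
from itertools import combinations_with_replacement

def quadratic_features(board):
    """Constant, linear, and pairwise-product terms of the board cells."""
    x = [float(v) for v in board]
    pairs = [x[i] * x[j]
             for i, j in combinations_with_replacement(range(len(x)), 2)]
    return [1.0] + x + pairs

class TabularValue:
    """One stored value per distinct board, moved toward the return g."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.table = {}

    def predict(self, board):
        return self.table.get(tuple(board), 0.0)

    def update(self, board, g):
        key = tuple(board)
        v = self.table.get(key, 0.0)
        self.table[key] = v + self.lr * (g - v)

class QuadraticValue:
    """Linear model on quadratic features, trained by gradient Monte Carlo."""

    def __init__(self, n_cells=9, lr=0.01):
        n_features = 1 + n_cells + n_cells * (n_cells + 1) // 2
        self.w = [0.0] * n_features
        self.lr = lr

    def predict(self, board):
        phi = quadratic_features(board)
        return sum(w * f for w, f in zip(self.w, phi))

    def update(self, board, g):
        phi = quadratic_features(board)
        error = g - sum(w * f for w, f in zip(self.w, phi))
        self.w = [w + self.lr * error * f for w, f in zip(self.w, phi)]

board = [1, 0, 0, 0, -1, 0, 0, 0, 0]
tab, quad = TabularValue(), QuadraticValue()
tab.update(board, 1.0)   # pretend the episode from this board ended in a win
quad.update(board, 1.0)
```

The tabular version learns each board independently, while the quadratic version shares weights across boards and so generalises to positions it has not visited, at the cost of a less exact fit.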

For a guide to the training parameters, see the help string of the train.py module:

python -m tictactoe-reinforcement-learning.train -h

which is the main entrypoint.

For more thorough background information and a summary of the theory, please read this link.

For a playable demo of an entirely tabular implementation in JavaScript, see this link; its source code can be found in this repository.

Languages

Language:Python 100.0%