andylolu2 / Alpha-Reversi

This project implements the Alpha Zero paper on the game of Reversi/Othello.


Alpha Reversi


The overall algorithm is divided into three parts:

  1. Self-play
    • The current best model plays itself to generate training data.
    • Each game takes about 10 seconds to complete.
  2. Training
    • A model is fed the training data generated from self-play.
    • The input to the model is the current board state with history.
    • The labels are the policy produced by MCTS and the value given by the final outcome of the game.
    • The model is trained for ~30 hours.
  3. Evaluation
    • The newly trained model competes with the current best model. It replaces the current best if it wins.
    • Each game takes about 20 seconds to complete.
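The three phases above form a loop: self-play produces data, training fits a new model, and evaluation decides whether to promote it. A minimal sketch of that loop is below; all names (`self_play`, `train`, `evaluate`) and the stubbed game/model internals are illustrative assumptions, not this repo's actual API.

```python
import random

def self_play(model, n_games=4):
    """Phase 1: the current best model plays itself to produce
    (state, MCTS policy, final winner) training examples. The game
    itself is stubbed out here with random placeholders."""
    data = []
    for _ in range(n_games):
        winner = random.choice([1, -1])            # stand-in for a full Reversi game
        states = [random.random() for _ in range(8)]
        # Every position in the game is labelled with the final winner.
        data += [(s, [0.5, 0.5], winner) for s in states]
    return data

def train(model, data):
    """Phase 2: fit a fresh model on the self-play data
    (stubbed as averaging the value labels)."""
    values = [v for _, _, v in data]
    return {"bias": sum(values) / len(values)}     # toy "trained" model

def evaluate(candidate, best, n_games=4):
    """Phase 3: candidate plays the current best; return its win rate
    (stubbed with coin flips)."""
    wins = sum(random.random() < 0.5 for _ in range(n_games))
    return wins / n_games

best = {"bias": 0.0}
for iteration in range(2):
    data = self_play(best)
    candidate = train(best, data)
    if evaluate(candidate, best) > 0.5:            # promote only if the new model wins
        best = candidate
```

In the real system each phase is far heavier (full MCTS per move, a neural network, hundreds of games), but the control flow is the same.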

Model Elo over time
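An Elo-over-time curve like this can be derived from the evaluation games by applying the standard Elo update after each result. The repo's exact rating scheme is an assumption; the function below is just the textbook formula.

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """Return player A's new rating after one game.
    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
    return rating_a + k * (score_a - expected_a)

# Two equally rated models: a win moves the winner up by k/2 points.
new_rating = elo_update(1000, 1000, 1.0)  # → 1016.0
```

Beating an equally rated opponent gains `k * 0.5` points, while beating a much weaker one gains almost nothing, so the curve flattens as successive models converge in strength.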

Training loss over time
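The Alpha Zero paper's training objective combines a squared value error with a policy cross-entropy: l = (z − v)² − π·log p (plus L2 regularisation, omitted here). Whether the plotted curve is exactly this quantity is an assumption, but a sketch of the loss looks like:

```python
import math

def alphazero_loss(z, v, pi, p):
    """z: game outcome in {-1, 0, 1};  v: predicted value in [-1, 1];
    pi: MCTS visit-count distribution over moves;
    p: the network's predicted move probabilities."""
    value_loss = (z - v) ** 2
    # Cross-entropy of the network policy against the MCTS policy.
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    return value_loss + policy_loss

# Perfect value prediction, uniform policies over two legal moves:
loss = alphazero_loss(z=1.0, v=1.0, pi=[0.5, 0.5], p=[0.5, 0.5])  # → log(2) ≈ 0.693
```

Because the MCTS policy targets never fully collapse to one-hot vectors, the policy term bounds the total loss away from zero, which is why such curves typically plateau rather than reach zero.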

About


License:MIT License


Languages

Language:Python 100.0%