AndrewSpano / Alpha-Zero-Racing-Kings

Alpha Zero general-purpose reinforcement learning algorithm implemented for the Chess variant Racing Kings.


Alpha Zero Racing Kings

This repo contains a Python 3.8 implementation of the Alpha Zero general-purpose reinforcement learning algorithm for the Racing Kings Chess variant. The implementation is modular and can easily be adapted to other chess variants supported by the python-chess library.

Racing Kings

The goal in this variant is to reach any square on the 8th rank with your king. Whoever gets there first wins the game (unless white reaches it first and black can reach it on the very next move, in which case the game is a draw). Checks are not allowed, so moving into check is also illegal. The starting board looks like this:

(Image: the Racing Kings starting position)
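For reference, the variant is supported out of the box by the python-chess library that this repo builds on; a minimal sketch for inspecting the starting position (assuming python-chess is installed) looks roughly like this:

# Minimal sketch using the python-chess library, which this repo builds on.
# chess.variant.RacingKingsBoard implements the Racing Kings rules described above.
import chess.variant

board = chess.variant.RacingKingsBoard()
print(board)                      # ASCII rendering of the starting position
print(board.legal_moves.count())  # number of legal moves (checks are never legal)
print(board.is_variant_end())     # False: no king has reached the 8th rank yet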

From personal experience, the average length of a Racing Kings game between high-rated players is ~15 moves. The number of legal moves per position is also significantly smaller than in regular Chess. This makes Racing Kings a much easier game to master than regular Chess.

Alpha Zero

AlphaZero is a computer program developed by the Artificial Intelligence research company DeepMind to master the games of Chess, Shogi and Go. It uses an algorithm very similar to that of AlphaGo Zero. The original paper by DeepMind can be found here. In this repo, the implementation of Alpha Zero follows the paper, with some modifications to the state and action representations in order to match the Racing Kings variant.
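For orientation only, the training objective described in the paper combines a mean-squared value loss, a policy cross-entropy term and L2 regularization; a hedged PyTorch-style sketch (the tensor names below are illustrative, not the ones used in this repo) could look like this:

# Hedged sketch of the loss from the AlphaZero paper, L = (z - v)^2 - pi^T log p + c*||theta||^2,
# where z is the game outcome, v the value head output, pi the MCTS visit distribution
# and p the policy head output. The L2 term is usually handled via weight decay.
# Tensor names are illustrative, not this repo's exact code.
import torch.nn.functional as F

def alpha_zero_loss(value_pred, policy_logits, outcome, mcts_policy):
    value_loss = F.mse_loss(value_pred.squeeze(-1), outcome)
    policy_loss = -(mcts_policy * F.log_softmax(policy_logits, dim=-1)).sum(dim=-1).mean()
    return value_loss + policy_loss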

Self Play

Unless you have access to a large compute cluster, self-play alone will be far too slow. The functionality has been implemented, but it is simply not practical without the resources to train. The process takes a long time because the agent has to figure out the game entirely on its own (i.e. discover wins by moving randomly at first).
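For intuition, a single self-play episode boils down to the loop below; the class and method names are illustrative placeholders rather than this repo's actual API:

# Illustrative sketch of one self-play episode; `env` and `mcts` are hypothetical
# placeholders, not the actual classes in src/.
def self_play_episode(env, mcts, temperature=1.0):
    env.reset()
    history = []                               # (state, search_policy) pairs
    while not env.is_finished():
        pi = mcts.search(env.current_state())  # visit-count distribution over moves
        history.append((env.current_state(), pi))
        env.play(pi.sample(temperature))       # pick a move proportionally to pi
    z = env.result()                           # +1 / 0 / -1 from white's perspective
    return [(state, pi, z) for state, pi in history]  # training targets for the NN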

Supervised Learning

Since self-play is costly, supervised learning has also been implemented in order to speed up training. This injects human knowledge into the model, which departs from the pure Alpha Zero approach; that is why it is a separate method and can be omitted. The database used is from lichess.org and can be downloaded locally with the download bash script.
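To give a rough idea of what the parsing involves, Racing Kings games from a lichess PGN dump can be read with python-chess and turned into (position, move) pairs; the sketch below is only illustrative (the file name is a placeholder, and the repo's own parser stores a richer representation):

# Hedged sketch: converting a lichess Racing Kings PGN dump into (FEN, move) pairs
# with python-chess. The file name is a placeholder.
import chess.pgn

pairs = []
with open("racing_kings_games.pgn") as pgn:
    while (game := chess.pgn.read_game(pgn)) is not None:
        board = game.board()                   # respects the "Variant" PGN header
        for move in game.mainline_moves():
            pairs.append((board.fen(), move.uci()))
            board.push(move)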

Repository Structure

Configurations

All the configuration files are located in the configurations directory. They are used by the main scripts and can easily be edited by hand, but make sure the data types of the values remain correct.
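The .ini files can be read with the standard configparser module, which stores every value as a string; that is why the data types have to be cast (and kept) correctly. A small sketch, where the section and option names are illustrative:

# Hedged sketch of reading one of the .ini configuration files; the section and
# option names below are illustrative, not necessarily the keys used by this repo.
import configparser

config = configparser.ConfigParser()
config.read("../configurations/mcts_hyperparams.ini")

# configparser returns strings, so values must be cast explicitly:
num_simulations = config.getint("MCTS", "num_simulations", fallback=100)
c_puct = config.getfloat("MCTS", "c_puct", fallback=1.0)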

Source Code

The source code for the project is located in the src Python package and is organized into separate modules.

Tests

Unit tests have been implemented in the tests directory to make sure that existing functionality does not break when new features are added.
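Assuming the tests follow the standard unittest layout, they can be run from the repository root with the built-in discovery mechanism:

# Minimal sketch: discover and run the tests in tests/ with the standard library
# (equivalent to `python -m unittest discover tests`), assuming a unittest layout.
import unittest

suite = unittest.defaultTestLoader.discover("tests")
unittest.TextTestRunner(verbosity=2).run(suite)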

Usage

Activate a virtual environment and install the required Python libraries:

$ conda activate venv
$ pip install -r requirements.txt

Self Play training

In order to train your agent with self play, use the following commands:

$ cd src
$ python3 train.py --train-config    [path_to_train_configuration_file]
                   --nn-config       [path_to_neural_network_configuration_file]
                   --nn-checkpoints  [path_to_directory_to_save_nn_weights]
                   --mcts-config     [path_to_monte_carlo_tree_search_configuration_file]
                   --device          [cpu | cuda]

Note: [path_to_directory_to_save_nn_weights] must point to an already existing directory.
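If the directory does not exist yet, it can simply be created beforehand, e.g.:

# The training script expects the checkpoint directory to already exist, so it
# can be created up front (the path matches the example below).
from pathlib import Path

Path("../models/checkpoints").mkdir(parents=True, exist_ok=True)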

An example of running the train script can be:

$ python3 train.py --train-config ../configurations/training_hyperparams.ini
                   --nn-config ../configurations/neural_network_architecture.ini
                   --nn-checkpoints ../models/checkpoints
                   --mcts-config ../configurations/mcts_hyperparams.ini
                   --device cpu

Evaluation (playing against the agent)

To play against the agent, you must first have trained it and stored the weights of the NN in a file. Then use the command:

$ cd src
$ python3 evaluate.py --nn-config            [path_to_neural_network_configuration_file]
                      --pre-trained-weights  [path_to_file_with_NN_weights]
                      --mcts-config          [path_to_monte_carlo_tree_search_configuration_file]
                      --device               [cpu | cuda]
                      --white

The last flag (--white) determines the user's color. If specified, the user plays with the white pieces; if omitted, the user plays with the black pieces.

Note: For the visualization of the board, the Python chess-board library is used. It has some minor issues that can easily be solved by following the steps in the docstring of the Base Chess Environment script. If you do not wish to use a display (and therefore deal with this matter), specify the --no-display parameter.

An example of running the script is:

$ python3 evaluate.py --nn-config ../configurations/neural_network_architecture.ini
                      --pre-trained-weights ../models/checkpoints/iteration_0_weights.pth
                      --mcts-config ../configurations/mcts_hyperparams.ini
                      --device cpu
                      --white
                      --no-display

Supervised Training

In order to train the agent using Supervised Learning, first download the data files (.pgn) from lichess. You can use the following bash script to do so: download_racing_kings_data. After the data has been downloaded (say, into the ./Dataset directory), run the following commands to train the agent (first with supervised learning, then with self-play):

$ cd src
$ python3 train_supervised.py
  --train-config                  [path_to_train_configuration_file]
  --nn-config                     [path_to_neural_network_configuration_file]
  --nn-checkpoints                [path_to_directory_to_save_nn_weights]
  --supervised-train-config       [path_to_supervised_train_configuration_file]
  --data-root-directory           [path_to_the_root_directory_containing_data]
  --parsed-data-destination-file  [path_to_store_data_once_parsed]
  --mcts-config                   [path_to_monte_carlo_tree_search_configuration_file]
  --device                        [cpu | cuda]

An example:

$ python3 train_supervised.py
  --train-config ../configurations/training_hyperparams.ini
  --nn-config ../configurations/neural_network_architecture.ini
  --nn-checkpoints ../models/checkpoints
  --supervised-train-config ../configurations/supervised_training_hyperparams.ini
  --data-root-directory ../Dataset
  --parsed-data-destination-file ../Dataset/parsed_data.pickle
  --mcts-config ../configurations/mcts_hyperparams.ini
  --device cpu

Note: If the data has already been parsed once (parsing takes ~2 hours), the parameter --parsed-data [path_to_parsed_data_file] can be specified on subsequent runs of the supervised training script to load the parsed data directly instead of re-parsing it, like so:

$ python3 train_supervised.py
  --train-config ../configurations/training_hyperparams.ini
  --nn-config ../configurations/neural_network_architecture.ini
  --nn-checkpoints ../models/checkpoints
  --supervised-train-config ../configurations/supervised_training_hyperparams.ini
  --parsed-data ../Dataset/parsed_data.pickle
  --mcts-config ../configurations/mcts_hyperparams.ini
  --device cpu
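Under the hood this is a simple parse-once / load-later caching pattern; the sketch below only illustrates it, with parse_pgn_directory standing in as a placeholder for the repo's actual (slow) parsing step and an unspecified data format:

# Hedged sketch of the caching pattern behind --parsed-data. `parse_pgn_directory`
# and the structure of `parsed` are placeholders, not the repo's actual code.
import os
import pickle

def parse_pgn_directory(root):
    """Placeholder for the repo's real parsing logic (takes ~2 hours)."""
    return []

PARSED_FILE = "../Dataset/parsed_data.pickle"

if os.path.exists(PARSED_FILE):
    with open(PARSED_FILE, "rb") as f:
        parsed = pickle.load(f)                  # load instead of re-parsing
else:
    parsed = parse_pgn_directory("../Dataset")   # slow parse on the first run
    with open(PARSED_FILE, "wb") as f:
        pickle.dump(parsed, f)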



Future Work - Possible Ideas & Improvements

For future work, the following ideas could further improve the algorithm:

  1. Optimize the code: make it multi-threaded, so that several self-play episodes of the same iteration run simultaneously.
  2. Add a Curriculum Learning mechanism: sample endgame positions (random ones would work, but preferably from human data) and have the agent first play from these positions, so that it discovers positive rewards (i.e. wins) earlier and therefore learns faster. The paper describing this idea can be found here (a rough sketch of starting from endgame positions is shown after this list).
  3. Implement Monte Carlo Graph Search instead of the regular Tree Search. In this approach the search tree is generalized to an acyclic graph, grouping together transpositions of the same position and hence significantly reducing the search space. The paper describing this approach can be found here.
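As a rough illustration of the curriculum idea from point 2, self-play episodes can start from positions in which a king is already close to the 8th rank; the FEN below is an illustrative hand-written position, not one used by the repo:

# Rough illustration of the curriculum idea: start self-play from near-finished
# positions so that wins are discovered earlier. The FEN is hand-written for
# illustration, not taken from the repo or from human data.
import chess.variant

NEAR_END_FEN = "8/5K2/8/8/8/8/8/k7 w - - 0 1"   # white king one move away from rank 8
board = chess.variant.RacingKingsBoard(NEAR_END_FEN)
print(board.is_valid())        # sanity-check the hand-written position
# ... run self-play / MCTS starting from `board` instead of the initial position ...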

Contributing - Adding more Variants

The code is quite flexible: the Alpha Zero agent and the Monte Carlo Tree Search classes are compatible with any Chess environment that inherits from the Base Chess Environment and any action translator that inherits from the Move Translator. Thus, to add a new variant, follow these three steps:

  1. Create a Wrapper class for that variant that uses the python-chess library, like the one already implemented for Racing Kings here (a rough skeleton is sketched after this list).
  2. Create a MoveTranslator class for that variant, like the one implemented here.
  3. Adjust the main scripts (train.py, evaluate.py and train_supervised.py) to use the classes of the variant you just implemented in the previous 2 steps.
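As a rough template for step 1, a wrapper for another variant only needs to put the corresponding python-chess board behind the environment interface; the class and method names below are illustrative placeholders, since the real abstract interface is the Base Chess Environment in src:

# Illustrative skeleton for step 1; the method names are placeholders and the
# real class to inherit from is the repo's Base Chess Environment.
import chess.variant

class HordeEnv:
    """Hypothetical wrapper for the Horde variant, mirroring the Racing Kings wrapper."""

    def __init__(self):
        self.board = chess.variant.HordeBoard()

    def legal_moves(self):
        return list(self.board.legal_moves)

    def play(self, move):
        self.board.push(move)

    def is_finished(self):
        return self.board.is_game_over()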

Feel free to contribute.


License: MIT License


Languages

Python 99.6%, Shell 0.4%