peldszus / alpha-zero-general-lib

Home Page: https://github.com/peldszus/alpha-zero-general-lib

Alpha Zero General - as a library

An implementation of the AlphaZero algorithm for adversarial games to be used with the machine learning framework of your choice.

This is a fork of https://github.com/suragnair/alpha-zero-general turned into a library.

The idea is to have a small and clean library version of the code, with minimal requirements and without many game-specific implementations or large model files. The only 'heavy' requirement is the ray library, which is used to make the algorithm fully asynchronous and parallel (potentially even across multiple machines).
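
This library's own use of ray lives in its self-play and training code. Purely as an illustration of what ray provides, and not as code from this library, a minimal remote-task sketch could look like the following (self_play_game is a made-up placeholder):

    # Minimal illustration of ray's programming model (not code from this library):
    # remote functions run as parallel tasks on a local or multi-machine cluster.
    import ray

    ray.init()  # start or connect to a cluster; pass an address for multi-machine setups

    @ray.remote
    def self_play_game(game_id):
        # placeholder for one self-play episode
        return game_id

    # Launch many games in parallel and collect the results.
    futures = [self_play_game.remote(i) for i in range(8)]
    results = ray.get(futures)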

Information

A simplified, highly flexible, commented and (hopefully) easy-to-understand implementation of the self-play based reinforcement learning algorithm AlphaZero. It is designed to be easy to adapt to any two-player, turn-based adversarial game with perfect information. Use it with the machine learning framework you like. A sample implementation is provided for the game of Othello in TensorFlow/Keras; see example/othello/.

Usage

To use this library for the game of your choice, subclass Game and NeuralNet and implement their methods. Define your config and use the Coach to start the learning algorithm. Test your model in the Arena by playing it against one of the Player classes. Evaluate multiple players in a League and let it calculate the ELO scores for you.
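
The exact class and method signatures are defined by the library's abstract base classes; the sketch below only illustrates the workflow in rough strokes. The class names (Game, NeuralNet, Coach, Arena, RandomPlayer) are taken from this README, but the import path, method names, and config keys are assumptions loosely following the upstream alpha-zero-general project. See example/othello/ for a real, working setup.

    # Hypothetical sketch of the workflow described above; not the library's
    # actual API. Import path, method names and config keys are assumptions.
    from alpha_zero_general import Game, NeuralNet, Coach, Arena, RandomPlayer

    class MyGame(Game):
        # Implement the abstract game hooks here: e.g. the initial board, the
        # valid moves, the state transition, and the terminal test.
        ...

    class MyNet(NeuralNet):
        # Wrap a model of your favourite ML framework here, with something like
        # predict(board) -> (policy, value) and train(examples).
        ...

    config = {"num_iterations": 100, "num_self_play_games": 100}  # hypothetical parameter names

    coach = Coach(MyGame(), MyNet(), config)  # assumed constructor signature
    coach.learn()                             # assumed training entry point

    # Afterwards, pit the trained player against a baseline (assumed Arena API):
    # arena = Arena(my_alpha_zero_player, RandomPlayer(), MyGame())
    # arena.play_games(20)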

Motivation

Why did I do this? Well, mostly for myself. :) I wanted to play with the algorithm, I was in the mood to work with existing code, and I wanted to learn about ray.

ToDo list

Library:

  • Poetry-based build that works as a library
  • Add some tests
  • Use travis-ci.org tests
  • Coverage report on codecov.org
  • Make sure it works
  • More documentation
  • Provide one entrypoint for training and pitting
  • A first pypi release
  • Work on remaining test coverage gaps
  • Add fixtures in conftest for game, net and tempdir

Refactor:

  • Use tqdm for progress bars
  • Pythonic renaming of modules
  • Pythonic renaming of method names
  • Pythonic renaming of variable names
  • Black formatting, flake8
  • Fix inline comments
  • Fix all flake8 issues
  • Proper abstract classes for Game and NeuralNet
  • Make MCTS and model parameters explicit
  • ... or replace DotDict with an overall config class
  • Use logging
  • Remove obsolete parameters
  • Add game-based parameters

General player classes:

  • AlphaZeroPlayer out of pit code
  • BareModelPlayer
  • HumanPlayer: named action mapping

Asynchronous & parallel processing:

  • Ray step 1: Use Ray to parallelize self-play
  • Ray step 2: Share weights across ray actors
  • Ray step 3: Make self-play and training fully async
  • Add parameter to control self-play vs. training
  • Ray step 4: Parallelize the arena play during league execution
  • Successfully try multi-machine parallelization

Improvements:

  • Store all models, whether accepted or not
  • Store training examples per game to reduce data duplication
  • Be able to continue training
  • Add Dirichlet noise for better exploration

New features:

  • League evaluations with ELO scores

Develop

Requirements:

  • Operating system: Linux/Mac (Windows support in the ray library is only experimental)
  • Python >= 3.7

To locally build and install, simply run

make

To execute the tests, run

make test

This will additionally install TensorFlow, because the Keras example implementation of Othello is used in the tests.

Evaluate

How can I know whether a change to the code or to the parameters is actually for the better? How can I evaluate whether it brings better results?

  • First rule: Only change one parameter at a time when comparing two runs.
  • Random choice is also a parameter: Be sure to set the same random seeds across runs, for Python, for NumPy, and for your framework (e.g. TensorFlow/PyTorch); see the seeding sketch after this list.
  • Repeat your experiment with different random seeds.
  • Initial model parameters are parameters: Start from the same initialized (untrained) model across runs.
  • Be aware that changing e.g. exploration parameters might have a different impact in different phases of the training. Ideally, you have an 'early' game model (where the model has seen no or only a few games), a 'mid' game model (where it has seen several thousand games) and a 'late' game model (which has seen a lot of games). Observe the effect of your code/parameter change in all three stages.
  • Don't compare the model training losses: since the training data is continuously changing, there is no common ground for comparing them.
  • Compare the game play performance:
    • Let two competitor agents play against each other in the Arena (remember that this requires your code changes to be fully parameterized).
    • Let two competitor agents play against baselines (like RandomPlayer, GreedyPlayer, BareModelPlayer).
    • Observe the win rate or the ELO in a tournament.
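
For the seeding advice above, a minimal sketch (the TensorFlow line applies only if that is your framework; for PyTorch use torch.manual_seed instead):

    # Fix every source of randomness at the start of a run, so that two runs
    # differ only in the one code/parameter change you want to compare.
    import random

    import numpy as np
    import tensorflow as tf  # or: import torch

    SEED = 42                 # any fixed value; just keep it identical across runs
    random.seed(SEED)         # Python's built-in RNG
    np.random.seed(SEED)      # NumPy
    tf.random.set_seed(SEED)  # TensorFlow; for PyTorch: torch.manual_seed(SEED)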

Contributors and Credits

License: MIT License