jackode / AlphaGOZero-python-tensorflow

Congratulations to DeepMind! This is a re-engineering implementation (drawing on many other git repos collected in /support/) of DeepMind's Oct 19th publication: [Mastering the Game of Go without Human Knowledge]. The supervised learning approach is more practical for individuals.


AlphaGOZero

This is a trial implementation of DeepMind's Oct 19th publication: Mastering the Game of Go without Human Knowledge.


Useful links:

All DeepMind’s AlphaGO games

GoGOD dataset, $15

KGS >=4dan, FREE

Youtube: Learn to play GO

repo: MuGo

repo: ROCAlphaGO

repo: miniAlphaGO

repo: resnet-tensorflow

repo: leela-zero (c++ AlphaGo Zero replica)

repo: reversi-alpha-zero (if you like reversi(黑白棋))

From Paper:

Our program, AlphaGo Zero, differs from AlphaGo Fan and AlphaGo Lee in several important aspects. First and foremost, it is trained solely by self-play reinforcement learning, starting from random play, without any supervision or use of human data. Second, it only uses the black and white stones from the board as input features. Third, it uses a single neural network, rather than separate policy and value networks. Finally, it uses a simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte Carlo rollouts. To achieve these results, we introduce a new reinforcement learning algorithm that incorporates lookahead search inside the training loop, resulting in rapid improvement and precise and stable learning.

Congratulations to DeepMind for pushing the frontier once again! AlphaGo Zero is trained entirely by self-play reinforcement learning, with no human game examples.

I started by reading the paper Mastering the Game of Go without Human Knowledge, only to find that I lacked prior knowledge of Monte Carlo Tree Search (MCTS). I have tried my best to highlight what is interesting.
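
For readers in the same position, a minimal, self-contained sketch of the PUCT selection rule at the heart of AlphaGo Zero's MCTS may help. The Node class and its attribute names below are hypothetical and purely for illustration, not code from this repository:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """Minimal MCTS node (hypothetical names, for illustration only)."""
    prior: float = 0.0            # P(s, a): prior from the policy head
    visit_count: int = 0          # N(s, a)
    value_sum: float = 0.0        # W(s, a): accumulated value
    children: dict = field(default_factory=dict)   # move -> Node

    @property
    def q(self) -> float:
        # Mean action value Q(s, a)
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node: Node, c_puct: float = 1.0):
    """Descend one step: pick argmax_a Q(s, a) + U(s, a), where
    U = c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))."""
    total_visits = sum(child.visit_count for child in node.children.values())

    def score(child: Node) -> float:
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return child.q + u

    move = max(node.children, key=lambda m: score(node.children[m]))
    return move, node.children[move]
```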

This version of AlphaGo uses a combined policy & value network (the final fully connected layers diverge into two heads) to improve training stability.
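
In training, the two heads are optimised jointly with a single loss from the paper: l = (z - v)^2 - pi^T log p + c * ||theta||^2. Below is a hedged TensorFlow 2 sketch of that loss; the function name and arguments are illustrative, not the repository's actual training code:

```python
import tensorflow as tf

def alphago_zero_loss(pi, z, policy_logits, value, trainable_weights, c=1e-4):
    """Combined loss: (z - v)^2 - pi^T log p + c * ||theta||^2.

    pi:            MCTS visit-count distributions, shape (batch, 362)
    z:             game outcomes in {-1, +1}, shape (batch,)
    policy_logits: policy head output, shape (batch, 362)
    value:         value head output, shape (batch, 1)
    """
    value_loss = tf.reduce_mean(tf.square(z - tf.squeeze(value, axis=-1)))
    policy_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=pi, logits=policy_logits))
    l2_penalty = c * tf.add_n([tf.nn.l2_loss(w) for w in trainable_weights])
    return value_loss + policy_loss + l2_penalty
```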

Innovations in MCTS (temperature annealing & Dirichlet noise at the root) enable exploration.
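
Concretely, the paper mixes Dirichlet noise into the root priors (P = (1 - eps) * p + eps * eta, with eta ~ Dir(0.03) and eps = 0.25) and samples moves from root visit counts with a temperature that is annealed from 1 towards 0. A small NumPy sketch of both ideas (function names are illustrative only):

```python
import numpy as np

def add_dirichlet_noise(priors, epsilon=0.25, alpha=0.03):
    """Root exploration noise: P(s, a) = (1 - eps) * p_a + eps * eta_a."""
    priors = np.asarray(priors, dtype=np.float64)
    noise = np.random.dirichlet([alpha] * len(priors))
    return (1 - epsilon) * priors + epsilon * noise

def sample_move(visit_counts, temperature=1.0):
    """Pick a move in proportion to N(s, a)^(1/T).

    T = 1 early in the game keeps play exploratory; T -> 0 later makes the
    choice effectively greedy."""
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temperature < 1e-3:                      # effectively greedy
        probs = np.zeros_like(counts)
        probs[np.argmax(counts)] = 1.0
    else:
        scaled = counts ** (1.0 / temperature)
        probs = scaled / scaled.sum()
    return int(np.random.choice(len(counts), p=probs))
```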


Exploration leads to learning increasingly complex moves, making play at the end of training (~70 h) both competitive and balanced.


The input is still the raw stones, but the plain CNN has been replaced by a residual network.


And finally, pure RL has outperformed the supervised learning + RL agent.


AlphaGo Zero Architecture (a code sketch of the complete network follows these lists):

  • input 19 x 19 x 17: the player's stones for the current state plus the 7 previous states, the opponent's stones for the current state plus the 7 previous states, and the player's colour (a feature-plane sketch follows this block)
    1. A convolution of 256 filters of kernel size 3 x 3 with stride 1
    2. Batch normalisation
    3. A rectifier non-linearity
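
A small NumPy sketch of how those 17 binary feature planes could be assembled; the function and argument names are illustrative, not the repository's actual preprocessing code:

```python
import numpy as np

def make_input_planes(player_history, opponent_history, player_is_black):
    """Stack the 19 x 19 x 17 input described above.

    player_history / opponent_history: lists of eight 19x19 0/1 arrays
    (current board first, then the 7 previous states), marking the
    player's and opponent's stones respectively.
    """
    assert len(player_history) == 8 and len(opponent_history) == 8
    colour_plane = np.full((19, 19), 1.0 if player_is_black else 0.0)
    planes = list(player_history) + list(opponent_history) + [colour_plane]
    return np.stack(planes, axis=-1)            # shape (19, 19, 17)
```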

Residual Blocks

    1. A convolution of 256 filters of kernel size 3 x 3 with stride 1
    2. Batch normalisation
    3. A rectifier non-linearity
    4. A convolution of 256 filters of kernel size 3 x 3 with stride 1
    5. Batch normalisation
    6. A skip connection that adds the input to the block
    7. A rectifier non-linearity

Policy Head

    1. A convolution of 2 filters of kernel size 1 x 1 with stride 1
    2. Batch normalisation
    3. A rectifier non-linearity
    4. A fully connected linear layer that outputs a vector of size 19^2 + 1 = 362, corresponding to logit probabilities for all intersections and the pass move

Value Head

    1. A convolution of 1 filter of kernel size 1 x 1 with stride 1
    2. Batch normalisation
    3. A rectifier non-linearity
    4. A fully connected linear layer to a hidden layer of size 256
    5. A rectifier non-linearity
    6. A fully connected linear layer to a scalar
    7. A tanh non-linearity outputting a scalar in the range [-1, 1]
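
Putting the blocks above together, here is a hedged tf.keras sketch of the whole network. It is illustrative only, not the repository's own implementation; the tower depth is parameterised to match the --n_resid_units training flag used below:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_relu(x, filters, kernel_size):
    x = layers.Conv2D(filters, kernel_size, strides=1, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def residual_block(x):
    shortcut = x
    x = conv_bn_relu(x, 256, 3)
    x = layers.Conv2D(256, 3, strides=1, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([shortcut, x])             # skip connection
    return layers.ReLU()(x)

def build_alphago_zero_net(n_resid_units=20):
    board = layers.Input(shape=(19, 19, 17))    # raw stone planes + colour
    x = conv_bn_relu(board, 256, 3)             # initial convolutional block
    for _ in range(n_resid_units):              # residual tower
        x = residual_block(x)

    # Policy head: 2 filters 1x1 -> BN -> ReLU -> 19^2 + 1 = 362 logits
    p = conv_bn_relu(x, 2, 1)
    p = layers.Flatten()(p)
    policy_logits = layers.Dense(362)(p)

    # Value head: 1 filter 1x1 -> BN -> ReLU -> 256 -> ReLU -> tanh scalar
    v = conv_bn_relu(x, 1, 1)
    v = layers.Flatten()(v)
    v = layers.Dense(256, activation="relu")(v)
    value = layers.Dense(1, activation="tanh")(v)

    return tf.keras.Model(inputs=board, outputs=[policy_logits, value])
```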

Set up

Install requirements

python 3.6

pip install -r requirement.txt

Download Dataset (kgs 4dan)

Under repo's root dir

cd data/download
chmod +x download.sh
./download.sh

Preprocess Data

This is only an example; feel free to point it at your own local dataset directory.

python preprocess.py preprocess ./data/SGFs/kgs-*

Train A Model

python main.py --mode=train --force_save --n_resid_units=20

Play Against An A.I. (currently only random A.I. is available)

python main.py --mode=gtp --policy=random --model_path='./savedmodels/model--0.0.ckpt'

Basic Self-play

Under repo’s root dir

python utils/selfplay.py

Credit:

* Brian Lee
* Ritchie Ng


License: MIT License

