ai alphazero machine-learning reinforcement-learning

A Reinforcement-Learning based Renju game

This is my personal project to practice reinforcement learning. You can download the application and play with it.

Renju is a professional variant of gomoku (five in a row) that adds the following restrictions on black stones to weaken the first player's advantage:

Double three – Black cannot place a stone that simultaneously forms two separate lines of three black stones in unbroken rows (i.e. rows not blocked by white stones).
Double four – Black cannot place a stone that simultaneously forms two separate lines of four black stones in a row.
Overline – Black cannot place a stone that forms six or more black stones in a row.
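
To make the overline rule concrete, here is a minimal Rust sketch that checks whether a black move would create six or more stones in a row. The `Stone` and `Board` types and the `run_length` and `is_overline` functions are hypothetical illustrations, not code from this project. Double-three and double-four detection require pattern analysis (open versus blocked lines, broken threes) and are deliberately left out.

```rust
/// Board size (15x15, matching the network input planes).
const N: usize = 15;

/// Hypothetical cell type for this sketch.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Stone {
    Empty,
    Black,
    White,
}

/// Hypothetical board: row-major 15x15 grid.
type Board = [[Stone; N]; N];

/// Length of the unbroken run of `color` through (row, col) along (dr, dc),
/// counting the stone at (row, col) itself.
fn run_length(board: &Board, row: usize, col: usize, dr: i32, dc: i32, color: Stone) -> usize {
    let mut count = 1;
    // Walk outward in both directions from the placed stone.
    for sign in [1, -1] {
        let (mut r, mut c) = (row as i32 + sign * dr, col as i32 + sign * dc);
        while r >= 0
            && r < N as i32
            && c >= 0
            && c < N as i32
            && board[r as usize][c as usize] == color
        {
            count += 1;
            r += sign * dr;
            c += sign * dc;
        }
    }
    count
}

/// True if placing a black stone at (row, col) would make six or more in a row.
fn is_overline(board: &Board, row: usize, col: usize) -> bool {
    let mut b = *board; // work on a copy with the candidate stone placed
    b[row][col] = Stone::Black;
    // Horizontal, vertical, diagonal, anti-diagonal.
    const DIRS: [(i32, i32); 4] = [(0, 1), (1, 0), (1, 1), (1, -1)];
    DIRS.iter()
        .any(|&(dr, dc)| run_length(&b, row, col, dr, dc, Stone::Black) >= 6)
}
```

A move-generation routine could use a check like this to mask forbidden points out of Black's legal moves before MCTS expands a node.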

Overview

The AI is implemented with a residual convolutional neural network and Monte Carlo Tree Search (MCTS).
The AI is not told how to play the game; it learns by playing against itself (a.k.a. self-play).
The algorithm design is adapted from AlphaGo Zero:
- AlphaGo: How it works technically?
- AlphaZero - A step-by-step look at Alpha Zero and Monte Carlo Tree Search
- AlphaGo Zero — a game changer. (How it works?)
- Lessons From Alpha Zero (parts 1, 2, 3, 4, 5, 6)
The application is written in Rust to avoid the performance bottleneck that Python would impose on MCTS.
Self-play is much slower than training, so self-play is optimized to use quantization; on a Mac M1 this performs better than using the GPU.
MCTS uses a novel lock-free tree implementation.
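
For reference, AlphaGo Zero selects moves during search with the PUCT rule, choosing the action that maximizes Q(s,a) + c_puct · P(s,a) · sqrt(Σ_b N(s,b)) / (1 + N(s,a)). The Rust sketch below is not this project's lock-free tree, whose design is not described here; it only illustrates one common way to keep per-edge statistics lock-free with atomics and a compare-and-swap loop, so several search threads can update the same node without a mutex. `Edge`, `update` and `select` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

/// Hypothetical per-edge statistics, updated without locks.
struct Edge {
    prior: f32,           // P(s, a) from the policy head
    visits: AtomicU32,    // N(s, a)
    value_sum: AtomicU64, // W(s, a), stored as the bit pattern of an f64
}

impl Edge {
    fn new(prior: f32) -> Self {
        Edge {
            prior,
            visits: AtomicU32::new(0),
            value_sum: AtomicU64::new(0f64.to_bits()),
        }
    }

    /// Mean action value Q(s, a) = W / N; 0 for an unvisited edge.
    fn q(&self) -> f64 {
        let n = self.visits.load(Ordering::Acquire);
        if n == 0 {
            0.0
        } else {
            f64::from_bits(self.value_sum.load(Ordering::Acquire)) / n as f64
        }
    }

    /// Backpropagate one simulation result with a CAS loop instead of a lock.
    /// W and N are updated separately, so a concurrent reader may briefly
    /// observe a slightly stale Q; this sketch accepts that.
    fn update(&self, value: f64) {
        self.value_sum
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |bits| {
                Some((f64::from_bits(bits) + value).to_bits())
            })
            .unwrap(); // the closure always returns Some, so this cannot fail
        self.visits.fetch_add(1, Ordering::AcqRel);
    }
}

/// PUCT selection: argmax_a Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).
fn select(edges: &[Edge], c_puct: f64) -> Option<usize> {
    let total: u32 = edges.iter().map(|e| e.visits.load(Ordering::Acquire)).sum();
    let sqrt_total = (total as f64).sqrt();
    edges
        .iter()
        .enumerate()
        .map(|(i, e)| {
            let n = e.visits.load(Ordering::Acquire) as f64;
            let u = c_puct * e.prior as f64 * sqrt_total / (1.0 + n);
            (i, e.q() + u)
        })
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

When no child has been visited yet, every score is equal and `max_by` returns the last of the tied elements; a real search would usually break that tie differently (for example by prior).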

Neural Network

The following graph shows the policy-value network from AlphaGo Zero.

This project simplifies it as follows:

The number of residual blocks is reduced from 39 to 19.
The residual block width is narrowed from 256 filters to 64 filters.
Because the width is reduced to 1/4 and the dying ReLU problem was encountered in the first attempt, the ReLU activation of the last dense layer is replaced with ELU.
The input is simplified to a (1, 4, 15, 15) NCHW tensor with four 15x15 planes:
- The first plane represents the current player's stones
- The second plane represents the opponent's stones
- The third plane represents the position of the last move
- The fourth plane is filled with ones if the current player is black, or zeros if white

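
A minimal Rust sketch of how the four input planes could be packed into a flat `Vec<f32>` in NCHW order (batch size 1). The `Cell` type, the `encode` function, the row-major indexing and the one-hot encoding of the last move are assumptions for illustration; the project's own encoding code is not shown here.

```rust
const SIZE: usize = 15;

/// Hypothetical cell type for this sketch.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Cell {
    Empty,
    Black,
    White,
}

/// Build the (1, 4, 15, 15) NCHW input as a flat, row-major Vec<f32>.
/// Element (plane, row, col) lives at plane * 225 + row * 15 + col.
fn encode(board: &[[Cell; SIZE]; SIZE], black_to_move: bool, last_move: Option<(usize, usize)>) -> Vec<f32> {
    let plane = SIZE * SIZE;
    let mut t = vec![0.0f32; 4 * plane];
    let (me, opp) = if black_to_move {
        (Cell::Black, Cell::White)
    } else {
        (Cell::White, Cell::Black)
    };
    for r in 0..SIZE {
        for c in 0..SIZE {
            let i = r * SIZE + c;
            if board[r][c] == me {
                t[i] = 1.0; // plane 0: current player's stones
            }
            if board[r][c] == opp {
                t[plane + i] = 1.0; // plane 1: opponent's stones
            }
            if black_to_move {
                t[3 * plane + i] = 1.0; // plane 3: all ones when black is to move
            }
        }
    }
    if let Some((r, c)) = last_move {
        t[2 * plane + r * SIZE + c] = 1.0; // plane 2: last move, one-hot
    }
    t
}
```

Because plane 3 is constant, the network can condition on whose turn it is, which matters in Renju since only Black is subject to the forbidden-move rules.
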
Loss function

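$$ l = (z - v)^{2} - \pi^{T} \ln p + c \lVert \theta \rVert^{2} $$

where $z$ is the final game outcome from the current player's perspective, $v$ is the predicted value, $\pi$ is the vector of MCTS search probabilities, $p$ is the vector of predicted move probabilities, $\theta$ are the network weights, and $c$ controls the strength of the L2 regularization.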
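
For a single training sample, the loss can be evaluated directly from the formula. The hypothetical `alphazero_loss` function below is a sketch, not the project's training code; it adds a small epsilon inside the logarithm to avoid ln(0), which is an assumed safeguard rather than part of the formula.

```rust
/// Per-sample loss l = (z - v)^2 - pi^T ln(p) + c * ||theta||^2.
/// `pi`: MCTS search probabilities; `p`: predicted move probabilities;
/// `theta`: flattened network weights; `c`: L2 regularization coefficient.
fn alphazero_loss(z: f32, v: f32, pi: &[f32], p: &[f32], theta: &[f32], c: f32) -> f32 {
    assert_eq!(pi.len(), p.len(), "policy vectors must have the same length");
    // Value term: squared error between outcome and predicted value.
    let value_loss = (z - v).powi(2);
    // Policy term: cross-entropy of p against pi (epsilon avoids ln(0)).
    let policy_loss: f32 = -pi
        .iter()
        .zip(p)
        .map(|(&target, &pred)| target * (pred + 1e-8).ln())
        .sum::<f32>();
    // Regularization term: squared L2 norm of the weights.
    let l2: f32 = theta.iter().map(|w| w * w).sum();
    value_loss + policy_loss + c * l2
}
```

When $p$ equals $\pi$ exactly and $v$ equals $z$, only the entropy of $\pi$ and the regularization term remain, so a one-hot $\pi$ with no weights gives a loss of zero.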

License

Apache License 2.0
