Tong-ZHAO / AlphaDraughts-Zero

Final project for Reinforcement Learning

AlphaDraughts-Zero

This repository is the final project for the Reinforcement Learning course of the M2 MVA program (2018). Inspired by AlphaGo Zero, we apply its method to English Checkers, a well-known two-player strategy board game.

Requirements

  • Linux or macOS
  • Python 3 (version 3.4 or later preferred)
  • PyTorch 1.0
  • CPU, or NVIDIA GPU with CUDA and cuDNN

For pip users, run pip install -r requirements.txt to install dependencies.

Getting Started

Installation

  • Clone this repo and install the dependencies:
git clone https://github.com/Tong-ZHAO/AlphaDraughts-Zero
cd AlphaDraughts-Zero
pip install -r requirements.txt

Train

  • Set the training parameters in ./src/config.py before training.
    Some parameters can be passed to ./src/train.py as command-line arguments:
usage: train.py [-h] [--iterations N] [--lr LR] [--seed S] [--env ENV]

Training of AlphaDraughts Zero

optional arguments:
  -h, --help      show this help message and exit
  --iterations N  number of iterations of pipeline training
  --lr LR         learning rate (default: 0.01)
  --seed S        random seed (default: 42)
  --env ENV       visdom environment
  • To start the training:
cd src
python train.py
  • To view loss plots, run python -m visdom.server and open http://localhost:8097 in a browser.
  • To see more details of training, check the log file in ./logs/.
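
The help output above suggests an argparse-based command line. The following is a minimal sketch of a parser that would accept the same flags, not the actual contents of ./src/train.py. The function name build_parser, the argument types, and the None defaults for --iterations and --env are assumptions; only the flag names, the description, and the defaults for --lr and --seed come from the help text.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags listed in the train.py help text."""
    parser = argparse.ArgumentParser(description="Training of AlphaDraughts Zero")
    # The default for --iterations is not stated in the help text; None is a placeholder.
    parser.add_argument("--iterations", type=int, metavar="N", default=None,
                        help="number of iterations of pipeline training")
    parser.add_argument("--lr", type=float, metavar="LR", default=0.01,
                        help="learning rate (default: 0.01)")
    parser.add_argument("--seed", type=int, metavar="S", default=42,
                        help="random seed (default: 42)")
    # The default visdom environment is not stated either; None is a placeholder.
    parser.add_argument("--env", type=str, metavar="ENV", default=None,
                        help="visdom environment")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--iterations", "10", "--lr", "0.001"])
print(args.iterations, args.lr, args.seed, args.env)
```

With a parser of this shape, flags that are left out fall back to their defaults, so the example keeps the documented seed of 42 while overriding the learning rate.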

Test

  • To start a human-versus-machine game (qualitative evaluation):
cd src
python gui.py
  • Some arguments can be passed to ./src/gui.py:
usage: gui.py [-h] [--checkpoint C] [--human H] [--simulation S] [--ai A]

optional arguments:
  -h, --help      show this help message and exit
  --checkpoint C  which neural network model checkpoint to use.
  --human H       "white" or "black", which side human player plays, white
                  side always goes first.
  --simulation S  number of simulations for MCTS at each time step to choose
                  the action.
  --ai A          whether use AI, 1 means using AI, 0 means not using AI.

  • Quantitative evaluation can be done using the functions provided in ./src/elo.py.
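
For readers unfamiliar with Elo ratings, the sketch below shows the standard Elo update for a single game between two players. It illustrates the general technique only and does not reproduce the functions in ./src/elo.py; the names expected_score and update_elo and the K-factor of 32 are assumptions chosen for illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return the updated (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, and 0.0 for a loss.
    k is the K-factor; 32 is a common choice, assumed here.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    score_b = 1.0 - score_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * (score_b - exp_b)
    return new_a, new_b


# Two equally rated players; A wins the game.
new_a, new_b = update_elo(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)
```

Because the two adjustments are equal and opposite, the sum of the ratings is preserved, which makes it convenient to compare successive checkpoints against each other over many games.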
