marekgalovic / othello-rl

Self-play reinforcement learning for Othello

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Othello RL

An implementation of self-play reinforcement learning algorithm for 8x8 Othello. The algorithm uses Monte Carlo Tree Search to improve its policy in every turn and then updates the policy so that actions taken by MCTS have higher probability if they led to winning the game.

Results

The agent was trained for approximately 24 hours on a node with 32 CPU cores. In every iteration it played 160 games against a randomly sampled version of past agent parameters. This prevents agent's policy from overfittnig. 50 MCTS traversals per turn were used and the game tree was reset after every game. To update agent parameters I used clipped PPO loss.

Win Rates

About

Self-play reinforcement learning for Othello


Languages

Language:Python 93.0%Language:Shell 4.9%Language:Dockerfile 2.2%