
Home Page: https://github.com/werner-duvaud/muzero-general/wiki/MuZero-Documentation


MuZero General

A commented and documented implementation of MuZero based on the Google DeepMind paper and the associated pseudocode. It is designed to be easily adaptable to any game or reinforcement learning environment (like Gym): you only need to edit the game file with the parameters and the game class. Please refer to the documentation and the example.
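As a rough illustration of what "editing the game file" involves, here is a minimal, self-contained sketch of a game wrapper. The method names below (`reset`, `step`, `legal_actions`, `to_play`) are illustrative assumptions about the interface, and `CountToTen` is a made-up toy game; check `games/abstract_game.py` in the repository for the exact class to subclass and the real method signatures.

```python
class CountToTen:
    """Toy single-player game: reach exactly 10 by choosing +1 or +2."""

    def __init__(self):
        self.count = 0

    def reset(self):
        # Return the initial observation.
        self.count = 0
        return [self.count]

    def legal_actions(self):
        # Action 0 adds 1 to the count, action 1 adds 2.
        return [0, 1]

    def step(self, action):
        # Apply an action; return (observation, reward, done).
        self.count += action + 1
        done = self.count >= 10
        reward = 1 if self.count == 10 else 0
        return [self.count], reward, done

    def to_play(self):
        # Single-player game: it is always player 0's turn.
        return 0
```

A real game file would additionally carry the hyperparameter configuration alongside the game class, as the documentation describes.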

MuZero is a model-based reinforcement learning algorithm, the successor to AlphaZero. It learns to master games without knowing their rules: given only the set of available actions, it learns to play and master the game. It is at least as sample-efficient as similar algorithms like AlphaZero, SimPLe and World Models. See How it works.
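Conceptually, MuZero learns three functions: a representation function h that encodes an observation into a hidden state, a dynamics function g that predicts the next hidden state and reward from a state and action, and a prediction function f that outputs a policy and value from a hidden state. Planning then unrolls g and f entirely in latent space, never consulting the real rules. The toy sketch below uses trivial stand-in functions purely to show the rollout structure; in this repository all three are PyTorch networks.

```python
def representation(observation):
    # h: observation -> hidden state (stand-in: sum the observation)
    return float(sum(observation))

def dynamics(state, action):
    # g: (hidden state, action) -> (next hidden state, predicted reward)
    return state + action, 0.0

def prediction(state):
    # f: hidden state -> (policy over actions, value estimate)
    return [0.5, 0.5], state

def rollout(observation, actions):
    """Unroll the learned model over a sequence of actions, in latent space."""
    state = representation(observation)
    trajectory = []
    for action in actions:
        policy, value = prediction(state)
        state, reward = dynamics(state, action)
        trajectory.append((policy, value, reward))
    return state, trajectory
```

During search, MCTS calls exactly this kind of rollout to evaluate candidate action sequences without access to the environment simulator.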

Features

  • Residual Network and Fully connected network in PyTorch
  • Multi-Threaded with Ray
  • CPU/GPU support
  • TensorBoard real-time monitoring
  • Model weights automatically saved at checkpoints
  • Single and multiplayer mode
  • Commented and documented
  • Easily adaptable for new games
  • Examples of board games, Gym and Atari games (See list of implemented games)
  • Pretrained weights available
  • Windows support (Workaround: Use the notebook in Google Colab)

Demo

Training performance is tracked and displayed in real time in TensorBoard:

[Image: CartPole training summary]

Testing Lunar Lander:

[Image: Lunar Lander training preview]

Games already implemented

  • Cartpole (Tested with the fully connected network)
  • Lunar Lander (Tested in deterministic mode with the fully connected network)
  • Gridworld (Tested with the fully connected network)
  • Tic-tac-toe (Tested with the fully connected network and the residual network)
  • Connect4 (Slightly tested with the residual network)
  • Gomoku
  • Atari Breakout

Tests are run on Ubuntu with 16 GB RAM / Intel i7 / GTX 1050 Ti Max-Q. We verify that training progresses and that the agent reaches a level showing it has learned, but we do not systematically reach human-level play, and for some environments we observe a regression after a certain time. The provided configurations are certainly not optimal, and we have not yet focused on hyperparameter optimization. Any help is welcome.
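Since the configurations above are explicitly not tuned, hyperparameter experiments are a natural place to contribute. The sketch below shows the general shape of a per-game configuration object; the field names (`num_simulations`, `discount`, `lr_init`, ...) are assumptions following common MuZero conventions, so check the `MuZeroConfig` class in a real game file such as `games/cartpole.py` for the actual parameters and defaults.

```python
from dataclasses import dataclass

@dataclass
class MuZeroConfig:
    """Illustrative per-game hyperparameters (names and values are examples)."""
    seed: int = 0
    num_simulations: int = 50   # MCTS simulations per move
    discount: float = 0.997     # reward discount factor
    lr_init: float = 0.02       # initial learning rate
    batch_size: int = 128
    training_steps: int = 10000

# Experimenting with a cheaper search budget:
config = MuZeroConfig(num_simulations=25)
```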

Code structure

[Image: code structure diagram]

Getting started

Installation

git clone https://github.com/werner-duvaud/muzero-general.git
cd muzero-general

pip install -r requirements.txt

Run

python muzero.py

To visualize the training results, run in a new terminal:

tensorboard --logdir ./results

Authors

  • Werner Duvaud
  • Aurèle Hainaut
  • Paul Lenoir

License: MIT
