jonathan-laurent / AlphaZero.jl

A generic, simple and fast implementation of Deepmind's AlphaZero algorithm.

Home Page: https://jonathan-laurent.github.io/AlphaZero.jl/stable/

Non-trivial games

StepHaze opened this issue

Is it possible to create a great player using AlphaZero.jl for non-trivial games like chess, go, or shogi?
Or is it only good for simple games like connect4 and mancala?

Yes, it is possible. Have a look at the game ReKtaNG on https://rektang.com/. The solo mode is powered by various AlphaZero.jl agents with different levels of training, providing adaptive game difficulty.

Is it really a complex game like chess, go, or shogi, with an unlimited number of possible positions?

The more relevant number for measuring game complexity when it comes to AlphaZero is the size of the action space, not the total number of positions. Also, strictly speaking, Chess and Go do have a finite (although very large) state space.
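As an illustration, this is the number to look at in AlphaZero.jl (a minimal sketch; `MyGameSpec` is a hypothetical game spec implementing the GameInterface, not an existing example):

```julia
using AlphaZero  # `GI` is AlphaZero.jl's GameInterface module

# Hypothetical game spec implementing the GameInterface (placeholder name).
spec = MyGameSpec()

# `GI.actions(spec)` returns the vector of all game actions. Its length, not
# the number of reachable positions, is what dominates the cost of learning
# from scratch.
println("Action space size: ", length(GI.actions(spec)))
```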

When learning from scratch, AlphaZero's training time is going to depend strongly on the size of the action space. This is to be expected, since AlphaZero discovers new moves by randomly exploring actions. Therefore, if all you have is a single GPU, you are going to be limited to action spaces much smaller than the ones occurring in Chess and Go. AlphaZero.jl could likely tackle such games given enough cloud-computing credits but, honestly, it has not primarily been designed with this goal in mind.

A more realistic way to use AlphaZero.jl with games of complexity similar to Chess or Go is to bootstrap it with a policy that is already decent. Such a policy could be learned via supervised learning from human games, for example. In this case, AlphaZero.jl could probably enable you to turn a decent policy into a really good one without an insane amount of compute. I've been interested in seeing someone try this out for a while.
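To make that concrete, here is a minimal sketch of what such supervised pretraining could look like in Flux; `human_games`, `encode_state` and `move_index` are hypothetical placeholders, and nothing here is part of AlphaZero.jl's API:

```julia
using Flux

# Toy policy network over a flat 256-dimensional state encoding (placeholder sizes).
num_actions = 2048
model = Chain(Dense(256 => 512, relu),
              Dense(512 => num_actions))  # logits over actions

opt_state = Flux.setup(Adam(1f-3), model)

# `human_games` is assumed to be a vector of (state, move) pairs from human play.
for (state, move) in human_games
    x = encode_state(state)                            # hypothetical: Vector{Float32} of length 256
    y = Flux.onehot(move_index(move), 1:num_actions)   # target move as a one-hot vector
    grads = Flux.gradient(m -> Flux.Losses.logitcrossentropy(m(x), y), model)
    Flux.update!(opt_state, model, grads[1])
end
```

The resulting network (or at least its policy head) could then serve as the starting point for AlphaZero.jl's self-play loop instead of a randomly initialized one.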

More generally, what you have to keep in mind when thinking about AlphaZero is that learning policies from scratch via random exploration is insanely wasteful and expensive compute-wise. It has the advantage of generality and of being able to scale to and leverage huge amounts of compute. However, solving games such as Chess or Go with AlphaZero from scratch requires a staggering amount of compute that is inaccessible to most (and will likely stay that way for a long time).

The honest truth is that outside a small number of games with tiny action spaces (Connect Four, Othello, Reversi...), applying AlphaZero naively to learn a policy from scratch over the full action space is going to be overly expensive for most people. In fact, even with Google-scale compute capabilities, many games are still going to be completely out of reach. The only way out if you are interested in those games is to combine AlphaZero with other methods and/or leverage game-specific knowledge to bootstrap from decent policies and engineer more tractable action spaces. But doing so requires work and insight and will never be automated by a push-button library.

@smart-fr How large exactly is the action space of ReKtaNG? Did you use any game-specific trick to make AlphaZero more tractable for your game?

Thank you for your interest in ReKtaNG. Don't hesitate to try it out, you'll love it 😊 (and even more when it's fully "gamified" in the future).

The size of the action space is 2048, between go (362 according to this source) and chess (4672 according to this source).

But indeed, I use a pretty simple heuristic to filter out most theoretically possible actions at each turn: an AlphaZero.jl agent retains only the 128 most "impactful" legal actions. All legal actions of a given piece have the same level of impact, which is a function of the piece's centrality on the board and of its perimeter in contact with opposing pieces.

This way I could start from scratch with a totally naïve agent and, despite a quite large action space, train it up to a fairly good level with just a few iterations (fewer than 10). After that, I continued training for a total of 62 iterations and observed an improvement in the agent's level at each iteration where the network was updated; the last such update happened at iteration 50.
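A rough sketch of that kind of impact-based filtering is below; `legal_actions`, `piece_of`, `centrality` and `contact_perimeter` are hypothetical placeholders standing in for the actual game code:

```julia
const MAX_ACTIONS = 128

# All legal actions of a given piece share the same impact score: a function of
# the piece's centrality and of its perimeter in contact with opposing pieces.
function impact(board, action)
    piece = piece_of(board, action)
    return centrality(board, piece) + contact_perimeter(board, piece)
end

# Keep only the MAX_ACTIONS most impactful legal actions for the agent.
function filtered_actions(board)
    acts = legal_actions(board)
    sort!(acts; by = a -> impact(board, a), rev = true)
    return first(acts, MAX_ACTIONS)  # `first` returns fewer if there are fewer legal actions
end
```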

@smart-fr This is a very good example of using game-specific knowledge to make AlphaZero tractable when a naive application wouldn't be. Thanks!

Yes, just like in Gomoku: even if the board is very large, the legal actions can be restricted to the squares within two cells of the existing stones.
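For concreteness, a minimal sketch of that restriction, assuming the board is a `Matrix{Int}` with `0` for empty squares (nothing here is AlphaZero.jl API):

```julia
# Only consider empty squares within `radius` cells (Chebyshev distance) of an
# existing stone. On an empty board this returns no moves, so the very first
# move needs special-casing.
function candidate_moves(board::Matrix{Int}; radius::Int = 2)
    n, m = size(board)
    moves = Tuple{Int,Int}[]
    for i in 1:n, j in 1:m
        board[i, j] == 0 || continue
        near = any(board[x, y] != 0
                   for x in max(1, i - radius):min(n, i + radius),
                       y in max(1, j - radius):min(m, j + radius))
        near && push!(moves, (i, j))
    end
    return moves
end
```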