jonathan-laurent / AlphaZero.jl

A generic, simple and fast implementation of DeepMind's AlphaZero algorithm.

Home Page: https://jonathan-laurent.github.io/AlphaZero.jl/stable/

Support for multiplayer games?

kendonB opened this issue

I saw a YouTube video suggesting that this is difficult in principle because the agent can form cartels (i.e. it learns that it's always best to cooperate with position 2 when it finds itself in position 1, and vice versa).

It should be possible to avoid this by choosing an objective function that disincentivises collaboration.

So rather than having the agent maximise its own win probability, it could, for example, maximise the difference between its own win probability and that of the opposing player most likely to win. Perhaps the negative weights could instead be applied to all other players, each weighted by their win probability.
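To make the two objectives a bit more concrete, here is a rough Python sketch. It is not based on anything in AlphaZero.jl: the function names are hypothetical, `win_probs` is assumed to be a vector of per-player win probabilities (e.g. from a value head with one output per player), and the weighting in the second function is only one possible reading of "weighted by their win probability".

```python
from typing import Sequence


def margin_over_strongest(win_probs: Sequence[float], me: int) -> float:
    """Own win probability minus that of the strongest opponent.

    Assumes at least two players.
    """
    opponents = [p for i, p in enumerate(win_probs) if i != me]
    return win_probs[me] - max(opponents)


def margin_over_weighted(win_probs: Sequence[float], me: int) -> float:
    """Own win probability minus a weighted sum of opponents' win probabilities.

    Assumption: each opponent's weight is its share of the total opponent
    win probability, so stronger opponents are penalised more heavily.
    """
    opponents = [p for i, p in enumerate(win_probs) if i != me]
    total = sum(opponents)
    if total == 0.0:
        # No opponent can win, so there is nothing to penalise.
        return win_probs[me]
    return win_probs[me] - sum((p / total) * p for p in opponents)
```

For example, with `win_probs = [0.5, 0.3, 0.2]` and `me = 0`, the first objective only looks at the 0.3 opponent, while the second penalises both opponents but puts more weight on the stronger one. With two players both reduce to the usual difference in win probabilities.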

This was mentioned in #101, and you make a similar point there. I don't think there needs to be a single correct objective function; it just needs to put some weight on the agent's own win probability and some negative weight on opponents in strong positions.

Your idea may have potential but it is way too abstract in its present form for me to evaluate. I would encourage you to flesh it out using a concrete game as an example. Also, try and be specific about how each component of AlphaZero should be adapted to work with your idea (MCTS, network training objective, self-play...).