google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.


PPO and self-play

drblallo opened this issue

I am trying to use PPO (which worked wonderfully out of the box, thank you very much for it) to learn a game in which the same player can take several actions in a row, and where, depending on which action is performed, the next action may belong to either player.

It is not clear how to do this, because an agent takes both the number of envs and the player index. So if I start 5 envs for a game with two PPO agents, and after an action the turn in two of the envs passes to the other player, I no longer have enough envs to pass to either agent.

I have looked around the repo, but I have not found any hint that this problem is already solved by some other mechanism.
From what I understand, the alternatives for solving it are:

  • Have more envs than NumAgents * NumEnvsPerPlayer, so that at least one of the agents always has enough environments to run. It is not clear to me whether this would be an issue for PPO, since it would not see the game evolve in order (see the sketch after this list).
  • Have a fake no-op action that both players can always execute.
  • Have a single agent that plays both sides; but that does not seem possible, because the agent takes the player ID.
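
To make the mismatch concrete, here is a rough sketch of the routing I have in mind, grouping envs by `current_player()` at every step. The agents are uniform-random stand-ins, and none of this is the actual PPO / vector-env API; the point is only that each agent's sub-batch of envs changes size from step to step.

```python
import random

import pyspiel

def random_stand_in(state):
    # Stand-in for a learned policy; a real PPO agent would go here.
    return random.choice(state.legal_actions())

game = pyspiel.load_game("tic_tac_toe")
states = [game.new_initial_state() for _ in range(5)]
agents = {0: random_stand_in, 1: random_stand_in}  # one policy per player id

while not all(s.is_terminal() for s in states):
    for player, agent in agents.items():
        # Sub-batch: the envs where it is currently `player`'s turn. Its
        # size changes from step to step, which is exactly the mismatch
        # for a PPO agent that expects a fixed number of envs.
        idx = [i for i, s in enumerate(states)
               if not s.is_terminal() and s.current_player() == player]
        for i in idx:
            states[i].apply_action(agent(states[i]))
```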

Do you have any suggestions about the correct way to address this issue?

Thank you in advance.

Hi @drblallo,

I don't really understand the question, sorry.

But please note that the PPO implementation only supports the single-agent case:

> Currently only supports the single-agent case.

It was added for a specific use case and was never extended to the multiagent case.

So it has only been used and tested in single-agent settings, like Atari, or as a best response oracle.
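
For context, the best-response use looks roughly like the sketch below: freeze the other player's policy and wrap the game so it presents a single-agent interface to the learner. This is only an illustration; `FixedOpponentEnv` and its `reset`/`step` methods are hypothetical names, not the interface the PPO code actually consumes.

```python
import random

import pyspiel

class FixedOpponentEnv:
    """Hypothetical wrapper: fixes the opponent's policy so a two-player
    pyspiel game looks single-agent to the learner. Not OpenSpiel API."""

    def __init__(self, game_name, learner_id, opponent_policy):
        self._game = pyspiel.load_game(game_name)
        self._learner_id = learner_id
        self._opponent = opponent_policy
        self._state = None

    def _advance_to_learner(self):
        # Let chance and the fixed opponent act until it is the learner's
        # turn or the game is over.
        while (not self._state.is_terminal()
               and self._state.current_player() != self._learner_id):
            if self._state.is_chance_node():
                outcomes, probs = zip(*self._state.chance_outcomes())
                action = random.choices(outcomes, probs)[0]
            else:
                action = self._opponent(self._state)
            self._state.apply_action(action)

    def reset(self):
        self._state = self._game.new_initial_state()
        self._advance_to_learner()
        return self._state

    def step(self, action):
        self._state.apply_action(action)
        self._advance_to_learner()
        done = self._state.is_terminal()
        reward = self._state.returns()[self._learner_id] if done else 0.0
        return self._state, reward, done

# Usage sketch: learn a best response to a uniform-random opponent.
env = FixedOpponentEnv(
    "kuhn_poker", learner_id=0,
    opponent_policy=lambda s: random.choice(s.legal_actions()))
state = env.reset()
```

Self-play on top of something like this would amount to periodically replacing `opponent_policy` with a snapshot of the learner, but as noted above, that path has not been exercised or tested here.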

I suspect this answers your question: this code does not handle the situation you describe, since it was designed for the single-agent setting.

Hope this helps.

Closing due to inactivity. Please re-open if you would like to follow up.