RLgraph: Modular computation graphs for deep reinforcement learning

Support for MARL

jpiabrantes opened this issue

Do you plan to support environments for Multi-Agent RL (MARL) in the near future?

This would be a key feature in my choice of an RL library.

Hi,
we are interested in multi-agent RL, but there is no clear timeline, although we did some analysis on what would be needed. Since this is a wide field, do you have any specific semantics/algorithms in mind?

I had in mind a similar API to the one used in RLlib:

  • A MultiAgentEnv whose step method, obs_dict, rew_dict, done_dict, info_dict = step(action_dict), receives a dictionary with unique agent string ids as keys and the corresponding agents' actions as values, and returns similar dictionaries for observations, rewards and termination signals (see the sketch after this list).
  • The environment can also optionally return a joint ("mega") state for training centralised value functions.
  • Multiple agents can share the same policy model or not.
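
To make the shape of that interface concrete, here is a minimal, purely illustrative sketch. All class and method names are placeholders of my own, not RLgraph (or RLlib) API:

```python
from typing import Dict, Tuple


class TwoAgentToyEnv:
    """Toy two-agent environment illustrating the dict-based API described above.

    This is only a sketch of the proposed interface, not RLgraph code.
    """

    def __init__(self) -> None:
        self.agent_ids = ["agent_0", "agent_1"]
        self.t = 0

    def reset(self) -> Dict[str, int]:
        self.t = 0
        # One observation per agent, keyed by its string id.
        return {aid: 0 for aid in self.agent_ids}

    def step(
        self, action_dict: Dict[str, int]
    ) -> Tuple[Dict[str, int], Dict[str, float], Dict[str, bool], Dict[str, dict]]:
        self.t += 1
        obs_dict = {aid: self.t for aid in action_dict}
        # Stand-in dynamics: reward 1.0 whenever an agent picks action 1.
        rew_dict = {aid: float(action == 1) for aid, action in action_dict.items()}
        done_dict = {aid: self.t >= 10 for aid in action_dict}
        done_dict["__all__"] = self.t >= 10  # episode-level termination flag
        info_dict = {aid: {} for aid in action_dict}
        return obs_dict, rew_dict, done_dict, info_dict

    def global_state(self) -> Tuple[int, int]:
        # Optional joint ("mega") state for a centralised value function.
        return (self.t, len(self.agent_ids))


# Interaction loop in the proposed style:
env = TwoAgentToyEnv()
obs_dict = env.reset()
obs_dict, rew_dict, done_dict, info_dict = env.step({aid: 1 for aid in obs_dict})
```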

In terms of algorithms, it would be great if non-MARL-specific algorithms like PPO worked out-of-the-box in multi-agent environments. I feel it is more important to have the necessary structure for interacting with a MultiAgentEnv than to implement MARL-specific algorithms. Ideally, specific MARL algorithms could be implemented by users of the library once they need them.

Ok, the environment is not hard to implement; the thing I am still thinking about, and which potentially needs a bit more planning, is how multiple agents are coordinated.

I agree that, in the first instance, one could just use a collection of independent agents. The main issue is that RLgraph is not centered around Ray the way RLlib is, so we would have to decide which entity holds these agents.

I thought about a MultiAgentCoordinator which wraps a collection of agents and is then used in the Ray executors even in single-agent cases (and in the single-threaded executor), while leaving individual agents as they are. Sharing the same policy would then require syncing. The alternative (requiring much larger changes) would be to allow individual agents to hold multiple policies and to include sharing in the build process (similar to how RLlib's agents are actually containers for collections of policies).
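
To illustrate the first option, here is a rough sketch of what such a coordinator could look like. Method names like get_action/observe are placeholders for whatever the single-agent Agent interface exposes, not RLgraph's actual API:

```python
class MultiAgentCoordinator:
    """Sketch of a coordinator wrapping a collection of single-agent agents.

    Not an existing RLgraph class: the coordinator owns the dict-based
    environment interaction while the wrapped agents stay unchanged.
    """

    def __init__(self, agents, policy_mapping=None):
        # agents: dict mapping policy-id -> single-agent instance.
        # policy_mapping: maps an env agent-id to a policy-id; identity if None,
        # i.e. every env agent gets its own independent agent/policy.
        self.agents = agents
        self.policy_mapping = policy_mapping or (lambda agent_id: agent_id)

    def get_actions(self, obs_dict):
        # Route each observation to the agent that owns that policy.
        return {
            agent_id: self.agents[self.policy_mapping(agent_id)].get_action(obs)
            for agent_id, obs in obs_dict.items()
        }

    def observe(self, obs_dict, action_dict, rew_dict, done_dict, next_obs_dict):
        # Hand each per-agent transition back to the owning agent.
        for agent_id, obs in obs_dict.items():
            agent = self.agents[self.policy_mapping(agent_id)]
            agent.observe(
                obs,
                action_dict[agent_id],
                rew_dict[agent_id],
                next_obs_dict[agent_id],
                done_dict[agent_id],
            )

    def sync_shared_policies(self):
        # When several env agent-ids map to the same policy-id, weight syncing
        # between executor copies would be triggered here after updates.
        pass
```

In the single-threaded executor this coordinator would simply take the place of the single agent; in the Ray executors each worker would hold one coordinator instance.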