
Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

This is the code for implementing the MADDPG algorithm presented in the paper: Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning. It is configured to be run in conjunction with the multi-agent particle environments in mpe_local (https://github.com/qian18long/epciclr2020/tree/master/mpe_local). Gif results are shown at https://sites.google.com/view/epciclr2020/. Note: this codebase has been restructured since the original paper, so results may vary from those reported in the paper.

Installation

  • To install, cd into the root directory and type pip install -e .

  • Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), tensorflow (1.8.0), numpy (1.14.5)
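
For example (the clone URL below is the upstream repository linked above; adjust it if you are installing a fork):

    git clone https://github.com/qian18long/epciclr2020.git
    cd epciclr2020
    pip install -e .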

Case study: Multi-Agent Particle Environments

We demonstrate here how the code can be used in conjunction with the multi-agent particle environments in mpe_local (https://github.com/qian18long/epciclr2020/tree/master/mpe_local), which are based on OpenAI's multi-agent particle environments (https://github.com/openai/multiagent-particle-envs).

Command-line options

Environment options

  • --scenario: defines which environment in the MPE is to be used (default: "grassland")

  • --map-size: scale factor of the environment: 1 for the "normal" map and 2 otherwise (default: "normal")

  • --sight: the agent's visibility radius (default: 100)

  • --alpha: weight of the shared reward (default: 0.0)

  • --max-episode-len: maximum length of each episode for the environment (default: 25)

  • --num-episodes: total number of training episodes (default: 200000)

  • --num-good: number of good agents in the scenario (default: 2)

  • --num-adversaries: number of adversaries in the environment (default: 2)

  • --num-food: number of food (resource) units in the scenario (default: 4)

  • --good-policy: algorithm used for the 'good' (non-adversary) policies in the environment (default: "maddpg"; options: {"att-maddpg", "maddpg", "PC", "mean-field"})

  • --adv-policy: algorithm used for the adversary policies in the environment (default: "maddpg"; options: {"att-maddpg", "maddpg", "PC", "mean-field"})
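
Putting the environment options together, a training run on the grassland scenario with 3 good agents and 2 adversaries might be launched as follows. This is an illustrative sketch: train_normal.py is the training entry point listed under Code structure below, the flag values are examples rather than prescribed settings, and the exact invocation may differ (see the provided run scripts such as run_att_grassland.sh).

    python maddpg_o/experiments/train_normal.py \
        --scenario grassland \
        --map-size normal \
        --sight 100 \
        --num-good 3 \
        --num-adversaries 2 \
        --num-food 4 \
        --good-policy att-maddpg \
        --adv-policy maddpg \
        --max-episode-len 25 \
        --num-episodes 200000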

Core training parameters

  • --lr: learning rate (default: 1e-2)

  • --gamma: discount factor (default: 0.95)

  • --batch-size: batch size (default: 1024)

  • --num-units: number of units in the MLP (default: 64)

  • --good-num-units: number of units in the MLP of good agents; defaults to --num-units if not provided

  • --adv-num-units: number of units in the MLP of adversarial agents; defaults to --num-units if not provided

  • --n_cpu_per_agent: number of CPUs used per agent (default: 1)

  • --good-share-weights: if set, good agents share the weights of the agent encoder within the model

  • --adv-share-weights: if set, adversarial agents share the weights of the agent encoder within the model

  • --use-gpu: use the GPU for training (default: False)
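
The core training parameters combine with the environment options above. A hedged sketch (the values shown are simply the documented defaults made explicit, plus the two weight-sharing switches):

    python maddpg_o/experiments/train_normal.py \
        --scenario grassland \
        --lr 1e-2 \
        --gamma 0.95 \
        --batch-size 1024 \
        --num-units 64 \
        --n_cpu_per_agent 1 \
        --good-share-weights \
        --adv-share-weights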

Checkpointing

  • --save-dir: directory where intermediate training results and model will be saved (default: "/test/")

  • --save-rate: model is saved every time this number of episodes has been completed (default: 1000)

  • --load-dir: directory where training state and model are loaded from (default: "test")
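
For example, to save a checkpoint every 1000 episodes to a run-specific directory (illustrative command; the directory name is a placeholder):

    python maddpg_o/experiments/train_normal.py \
        --scenario grassland \
        --save-dir ./checkpoints/grassland_run1/ \
        --save-rate 1000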

Evaluation

  • --restore: restores previous training state stored in load-dir (or in save-dir if no load-dir has been provided), and continues training (default: False)

  • --display: displays to the screen the trained policy stored in load-dir (or in save-dir if no load-dir has been provided), but does not continue training (default: False)

  • --save-gif-data: save gif examples to save-dir (default: False)

  • --render-gif: render the gif stored in load-dir (default: False)
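
For example, to render a trained policy without continuing training, and separately to resume training from a saved state (illustrative commands; the checkpoint directory is a placeholder):

    # Visualize the policy stored in load-dir (no further training).
    python maddpg_o/experiments/train_normal.py \
        --scenario grassland \
        --load-dir ./checkpoints/grassland_run1/ \
        --display

    # Continue training from the saved state.
    python maddpg_o/experiments/train_normal.py \
        --scenario grassland \
        --load-dir ./checkpoints/grassland_run1/ \
        --restore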

Code structure

  • ./maddpg_o/experiments/train_helper/train_helpers.py: contains code for training MADDPG, Att-MADDPG, mean-field, vanilla PC and EPC on the MPE

  • ./maddpg_o/experiments/train_normal.py: applies train_helpers.py for MADDPG, Att-MADDPG and mean-field training

  • ./maddpg_o/experiments/train_normal.py: applies the population curriculum in train_helpers.py to add an agent to the model in load-dir

  • ./maddpg_o/experiments/train_x2.py: applies the population curriculum in train_helpers.py to duplicate the agents in the model in load-dir

  • ./maddpg_o/experiments/train_mix_match.py: mixes and matches the good agents in --sheep-init-load-dirs with the adversarial agents in --wolf-init-load-dirs for agent evaluation

  • ./maddpg_o/experiments/compete.py: mixes and matches all agent groups in --competitor-load-dirs to evaluate each group's performance for evolutionary selection

  • ./maddpg_o/maddpg_local/micro/maddpg.py: core code for the MADDPG-based algorithms

  • ./maddpg_o/experiments/train_helper/union_replay_buffer.py: replay buffer code

  • ./maddpg_o/maddpg_local/common/distributions.py: useful distributions used in maddpg.py

  • ./maddpg_o/maddpg_local/common/tf_util.py: useful tensorflow functions used in maddpg.py

Quick start

  • Run run_att_grassland.sh for the att-maddpg method.
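
For example, from the repository root (assuming the script is executable; otherwise run it through sh):

    chmod +x run_att_grassland.sh
    ./run_att_grassland.sh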

Paper citation

    @inproceedings{epciclr2020,
      author = {Qian Long and Zihan Zhou and Abhinav Gupta and Fei Fang and Yi Wu and Xiaolong Wang},
      title = {Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning},
      booktitle = {ICLR},
      year = {2020}
    }
