FlickerNiko / SAC-QMIX

Algorithm that combines QMIX with SAC for Multi-Agent Reinforcement Learning.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

SAC-QMIX

Algorithm that applies SAC to QMIX for Multi-Agent Reinforcement Learning. Watch the demo here.

Requirements

SMAC

pytorch (GPU support recommanded while training)

tensorboard

StarCraft II

For the installation of SMAC and StarCraft II, refer to the repository of SMAC.

Train

Train a model with the following command:

python main.py

Configurations and parameters of the training are specified in config.json. Models will be saved at ./models

Test

Test a trained model with the following command:

python test_model.py

Configurations and parameters of the testing are specified in test_config.json. Match the run_name items in config.json and test_config.json.

Theory & Algorithm

Architecture

Computation Flow

Note that a_i is equivalent to \mu_i and s_i is equivalent to o_i in the architecture schema above.

Train Objective: policies that maximum

Q-values computed by networks:

Individual state-value functions:

Total state-values (alpha is the entropy temperature):

Q-values expressed with Bellman Function:

Critic networks update: minimum

Actor networks update: maximum

Entropy temperatures update: minimum

Result

Note that data of other algorithm are from SMAC paper. Therefore methods of evaluations are kept the same as SMAC paper did (StarCraftII version: SC2.4.6.2.69232).

Test Win Rate % of SAC-QMIX and other algorithms

(Mean of 5 independent runs)

Scenario IQL VDN QMIX SAC-QMIX
2s_vs_1sc 100 100 100 100
2s3z 75 97 99 100
3s5z 10 84 97 97
1c3s5z 21 91 97 100
10m_vs_11m 34 97 97 100
2c_vs_64zg 7 21 58 56
bane_vs_bane 99 94 85 100
5m_vs_6m 49 70 70 90
3s_vs_5z 45 91 87 100
3s5z_vs_3s6z 0 2 2 85
6h_vs_8z 0 0 3 82
27m_vs_30m 0 0 49 100
MMM2 0 1 69 95
corridor 0 0 1 0

Learning curves of SAC-QMIX and other algorithms

(Mean of 5 independent runs)

About

Algorithm that combines QMIX with SAC for Multi-Agent Reinforcement Learning.

License:GNU General Public License v3.0


Languages

Language:Python 100.0%