Benchmarks for Efficient Exploration of Multi-Stage Task Completion and Utilization of Environmental Factors
SMAC-Exp Offense | SMAC-Exp Defense |
The StarCraft Multi-Agent Exploration Challenges : Learning Multi-Stage Tasks and Environmental Factors without Precise Reward Functions.
Mingyu Kim*, Jihwan Oh*, Yongsik Lee, Joonkee Kim, Seonghwan Kim, Song Chong, Se-Young Yun.
(*:equal contribution)
- Paper : https://ieeexplore.ieee.org/document/10099458
- Project page : https://osilab-kaist.github.io/smac_exp/
- Papers with Code : https://paperswithcode.com/paper/the-starcraft-multi-agent-challenges-learning
- Learning curves : https://url.kr/mak6gq
- Tensorboard logs and checkpoints : https://url.kr/92bp83
- PyMARL : The framework for deep multi-agent reinforcement learning with PyTorch.
- SMAC : The environments for research in collaborative multi-agent reinforcement learning (MARL), based on Blizzard's StarCraft II RTS game. Our work builds on PyMARL and SMAC.
- SMAC_Exp : Provides its own environments and executes training and testing of RL algorithms based on PyMARL, with both SMAC and SMAC_Exp.
- SMAC is the standard benchmark for multi-agent reinforcement learning. It mainly tests whether all agents can cooperatively eliminate approaching adversaries through fine micro-control alone, guided by obvious reward functions.
- SMAC_Exp contains 8 maps in total, grouped into three types: defense, offense, and challenging. It targets the exploration capability of MARL algorithms, that is, how efficiently they learn implicit multi-stage tasks and environmental factors in addition to micro-control.
Main Issues | SMAC | SMAC_Exp |
---|---|---|
Agents micro-control | O | O |
Multi-stage tasks | ▵ | O |
Environmental factors | ▵ | O |
- SG, Mar, and M refer to Siege Tank, Marauder, and Marine units, respectively.
Name | Ally Units | Enemy Units | Opponents approach |
---|---|---|---|
defense_infantry | 1 Mar & 4 M | 1 Mar & 6 M | One-sided |
defense_armored | 1 SG Tank, 1 Tank, 1 Mar & 5 M | 2 Tank, 2 Mar & 9 M | Two-sided |
defense_outnumbered | 1 SG Tank, 1 Tank, 1 Mar & 5 M | 2 Tank, 3 Mar & 10 M | Two-sided |
Name | Ally Units | Enemy Units | Distance & formation |
---|---|---|---|
offense_near | 3 SG Tank, 3 Tank, 3 Mar & 4 M | 1 SG Tank, 2 Tank, 2 Mar & 4 M | Near & Spread |
offense_distant | 3 SG Tank, 3 Tank, 3 Mar & 4 M | 1 SG Tank, 2 Tank, 2 Mar & 4 M | Distant & Spread |
offense_complicated | 3 SG Tank, 3 Tank, 3 Mar & 4 M | 1 SG Tank, 2 Tank, 2 Mar & 4 M | Complicated & Spread |
Name | Ally Units | Enemy Units | Opponents approach |
---|---|---|---|
defense_superhard | 1 SG Tank, 1 Tank, 1 Mar & 5 M | 2 Tank, 3 Mar & 10 M | Two-sided |
Name | Ally Units | Enemy Units | Distance & formation |
---|---|---|---|
offense_superhard | 1 SG Tank, 2 Tank, 2 Mar & 4 M | 1 SG Tank, 2 Tank, 2 Mar & 4 M | Complicated & Gathered |
Algorithm | Category | Paper Links |
---|---|---|
IQL | Value based | paper |
VDN | Value based | paper |
QTRAN | Value based | paper |
QMIX | Value based | paper |
DIQL | Distributional Value based | paper |
DDN | Distributional Value based | paper |
DMIX | Distributional Value based | paper |
DRIMA | Distributional Value based | paper |
COMA | Policy Gradient based | paper |
MASAC | Policy Gradient based | paper |
MADDPG | Policy Gradient based | paper |
MAPPO | Policy Gradient based | paper |
- Please pay attention to the version of SC2 you are using for your experiments.
- Performance is *not* always comparable between versions.
- The results in SMAC (https://arxiv.org/abs/1902.04043) use SC2.4.6.2.69232 not SC2.4.10.
- wget http://blzdistsc2-a.akamaihd.net/Linux/SC2.4.6.2.69232.zip
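Because results depend on the SC2 version, it can help to confirm which build is actually installed before running experiments. The sketch below is a minimal check, assuming the Linux package keeps its binaries under `Versions/Base<build>` inside the StarCraft II folder; the helper names `installed_builds` and `has_expected_build` are hypothetical and not part of this repository.

```python
import re
from pathlib import Path

# Build number taken from the package name SC2.4.6.2.69232.
EXPECTED_BUILD = 69232


def installed_builds(sc2_root):
    """Return the sorted build numbers found under <sc2_root>/Versions.

    Assumption: each installed build lives in a directory named Base<build>.
    """
    versions_dir = Path(sc2_root) / "Versions"
    if not versions_dir.is_dir():
        return []
    builds = []
    for entry in versions_dir.iterdir():
        match = re.fullmatch(r"Base(\d+)", entry.name)
        if entry.is_dir() and match:
            builds.append(int(match.group(1)))
    return sorted(builds)


def has_expected_build(sc2_root, expected=EXPECTED_BUILD):
    """True if the expected build directory is present."""
    return expected in installed_builds(sc2_root)
```

For example, `has_expected_build("pymarl/3rdparty/StarCraftII")` should return `True` after unpacking the zip above, if the directory layout matches the stated assumption.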
1️⃣ Cloning SMAC_Exp
git clone https://github.com/osilab-kaist/smac_plus.git
2️⃣ Download and set up StarCraft II
bash install_sc2.sh
- This will download SC2 into the pymarl/3rdparty folder. Alternatively, create a symbolic link there that points to an existing SC2 installation.
3️⃣ Install required packages
- The requirements.txt file can be used to install the necessary packages into a virtual environment (not recommended).
- After installing the requirements, install a torch build suitable for your environment.
4️⃣ Move map directories to the StarCraft II map directory
- Move the SMAC_Maps and SMAC_Plus_Maps directories to StarCraftII/Maps/.
mv SMAC_Plus_Maps ./pymarl/3rdparty/StarCraftII/Maps/
mv SMAC_Maps ./pymarl/3rdparty/StarCraftII/Maps/
- You should have a structure like this:
smac_exp
├── pymarl
│ ├── docker
│ ├── 3rdparty
│ │ └── StarCraftII
│ │ ├── Maps
│ │ │ ├── SMAC_Maps
│ │ │ └── SMAC_Plus_Maps
│ │ └── ...
│ ├── src
│ └── results
├── smac_plus
├── requirements.txt
└── install_sc2.sh
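The layout above can be checked with a few lines of Python before launching a run. This is a minimal sketch that only tests a subset of the tree shown here; the function name `missing_paths` and the choice of which entries to check are assumptions made for illustration.

```python
from pathlib import Path

# Directories and files copied from the tree above (a subset, not exhaustive).
EXPECTED_DIRS = [
    "pymarl/src",
    "pymarl/3rdparty/StarCraftII/Maps/SMAC_Maps",
    "pymarl/3rdparty/StarCraftII/Maps/SMAC_Plus_Maps",
    "smac_plus",
]
EXPECTED_FILES = ["requirements.txt", "install_sc2.sh"]


def missing_paths(repo_root):
    """Return the expected entries that are absent under repo_root."""
    root = Path(repo_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

An empty list from `missing_paths(".")`, run from the repository root, suggests the map directories were moved to the right place.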
- Episode experience buffer
cd ./pymarl
python src/main.py --alg=qmix --env-config=smac_plus with env_args.map_name=offense_hard
python src/main.py --alg=qmix --env-config=smac with env_args.map_name=2s3z
- Parallel experience buffer
cd ./pymarl
python src/main.py --alg=qmix --env-config=smac_plus with env_args.map_name=offense_hard runner=parallel batch_size_run=20
python src/main.py --alg=qmix --env-config=smac with env_args.map_name=2s3z runner=parallel batch_size_run=20
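When sweeping over several maps or algorithms, the commands above can be generated instead of typed by hand. The following sketch only assembles the argument list in the same form as the commands shown; `build_train_command` is a hypothetical helper, and it uses no flags beyond those already appearing above.

```python
import shlex


def build_train_command(alg, env_config, map_name, parallel=False, batch_size_run=20):
    """Assemble a training command in the same form as the examples above."""
    cmd = [
        "python",
        "src/main.py",
        f"--alg={alg}",
        f"--env-config={env_config}",
        "with",
        f"env_args.map_name={map_name}",
    ]
    if parallel:
        # Extra arguments used by the parallel experience buffer examples.
        cmd += ["runner=parallel", f"batch_size_run={batch_size_run}"]
    return cmd


def command_line(cmd):
    """Render an argument list as a single shell-safe string."""
    return shlex.join(cmd)
```

The returned list can be passed to `subprocess.run(cmd, cwd="pymarl")`, or printed with `command_line` to inspect it first.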
- The config files act as defaults for an algorithm or environment.
- They are all located in src/config.
- --config refers to the config files in src/config/algs.
- --env-config refers to the config files in src/config/envs.
- All results will be stored in the results folder.
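To illustrate what "act as defaults" means, the sketch below layers an environment config, an algorithm config, and command-line overrides on top of a default dictionary. It is a simplified model, not the actual loading code: the precedence order (default, then environment, then algorithm, then command line) is an assumption based on upstream PyMARL, override values are kept as plain strings, and all function names are hypothetical.

```python
import copy
from collections.abc import Mapping


def recursive_update(base, override):
    """Merge override into base in place, descending into nested mappings."""
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(base.get(key), Mapping):
            recursive_update(base[key], value)
        else:
            base[key] = copy.deepcopy(value)
    return base


def parse_overrides(tokens):
    """Turn tokens such as "env_args.map_name=2s3z" into a nested dict.

    Simplification: values stay strings; no type conversion is attempted.
    """
    result = {}
    for token in tokens:
        dotted_key, _, raw_value = token.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = raw_value
    return result


def resolve_config(default, env, alg, overrides):
    """Return a new dict: default < env < alg < command-line overrides."""
    merged = copy.deepcopy(default)
    for layer in (env, alg, parse_overrides(overrides)):
        recursive_update(merged, layer)
    return merged
```

Under this model, a key given after `with` on the command line wins over the same key in any config file, and keys that no layer overrides keep their default value.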
cd ./pymarl
python src/main.py --alg=qmix --env-config=smac_plus with env_args.map_name=offense_hard save_replay=True checkpoint_path={checkpoint_dir_path} load_step={n_steps} test_nepisode={n_test}
python src/main.py --alg=qmix --env-config=smac with env_args.map_name=2s3z save_replay=True checkpoint_path={checkpoint_dir_path} load_step={n_steps} test_nepisode={n_test}
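The `load_step` value interacts with the contents of `checkpoint_path`. In upstream PyMARL, the checkpoint directory holds one subdirectory per saved timestep; `load_step=0` selects the latest one, and any other value selects the saved timestep closest to it. The sketch below reproduces that selection rule under the assumption that this repository keeps the upstream behavior; the function names are hypothetical.

```python
from pathlib import Path


def available_steps(checkpoint_dir):
    """Return the sorted timesteps that have a saved subdirectory.

    Assumption: each checkpoint is a directory whose name is the timestep.
    """
    return sorted(
        int(entry.name)
        for entry in Path(checkpoint_dir).iterdir()
        if entry.is_dir() and entry.name.isdigit()
    )


def select_step(checkpoint_dir, load_step=0):
    """Pick the timestep to load: the latest if load_step is 0, else the closest."""
    steps = available_steps(checkpoint_dir)
    if not steps:
        raise FileNotFoundError(f"no checkpoints found in {checkpoint_dir}")
    if load_step == 0:
        return steps[-1]
    return min(steps, key=lambda step: abs(step - load_step))
```

Calling `select_step` on a downloaded checkpoint directory before launching the command above shows which model would be picked for a given `load_step`, if the stated assumption holds.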
- While developing this benchmark from February 2022 onward, we ran into an unexpected difficulty: stored result files were deleted owing to a malfunction of our computation resources.
- Therefore, despite our efforts to restore these files, we can provide only part of the pretrained checkpoints and tensorboard logs. Please refer to this URL (https://url.kr/92bp83).
- Instead, we provide the complete training curves and test scores of all algorithms. Please see the provided csv files (https://url.kr/mak6gq).
Par : Parallel experience buffer
Seq : Sequential experience buffer
O : Existence of both tensorboard logs and checkpoints.
▵ : Existence of either tensorboard logs or checkpoints.
X : Absence of tensorboard logs and checkpoints.
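Algorithm | (Par)Def_infantry 1 | (Par)Def_infantry 2 | (Par)Def_infantry 3 | (Par)Def_armored 1 | (Par)Def_armored 2 | (Par)Def_armored 3 | (Par)Def_outnumbered 1 | (Par)Def_outnumbered 2 | (Par)Def_outnumbered 3 |
---|---|---|---|---|---|---|---|---|---|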
IQL | O | O | O | O | O | O | O | O | O |
VDN | - | - | - | - | - | - | - | - | - |
QMIX | - | - | - | - | - | - | - | - | - |
QTRAN | - | - | - | - | - | - | - | - | - |
COMA | O | O | O | O | O | O | O | O | O |
MASAC | O | O | O | O | O | O | O | O | O |
MAPPO | - | - | - | - | - | - | - | - | - |
DIQL | O | O | O | O | O | O | O | O | O |
DDN | - | - | - | - | - | - | - | - | - |
DMIX | - | - | - | - | - | - | - | - | - |
DRIMA | - | - | - | - | - | - | - | - | - |
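Algorithm | (Seq)Def_infantry 1 | (Seq)Def_infantry 2 | (Seq)Def_infantry 3 | (Seq)Def_armored 1 | (Seq)Def_armored 2 | (Seq)Def_armored 3 | (Seq)Def_outnumbered 1 | (Seq)Def_outnumbered 2 | (Seq)Def_outnumbered 3 |
---|---|---|---|---|---|---|---|---|---|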
IQL | O | O | O | O | O | O | O | O | O |
VDN | O | O | O | O | O | O | - | - | - |
QMIX | O | O | O | O | O | - | O | O | - |
QTRAN | O | O | O | O | O | O | - | - | - |
COMA | O | O | - | O | O | O | O | O | O |
MADDPG | O | O | - | O | O | - | O | O | O |
MASAC | O | O | O | O | O | O | O | O | O |
MAPPO | - | - | - | - | - | - | - | - | - |
DIQL | O | O | O | O | O | O | O | O | O |
DDN | O | O | O | O | O | O | O | O | O |
DMIX | O | O | O | O | O | O | O | O | O |
DRIMA | O | O | - | O | ▵(Model) | - | O | ▵(Model) | - |
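Algorithm | (Par)Off_near 1 | (Par)Off_near 2 | (Par)Off_near 3 | (Par)Off_distant 1 | (Par)Off_distant 2 | (Par)Off_distant 3 | (Par)Off_complicated 1 | (Par)Off_complicated 2 | (Par)Off_complicated 3 |
---|---|---|---|---|---|---|---|---|---|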
IQL | O | O | O | O | O | O | O | O | O |
VDN | - | - | - | - | - | - | - | - | - |
QMIX | - | - | - | - | - | - | - | - | - |
QTRAN | - | - | - | - | - | - | - | - | - |
COMA | O | O | O | O | O | O | O | O | O |
MASAC | O | O | O | O | O | O | O | O | O |
MAPPO | - | - | - | - | - | - | - | - | - |
DIQL | O | O | O | O | O | O | O | O | O |
DDN | - | - | - | - | - | - | - | - | - |
DMIX | - | - | - | - | - | - | - | - | - |
DRIMA | - | - | - | - | - | - | - | - | - |
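Algorithm | (Seq)Off_near 1 | (Seq)Off_near 2 | (Seq)Off_near 3 | (Seq)Off_distant 1 | (Seq)Off_distant 2 | (Seq)Off_distant 3 | (Seq)Off_complicated 1 | (Seq)Off_complicated 2 | (Seq)Off_complicated 3 |
---|---|---|---|---|---|---|---|---|---|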
IQL | - | - | - | - | - | - | - | - | - |
VDN | - | - | - | - | - | - | - | - | - |
QMIX | ▵(Model) | ▵(Model) | - | O | - | O | O | O | - |
QTRAN | - | - | - | - | - | - | - | - | - |
COMA | O | O | O | O | O | O | - | - | - |
MADDPG | O | O | O | - | - | - | O | O | - |
MASAC | - | - | - | - | - | - | - | - | - |
MAPPO | - | - | - | - | - | - | - | - | - |
DIQL | - | - | - | - | - | - | - | - | - |
DDN | - | - | - | - | - | - | - | - | - |
DMIX | - | - | - | - | - | - | - | - | - |
DRIMA | - | ▵(Model) | O | O | O | - | O | O | O |
Algorithm | (Par)Off_hard 1 | (Par)Off_hard 2 | (Par)Off_hard 3 | (Par)Off_superhard 1 | (Par)Off_superhard 2 | (Par)Off_superhard 3 |
---|---|---|---|---|---|---|
IQL | O | O | O | O | O | O |
VDN | O | O | O | ▵(Model) | O | O |
QMIX | O | O | O | O | O | O |
QTRAN | O | O | O | - | - | - |
COMA | O | O | O | O | O | O |
MASAC | O | O | O | O | O | O |
MAPPO | - | - | - | - | - | - |
DIQL | O | O | O | O | O | O |
DDN | O | O | O | - | - | - |
DMIX | O | O | O | O | O | O |
DRIMA | - | - | - | - | - | - |
Algorithm | (Seq)Off_hard 1 | (Seq)Off_hard 2 | (Seq)Off_hard 3 | (Seq)Off_superhard 1 | (Seq)Off_superhard 2 | (Seq)Off_superhard 3 |
---|---|---|---|---|---|---|
IQL | - | - | - | - | - | - |
VDN | - | - | - | - | - | - |
QMIX | O | O | O | O | O | O |
QTRAN | - | - | - | - | - | - |
COMA | O | O | O | O | O | O |
MADDPG | - | - | - | O | O | O |
MASAC | - | - | - | - | - | - |
MAPPO | - | - | - | - | - | - |
DIQL | - | - | - | O | O | O |
DDN | - | - | - | O | O | O |
DMIX | - | - | - | O | O | O |
DRIMA | ▵(Model) | ▵(Model) | ▵(Model) | O | O | O |
- The original SMAC environment and PyMARL code are released under the MIT license and the Apache 2.0 license, respectively. The proposed SMAC-Exp environment and the modified PyMARL code are likewise released under the MIT license and the Apache 2.0 license, respectively.
@ARTICLE{10099458,
author={Kim, Mingyu and Oh, Jihwan and Lee, Yongsik and Kim, Joonkee and Kim, Seonghwan and Chong, Song and Yun, Seyoung},
journal={IEEE Access},
title={The StarCraft Multi-Agent Exploration Challenges: Learning Multi-Stage Tasks and Environmental Factors Without Precise Reward Functions},
year={2023},
volume={11},
number={},
pages={37854-37868},
doi={10.1109/ACCESS.2023.3266652}}