Decomposing Temporal Equilibrium Strategy for Coordinated Distributed Multi-Agent Reinforcement Learning (AAAI 2024)
The increasing demands for system complexity and robustness have prompted the integration of temporal logic into Multi-Agent Reinforcement Learning (MARL) to address tasks with non-Markovian properties. However, incorporating non-Markovian properties introduces additional computational complexity, as agents are required to integrate historical data into their decision-making process. Optimizing strategies within a multi-agent environment also presents significant challenges due to the exponential growth of the state space with the number of agents. In this study, we introduce a hierarchical MARL framework that synthesizes temporal equilibrium strategies through parity games and subsequently encodes them as individual reward machines for MARL coordination. More specifically, we reduce the strategy synthesis problem to an emptiness problem concerning parity games with optimized states and transitions. Following this synthesis step, the temporal equilibrium strategy is decomposed into individual reward machines for decentralized MARL. Theoretical proofs are provided to verify the consistency of the Nash equilibrium between the parallel composition of decomposed strategies and the original strategy. Empirical evidence confirms the efficacy of the proposed synthesis technique, showcasing its ability to reduce the state space compared to EVE. Furthermore, our study highlights the superior performance of the distributed MARL paradigm over centralized approaches when deploying decomposed strategies.
MATEA has been tested on Ubuntu 18.04 and macOS Monterey.
The code has the following requirements:
- Python 3.6 or 3.7
- NumPy
- OpenAI Gym
- OpenAI Baselines
- OPAM (https://opam.ocaml.org/doc/Install.html) + OCaml version 4.03.x or later (https://ocaml.org/docs/install.html).
To install OPAM (along with OCaml):
- Ubuntu
sudo apt-get install m4
sudo wget https://raw.github.com/ocaml/opam/master/shell/opam_installer.sh -O - | sh -s /usr/local/bin
echo "y" | opam init
eval `opam config env`
- Cairo (https://cairographics.org/download/), or build it from source (https://cairographics.org/releases/). To install Cairo:
- Ubuntu
sudo apt-get install libcairo2-dev
sudo apt-get install python-cairo
- IGraph version 0.7 (http://igraph.org/python/)
- You need to have a C/C++ compiler installed on your machine. To install IGraph:
- Ubuntu
sudo apt-get install python-igraph
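Before running the tool, it can help to confirm that the requirements listed above are visible to the interpreter. The following is a minimal sketch, not part of MATEA: the function name `missing_requirements` is hypothetical, and the import names (`numpy`, `gym`, `baselines`, `cairo`, `igraph`) and executable names (`opam`, `ocaml`) are assumptions based on how these packages are usually installed.

```python
import importlib.util
import shutil
import sys

# Assumed import names for the Python requirements listed above.
PYTHON_MODULES = {
    "NumPy": "numpy",
    "OpenAI Gym": "gym",
    "OpenAI Baselines": "baselines",
    "Cairo bindings": "cairo",
    "IGraph": "igraph",
}
# Assumed executable names for the OCaml toolchain.
EXECUTABLES = {"OPAM": "opam", "OCaml": "ocaml"}


def missing_requirements(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return the labels of requirements that could not be located."""
    missing = []
    for label, module in modules.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(label)
    for label, exe in executables.items():
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(exe) is None:
            missing.append(label)
    return missing


if __name__ == "__main__":
    if sys.version_info[:2] not in ((3, 6), (3, 7)):
        print("Warning: Python 3.6 or 3.7 is listed, found %d.%d" % sys.version_info[:2])
    for name in missing_requirements():
        print("Missing:", name)
```

The check only tests whether each package can be found; it does not verify versions such as IGraph 0.7 or OCaml 4.03.x.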
From inside the synthesis folder, execute the following command:
$ python main.py [path/name of the file] [options]
List of optional arguments:
- -d: Option to draw the synthesized strategies
- -v: Option to record performance of the tool
- -s: Option to save the synthesized strategy as well as the decomposed strategies for MARL to use
Example:
- Generate the synthesized strategy and decomposed strategies for the 2 agent environment, and draw them:
$ python main.py ../examples/cop_2agent -d
- Record the time taken for the 2 agent environment, using the option to record the performance of the tool:
$ python main.py ../examples/cop_2agent -v
The synthesized strategy and the decomposed strategies that satisfy the Nash equilibrium can be found in the folder results.
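When the synthesis step is driven from another script, the command line above can be assembled programmatically. The sketch below is illustrative only: `build_synthesis_cmd` and `run_synthesis` are hypothetical helpers, not part of MATEA, and combining several of `-d`, `-v`, and `-s` in one call is an assumption that this README does not confirm.

```python
import subprocess


def build_synthesis_cmd(spec_path, draw=False, record=False, save=False):
    """Assemble the argument list for `python main.py [file] [options]`."""
    cmd = ["python", "main.py", spec_path]
    if draw:
        cmd.append("-d")  # draw the synthesized strategies
    if record:
        cmd.append("-v")  # record performance of the tool
    if save:
        cmd.append("-s")  # save synthesized and decomposed strategies
    return cmd


def run_synthesis(spec_path, synthesis_dir="synthesis", **flags):
    """Run the command from inside the synthesis folder (path is an assumption)."""
    cmd = build_synthesis_cmd(spec_path, **flags)
    return subprocess.run(cmd, cwd=synthesis_dir, check=True)
```

For example, `build_synthesis_cmd("../examples/cop_2agent", draw=True)` yields the same arguments as the first example command above.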
From inside the marl folder, execute the following command:
$ python run.py --alg=<name of the algorithm> --env=<environment_id> [additional options]
Example:
- Decentralized MARL training with the 2 agent environment:
$ python run.py --alg=maqlearning --env=MACraft-2agentdcent-v0 --num_timesteps=1e7 --gamma=0.9 --log_path=../aaai/2agent/dcent/M1/1 --ma --num_agent=2 --dcent
- Centralized MARL training with the 2 agent environment:
$ python run.py --alg=maqlearning --env=MACraft-2agentcent-v0 --num_timesteps=1e7 --gamma=0.9 --log_path=../aaai/2agent/dcent/M1/1 --ma --num_agent=2
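The two commands above differ only in the environment id and the `--dcent` flag. A minimal sketch of a helper that makes this explicit is shown below; `build_marl_cmd` is a hypothetical name, not part of the repository, and the default values simply mirror the example commands.

```python
def build_marl_cmd(env_id, log_path, num_agent, decentralized,
                   alg="maqlearning", num_timesteps="1e7", gamma=0.9):
    """Assemble the argument list for `python run.py ...` as used above."""
    cmd = [
        "python", "run.py",
        "--alg=" + alg,
        "--env=" + env_id,
        "--num_timesteps=" + num_timesteps,
        "--gamma=" + str(gamma),
        "--log_path=" + log_path,
        "--ma",
        "--num_agent=" + str(num_agent),
    ]
    if decentralized:
        # Only the decentralized example passes --dcent.
        cmd.append("--dcent")
    return cmd


if __name__ == "__main__":
    # Print the decentralized command so it can be compared with the README.
    print(" ".join(build_marl_cmd(
        "MACraft-2agentdcent-v0", "../aaai/2agent/dcent/M1/1", 2, True)))
```

The resulting list can be passed to `subprocess.run(cmd, cwd="marl", check=True)`, assuming the working directory layout described in this README.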
The folder marl/scripts contains the scripts that were run for the paper. From inside that folder, execute the following command:
$ ./[name of the file]
Example:
- To run the script for the 2 agent environment
$ ./run_2agent.sh
The MARL results obtained with the synthesized strategies and the decomposed strategies can be found in the folder marl/results.
Finally, note that we included code that allows you to manually play each environment. Go to the reward_machines folder and run the following command:
$ python play.py --env=<environment_id>
Examples:
- Play one of the 2 agent environments:
$ python play.py --env=MACraft-2agentdcent-v0
The available values of <environment_id> are listed in marl/reward_machines/envs/__init__.py.
Several files in our implementation adapt code originally from:
EVE: https://github.com/eve-mas/eve-parity.
Reward Machines: https://github.com/RodrigoToroIcarte/reward_machines
We thank the authors for their work.
If you use the MATEA tool, please cite the following work:
- MATEA [PDF]
@inproceedings{zhu2024decomposing,
title={Decomposing Temporal Equilibrium Strategy for Coordinated Distributed Multi-Agent Reinforcement Learning},
author={Zhu, Chenyang and Si, Wen and Zhu, Jinyu and Jiang, Zhihao},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={16},
pages={17618--17627},
year={2024}
}