
This study integrates temporal logic into MARL, addressing non-Markovian tasks and state-space challenges. It synthesizes strategies using parity games, encoding them into reward machines for decentralized coordination. Results show reduced state-space and improved performance over centralized methods.


Gabriel0402/MATEA


Decomposing Temporal Equilibrium Strategy for Coordinated Distributed Multi-Agent Reinforcement Learning (AAAI 2024)

The increasing demands for system complexity and robustness have prompted the integration of temporal logic into Multi-Agent Reinforcement Learning (MARL) to address tasks with non-Markovian properties. However, incorporating non-Markovian properties introduces additional computational complexities, as agents are required to integrate historical data into their decision-making process. Also, optimizing strategies within a multi-agent environment presents significant challenges due to the exponential growth of the state space with the number of agents. In this study, we introduce an innovative hierarchical MARL framework that synthesizes temporal equilibrium strategies through parity games and subsequently encodes them as individual reward machines for MARL coordination. More specifically, we reduce the strategy synthesis problem into an emptiness problem concerning parity games with optimized states and transitions. Following this synthesis step, the temporal equilibrium strategy is decomposed into individual reward machines for decentralized MARL. Theoretical proofs are provided to verify the consistency of the Nash equilibrium between the parallel composition of decomposed strategies and the original strategy. Empirical evidence confirms the efficacy of the proposed synthesis technique, showcasing its ability to reduce state space compared to EVE. Furthermore, our study highlights the superior performance of the distributed MARL paradigm over centralized approaches when deploying decomposed strategies.
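The abstract reduces strategy synthesis to an emptiness problem over parity games. As generic background on parity games only (this is not MATEA's solver), the sketch below implements Zielonka's classic recursive algorithm, which computes the set of nodes from which each player wins. It assumes a max-parity convention (player 0 wins a play if and only if the highest priority seen infinitely often is even) and a game in which every node has at least one successor. The four-node game at the end is a hypothetical example.

```python
from typing import Dict, Set, Tuple


def attractor(nodes: Set[int], target: Set[int], player: int,
              owner: Dict[int, int], edges: Dict[int, Set[int]]) -> Set[int]:
    """Nodes in `nodes` from which `player` can force the play into `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in nodes - attr:
            succ = edges[v] & nodes  # moves that stay inside the current subgame
            if owner[v] == player:
                forced = bool(succ & attr)  # player just picks an edge into attr
            else:
                forced = succ <= attr  # opponent cannot avoid attr
            if forced:
                attr.add(v)
                changed = True
    return attr


def zielonka(nodes: Set[int], owner: Dict[int, int], priority: Dict[int, int],
             edges: Dict[int, Set[int]]) -> Tuple[Set[int], Set[int]]:
    """Return (W0, W1), the winning regions of players 0 and 1 (max-parity condition)."""
    if not nodes:
        return set(), set()
    p = max(priority[v] for v in nodes)
    i = p % 2  # the player who wins if priority p is seen infinitely often
    top = {v for v in nodes if priority[v] == p}
    a = attractor(nodes, top, i, owner, edges)
    w = list(zielonka(nodes - a, owner, priority, edges))
    if not w[1 - i]:
        # The opponent wins nowhere in the subgame, so player i wins everywhere.
        win = [set(), set()]
        win[i] = set(nodes)
        return win[0], win[1]
    # Remove everything the opponent can force into its winning region and recurse.
    b = attractor(nodes, w[1 - i], 1 - i, owner, edges)
    w = list(zielonka(nodes - b, owner, priority, edges))
    w[1 - i] = w[1 - i] | b
    return w[0], w[1]


# Hypothetical 4-node game; every node has at least one successor.
owner = {0: 0, 1: 1, 2: 1, 3: 0}
priority = {0: 2, 1: 1, 2: 3, 3: 4}
edges = {0: {1, 3}, 1: {0, 2}, 2: {2}, 3: {3}}
W0, W1 = zielonka(set(owner), owner, priority, edges)
print(sorted(W0), sorted(W1))  # [0, 3] [1, 2]
```

Here player 0 wins from node 0 by moving to node 3 and looping on the even priority 4, while player 1 wins from nodes 1 and 2 by steering into the odd self-loop at node 2.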

Installation instructions

MATEA has been tested on Ubuntu 18.04 and macOS Monterey.

The code has the following requirements:

How to run the code

Temporal Equilibrium Strategy Synthesis

From inside the synthesis folder, run the following command: $ python main.py [path/name of the file] [options]

  • List of optional arguments:

    -d  Draw the synthesized strategies
    -v  Record the performance of the tool
    -s  Save the synthesized strategy and the decomposed strategies for use by MARL

  • Examples:

    • Generate the synthesized strategy and the decomposed strategies for the 2-agent environment, and draw them: $ python main.py ../examples/cop_2agent -d

    • Record the time the tool takes on the 2-agent environment: $ python main.py ../examples/cop_2agent -v

The synthesized strategy and the decomposed strategies that satisfy the Nash equilibrium can be found in the results folder.
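The decomposed strategies are encoded as individual reward machines, one per agent. As an illustration only, the sketch below represents a reward machine as a deterministic automaton over event labels and decomposes a team machine by projection: states joined by events an agent cannot observe are merged, and only the agent's local transitions are kept. Projection is one known way to decompose a team reward machine for decentralized learning; whether MATEA decomposes strategies this way is an assumption, and the class, the functions, the reward convention (1.0 on entering an accepting state), and the two-agent task are all hypothetical. The paper's consistency result concerns Nash equilibria and is established by proofs; checking a few traces here only illustrates the idea that composing the parts should reproduce the whole.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class RewardMachine:
    """Deterministic reward machine; reward 1.0 on entering an accepting state (assumed)."""
    initial: str
    accepting: FrozenSet[str]
    alphabet: FrozenSet[str]
    delta: Dict[Tuple[str, str], str]  # (state, event) -> next state

    def step(self, u: str, e: str) -> Optional[Tuple[str, float]]:
        """Advance on event e; events outside the alphabet are ignored; None if undefined."""
        if e not in self.alphabet:
            return u, 0.0
        if (u, e) not in self.delta:
            return None
        v = self.delta[(u, e)]
        return v, 1.0 if (v in self.accepting and u not in self.accepting) else 0.0

    def accepts(self, trace: List[str]) -> bool:
        u = self.initial
        for e in trace:
            out = self.step(u, e)
            if out is None:
                return False
            u = out[0]
        return u in self.accepting


def project(rm: RewardMachine, local: FrozenSet[str]) -> RewardMachine:
    """Project onto an agent's local events by merging states joined by non-local events."""
    parent = {u: u for (u, _) in rm.delta}
    parent.update({v: v for v in rm.delta.values()})
    parent.setdefault(rm.initial, rm.initial)

    def find(s: str) -> str:
        while parent[s] != s:
            s = parent[s]
        return s

    for (u, e), v in rm.delta.items():
        if e not in local:
            parent[find(u)] = find(v)  # u and v look identical to this agent
    # Assumes the projection stays deterministic, as it does in the example below.
    delta = {(find(u), e): find(v) for (u, e), v in rm.delta.items() if e in local}
    return RewardMachine(find(rm.initial), frozenset(find(s) for s in rm.accepting),
                         rm.alphabet & local, delta)


def composed_accepts(parts: List[RewardMachine], trace: List[str]) -> bool:
    # Parallel composition: every machine steps on the events in its alphabet.
    return all(m.accepts(trace) for m in parts)


# Hypothetical task: agent 1 presses a, both agents synchronize on s, then agent 2 presses b.
team = RewardMachine(
    initial="u0", accepting=frozenset({"u3"}), alphabet=frozenset({"a", "s", "b"}),
    delta={("u0", "a"): "u1", ("u1", "s"): "u2", ("u2", "b"): "u3"},
)
agent1 = project(team, frozenset({"a", "s"}))
agent2 = project(team, frozenset({"s", "b"}))

for trace in (["a", "s", "b"], ["s", "a", "b"], ["a", "b", "s"], ["a", "s"]):
    print(trace, team.accepts(trace), composed_accepts([agent1, agent2], trace))
```

The shared event s is what keeps the order between the two agents' actions: without it, each local machine could not tell whether the other agent had already acted.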

MARL with Synthesized Strategies and Decomposed Strategies

From inside the marl folder, run the following command: $ python run.py --alg=<name of the algorithm> --env=<environment_id> [additional options]

  • Examples:

    • Decentralized MARL training in the 2-agent environment: $ python run.py --alg=maqlearning --env=MACraft-2agentdcent-v0 --num_timesteps=1e7 --gamma=0.9 --log_path=../aaai/2agent/dcent/M1/1 --ma --num_agent=2 --dcent

    • Centralized MARL training in the 2-agent environment: $ python run.py --alg=maqlearning --env=MACraft-2agentcent-v0 --num_timesteps=1e7 --gamma=0.9 --log_path=../aaai/2agent/dcent/M1/1 --ma --num_agent=2
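The internals of the maqlearning algorithm are not described here. As a rough sketch of the general idea behind learning with reward machines, each decentralized agent can run tabular Q-learning over the pair (its environment state, its reward-machine state), using the reward emitted by its own reward machine. The class name, the corridor-style states, and the actions below are hypothetical; gamma=0.9 simply mirrors the commands above.

```python
import itertools
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


class RMQLearner:
    """Tabular Q-learning for one agent over (env_state, rm_state) pairs."""

    def __init__(self, actions: List[str], alpha: float = 0.1, gamma: float = 0.9,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # Q-values keyed by (env_state, rm_state, action); unseen entries default to 0.0.
        self.q: Dict[Tuple[Hashable, Hashable, str], float] = defaultdict(float)

    def act(self, s: Hashable, u: Hashable) -> str:
        """Epsilon-greedy action for env state s and reward-machine state u."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, u, a)])

    def update(self, s, u, a, r, s2, u2, done: bool) -> None:
        """One Q-learning step; r is the reward from this agent's own reward machine."""
        best_next = 0.0 if done else max(self.q[(s2, u2, b)] for b in self.actions)
        target = r + self.gamma * best_next
        self.q[(s, u, a)] += self.alpha * (target - self.q[(s, u, a)])


# Decentralized: one learner per agent, each keyed only on its own state and RM state.
learners = [RMQLearner(["left", "right"], alpha=0.5, gamma=0.9, epsilon=0.0) for _ in range(2)]
learners[0].q[(1, "u1", "right")] = 1.0
learners[0].update(0, "u0", "right", 0.0, 1, "u1", done=False)
print(learners[0].q[(0, "u0", "right")])  # 0.45
print(learners[0].act(0, "u0"))  # right

# A centralized learner would choose joint actions instead; their number grows
# exponentially with the number of agents (here 2 actions and 4 agents).
print(len(list(itertools.product(["left", "right"], repeat=4))))  # 16
```

The last line shows, in miniature, why the distributed paradigm can help: each decentralized learner keeps a table over its own states and actions, while a centralized one must cover the joint space.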

The marl/scripts folder contains the scripts used for the experiments in the paper. From inside that folder, run the following command: $ ./[name of the file]

  • Example:

    • Run the script for the 2-agent environment: $ ./run_2agent.sh

The results of MARL training with the synthesized and decomposed strategies can be found in the marl/results folder.

Playing each environment

Finally, we include code that lets you play each environment manually. Go to the reward_machines folder and run the following command: $ python play.py --env=<environment_id>

  • Example:
    • Play one of the 2-agent environments: $ python play.py --env=MACraft-2agentdcent-v0

The available values of <environment_id> are listed in marl/reward_machines/envs/__init__.py.
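Assuming the registration file follows the Gym convention of one register(id="...", ...) call per environment (as the upstream reward_machines project does), the hypothetical helper below lists the available ids using only the standard-library ast module, without importing Gym or the environments. It is not part of the repository, and the sample source is made up for illustration.

```python
import ast
from typing import List


def registered_env_ids(source: str) -> List[str]:
    """Return the string `id=` arguments of every register(...) call in `source`."""
    ids = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both `register(...)` and `gym.envs.registration.register(...)`.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name != "register":
            continue
        for kw in node.keywords:
            if (kw.arg == "id" and isinstance(kw.value, ast.Constant)
                    and isinstance(kw.value.value, str)):
                ids.append(kw.value.value)
    return ids


# Hypothetical sample in the style of a Gym registration file.
sample = '''
from gym.envs.registration import register
register(id="MACraft-2agentdcent-v0", entry_point="envs.example:Env")
'''
print(registered_env_ids(sample))  # ['MACraft-2agentdcent-v0']
```

To list the real ids, pass the contents of marl/reward_machines/envs/__init__.py to registered_env_ids; since the file is parsed rather than executed, none of its imports need to be installed.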

Acknowledgments

Several files in our implementation adapt code originally included in the following projects:

EVE: https://github.com/eve-mas/eve-parity

Reward Machines: https://github.com/RodrigoToroIcarte/reward_machines

We thank the authors for their work.

Citations

If you use MATEA tool, please cite the following work:

@inproceedings{zhu2024decomposing,
  title={Decomposing Temporal Equilibrium Strategy for Coordinated Distributed Multi-Agent Reinforcement Learning},
  author={Zhu, Chenyang and Si, Wen and Zhu, Jinyu and Jiang, Zhihao},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={16},
  pages={17618--17627},
  year={2024}
}
