- Ensure you have Python 3.11 installed. Check your version with:
  python --version
- (Optional) Set up a Python environment and activate it:
  python -m venv env
  source env/bin/activate  # On Windows use `env\Scripts\activate`
- In the repository, install the required packages:
  pip install -r requirements.txt
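If you prefer to check the interpreter version from inside Python rather than on the command line, a minimal sketch could look like the following. The helper name `is_supported` is hypothetical, and treating 3.11 as an exact requirement (rather than a minimum) is an assumption.

```python
import sys

# Assumption: the project targets exactly Python 3.11 (major.minor).
REQUIRED = (3, 11)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True when the major.minor version matches REQUIRED."""
    return tuple(version_info[:2]) == REQUIRED


if not is_supported():
    # Warn instead of exiting so the check stays non-intrusive.
    print(f"Warning: expected Python 3.11, found {sys.version.split()[0]}")
```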
- Repairs (Motivating Example): A team of three agents must visit a headquarters (HQ) and then visit two communication stations in any order to make repairs. Agents must navigate around a hazardous region that prevents more than one agent from entering at a time.
- Cooperative Buttons: Agents must press a series of buttons in a particular order to reach a goal location. Traversing certain regions is only possible once the corresponding button has been pressed.
- Four-Buttons: Two agents must press four buttons (yellow, green, blue, red) in an environment, with an ordering constraint that the yellow button must be pressed before the red button.
- Cramped-Corridor: Two agents must navigate a small corridor to reach the pot at the end while avoiding collisions, then deliver the soup.
- Asymmetric-Advantages: Two agents are in separate rooms, each with access to a different set of resources, and must coordinate to deliver a soup.
- Four-Buttons: buttons_challenge
- Cooperative Buttons: easy_buttons
- Repairs: motivating_example
- Asymmetric Advantages: custom_island
- Cramped Corridor: interesting_cramped_room
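The task-to-experiment-name mapping above can be kept in a small lookup table when scripting runs. This is only a sketch: `EXPERIMENT_NAMES` and `experiment_name` are hypothetical names, not part of the repository, and the hyphen-insensitive matching is a convenience added here.

```python
# Experiment names as listed above, keyed by task name.
EXPERIMENT_NAMES = {
    "Four-Buttons": "buttons_challenge",
    "Cooperative Buttons": "easy_buttons",
    "Repairs": "motivating_example",
    "Asymmetric Advantages": "custom_island",
    "Cramped Corridor": "interesting_cramped_room",
}


def _normalize(task: str) -> str:
    """Lowercase and treat hyphens as spaces, so 'Cramped-Corridor' matches 'Cramped Corridor'."""
    return " ".join(task.replace("-", " ").lower().split())


_LOOKUP = {_normalize(name): exp for name, exp in EXPERIMENT_NAMES.items()}


def experiment_name(task: str) -> str:
    """Return the experiment name for a task, or raise ValueError if the task is unknown."""
    try:
        return _LOOKUP[_normalize(task)]
    except KeyError:
        raise ValueError(f"Unknown task: {task!r}") from None
```

For example, `experiment_name("Cramped-Corridor")` and `experiment_name("cramped corridor")` both resolve to `interesting_cramped_room`.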
python run.py --assignment_methods UCB --num_iterations 5 \
--wandb t --decomposition_file mono_interesting_cramped_room.txt \
--experiment_name interesting_cramped_room --is_monolithic f \
--env overcooked --render f --video f \
--add_mono_file mono_interesting_cramped_room.txt --num_candidates 10 \
--timesteps 1000000
- `--add_mono_file`: Remove this parameter if you don't want to add the monolithic embedding.
- `--experiment_name` and decomposition file names: Change them using `interesting_cramped_room`, `custom_island`, `easy_buttons`, `buttons_challenge`, `motivating_example`.
- `--env`: Use `overcooked` for Cramped-Corridor and Asymmetric-Advantages, `buttons` for Four-Buttons, Cooperative Buttons, and Repairs.
- `--num_candidates`: The number of decompositions to consider. Setting it to 1 means using the top decomposition by ATAD definition.
- `--render`: Shows each timestep during execution.
- `--video`: Saves videos of evaluation episodes.
- `--timesteps`: Total training time per iteration.
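To switch between experiments without editing the command by hand, the parameter notes above can be sketched as a small argument builder. `build_args` and `ENV_BY_EXPERIMENT` are hypothetical helpers, not part of the repository. The `mono_{experiment}.txt` file name pattern is an assumption generalized from the single example command, so check that the file exists for your experiment.

```python
import shlex

# --env value for each experiment name, following the parameter notes above.
ENV_BY_EXPERIMENT = {
    "interesting_cramped_room": "overcooked",
    "custom_island": "overcooked",
    "easy_buttons": "buttons",
    "buttons_challenge": "buttons",
    "motivating_example": "buttons",
}


def build_args(experiment, num_candidates=10, timesteps=1_000_000, add_mono_file=True):
    """Hypothetical helper: assemble the run.py argument list for one experiment."""
    # Assumption: mono files follow the pattern seen in the example command.
    mono_file = f"mono_{experiment}.txt"
    args = [
        "python", "run.py",
        "--assignment_methods", "UCB",
        "--num_iterations", "5",
        "--wandb", "t",
        "--decomposition_file", mono_file,
        "--experiment_name", experiment,
        "--is_monolithic", "f",
        "--env", ENV_BY_EXPERIMENT[experiment],
        "--render", "f",
        "--video", "f",
        "--num_candidates", str(num_candidates),
        "--timesteps", str(timesteps),
    ]
    if add_mono_file:
        # Omit this pair to skip the monolithic embedding.
        args += ["--add_mono_file", mono_file]
    return args


# Print a shell-ready command line for one experiment.
print(shlex.join(build_args("custom_island")))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues.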
If you change the decomposition file to `individual_{exp_name}` and do not pass `--num_candidates`, you can run the task training each agent on just the monolithic reward machine.
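The switch described above can be sketched as a transformation on an argument list. `to_individual_args` is a hypothetical helper. The source does not state whether the `individual_{exp_name}` file carries an extension, so the `suffix` parameter defaults to an empty string and is left for you to set.

```python
def to_individual_args(args, exp_name, suffix=""):
    """Return a copy of args that uses individual_{exp_name} and drops --num_candidates.

    Assumption: every flag in args is followed by exactly one value.
    """
    result = []
    i = 0
    while i < len(args):
        flag = args[i]
        if flag == "--num_candidates":
            i += 2  # skip the flag and its value
            continue
        if flag == "--decomposition_file":
            # Replace the decomposition file with the individual variant.
            result += [flag, f"individual_{exp_name}{suffix}"]
            i += 2
            continue
        result.append(flag)
        i += 1
    return result


base = [
    "python", "run.py",
    "--decomposition_file", "mono_easy_buttons.txt",
    "--experiment_name", "easy_buttons",
    "--num_candidates", "10",
    "--timesteps", "1000000",
]
individual = to_individual_args(base, "easy_buttons")
```

The input list is not modified, so the same base arguments can be reused for both the decomposed and the individual runs.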