AlyssaLytle/DecPOMDPGridworld

DecPOMDPGridworld

This directory contains problem examples meant for the MADP Toolbox.

Running Solver

Command to run the solver without incremental expansion: ../MADP/src/solvers/GMAA --sparse --GMAA=MAAstar .dpomdp -h4

Command to run the solver with incremental expansion: ../MADP/src/solvers/GMAA --sparse --GMAA=MAAstar --BGIP_Solver=BnB --BnB-ordering=Prob .dpomdp -h4
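
The two commands differ only in the BnB flags, so they can be assembled from one helper. The following is a minimal Python sketch, not part of this repository: the function name build_gmaa_command and the file name example.dpomdp are hypothetical, and the flags are copied verbatim from the commands above.

```python
from typing import List


def build_gmaa_command(
    problem_file: str,
    horizon: int,
    incremental: bool = False,
    solver_path: str = "../MADP/src/solvers/GMAA",
) -> List[str]:
    """Assemble the GMAA command line as a list of arguments.

    problem_file is a path to a .dpomdp file (supplied by the caller).
    Only flags that appear in this README are used.
    """
    cmd = [solver_path, "--sparse", "--GMAA=MAAstar"]
    if incremental:
        # Extra flags used by the "with incremental expansion" command.
        cmd += ["--BGIP_Solver=BnB", "--BnB-ordering=Prob"]
    cmd += [problem_file, f"-h{horizon}"]
    return cmd


# Hypothetical file name, for illustration only.
print(" ".join(build_gmaa_command("example.dpomdp", 4, incremental=True)))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that holds the problem file, assuming the MADP Toolbox has been built at the relative path shown.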

Example 1: 23gwsimple

Reward Map 1

This is a basic 2x3 gridworld example. Two agents move around a reward map (shown in the image). Both agents must choose the same action for it to take effect (e.g., to move right, both must choose "right"). They start at the top-right corner and must work out how to navigate the map.
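
The agreement rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the transition model from the .dpomdp file: it assumes the two agents share a single position, that row 0 is the top row, that the four action names are "up", "down", "left", and "right", and that a move off the grid leaves the position unchanged.

```python
from typing import Tuple

ROWS, COLS = 2, 3
START: Tuple[int, int] = (0, COLS - 1)  # top-right corner (row 0 assumed to be the top)

# Assumed action names and their (row, column) offsets.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(pos: Tuple[int, int], action_1: str, action_2: str) -> Tuple[int, int]:
    """Return the next position given one action per agent."""
    if action_1 != action_2:
        return pos  # the agents disagree, so nothing happens
    d_row, d_col = MOVES[action_1]
    row, col = pos[0] + d_row, pos[1] + d_col
    if 0 <= row < ROWS and 0 <= col < COLS:
        return (row, col)
    return pos  # assumed: moving off the grid is a no-op


print(step(START, "left", "left"))  # both agree, so the move takes effect
print(step(START, "left", "down"))  # disagreement, so the position is unchanged
```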

Solution Policy

Example 2: 23gw-rmap

Reward Maps 1 and 2

What the policy should look like

This is similar to Example 1, except that there are now two reward maps. The agents can observe which reward map they are on and navigate accordingly.

Solution Policy

Example 3: 23gw-comm

This expands on the previous examples. Each action is now either a move or a move combined with communication. Communication comes at a small cost. When the agents communicate, they can observe which reward map they are on.
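
One way to picture the enlarged action set is as pairs of a move and a communicate flag. The Python sketch below is illustrative only: the move names, the value of COMM_COST, and the use of None for "no information about the map" are all assumptions, since the actual values live in the .dpomdp file.

```python
from itertools import product
from typing import Optional

MOVES = ["up", "down", "left", "right"]  # assumed move names
COMM_COST = 0.1  # hypothetical value; the README only says the cost is small

# Every move is available with and without communication.
ACTIONS = list(product(MOVES, [False, True]))


def action_reward(base_reward: float, communicate: bool) -> float:
    """Subtract the communication cost when the action includes communication."""
    return base_reward - COMM_COST if communicate else base_reward


def observe(true_map: int, communicate: bool) -> Optional[int]:
    """The reward map is observed only when the agents communicate."""
    return true_map if communicate else None


print(len(ACTIONS))
print(observe(2, communicate=True), observe(2, communicate=False))
```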

Solution Policy

Example 4: 23gw-nocomm

This is the same as the previous example, except the reward maps are different.

New Rmaps

Note that, regardless of which map the agents are on, there is always a path that doesn't hit an obstacle. This makes communication unnecessary, and the resulting policy never communicates.

No Communication Policy

Example 5: 23gw-machine knows

This example returns to the original two reward maps. The human decides how to move through the environment, and the machine decides whether to communicate. The machine knows everything about the environment. In the resulting policy, the machine chooses to communicate, and the human therefore navigates the obstacles correctly.

Human Policy

Machine Policy

Example 6: 23gw-sharedctrl

In this example, the machine knows everything and can either communicate or take control. The cost to communicate is -1 and the cost to take control is -5. The machine chooses to communicate rather than take control.

Human Policy

Machine Policy

Example 7: 23gw-sharedctrl2

This example is the same as Example 6, except that the cost to communicate is -5 and the cost to take control is -1. Here, the machine chooses to take control.

Machine takes control
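
The outcomes of Examples 6 and 7 match a simple comparison of the two intervention rewards. The Python sketch below only illustrates that comparison and is not how the solver derives the policy: it assumes both interventions are equally effective at getting past the obstacles, so the one with the smaller penalty wins. The function name preferred_intervention is hypothetical.

```python
def preferred_intervention(comm_reward: float, control_reward: float) -> str:
    """Pick the intervention with the higher (less negative) reward.

    Assumes both interventions lead to the same outcome otherwise.
    """
    options = {"communicate": comm_reward, "take control": control_reward}
    return max(options, key=options.get)


# Example 6: communicating costs -1, taking control costs -5.
print(preferred_intervention(-1, -5))
# Example 7: the two costs are swapped.
print(preferred_intervention(-5, -1))
```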
