Examples in Java

The toolbox comes with example code which illustrates how problem instances can be solved for both MDPs and POMDPs. This page provides a more elaborate description of these examples. In addition to the examples based on built-in problem domains, we also explain how new problem domains can be defined.

Solving MDPs with constraints

The class executables.TestCMDP provides example code for solving an MDP problem instance with constraints. In the first few lines the LP solver and the random number generator are initialized. The LPSolve solver can be used instead of Gurobi by initializing LPSolverLPSolve rather than LPSolverGurobi, as shown after the code fragment below.

LPSolver lpSolver = new LPSolverGurobi();
Random rnd = new Random(222);
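
If Gurobi is not available, the LPSolve solver mentioned above can be selected with a one-line change (assuming LPSolve and its native library have been set up):

// Use LPSolve instead of Gurobi as the LP solver backend
LPSolver lpSolver = new LPSolverLPSolve();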

The next step consists of generating a problem instance using one of the instance generators in the domains package. The code fragment below generates a problem instance with 2 agents and 10 sequential decisions based on the advertising domain.

CMDPInstanceGenerator generator = new AdvertisingInstanceGenerator();
int nAgents = 2;
int nDecisions = 10;
CMDPInstance instance = generator.getInstance(nAgents, nDecisions);

After generating the instance, it can be solved using one of the algorithms in the toolbox. The code fragment below initializes an algorithm which uses the linear program for CMDPs. After that, it sets the problem instance generated previously by calling the setInstance method. Finally, it obtains a solution by calling the solve method of the algorithm.

CMDPAlgorithm alg = new ConstrainedMDP(lpSolver, rnd);
try {
    alg.setInstance(instance);
} catch (UnsupportedInstanceException e) {
    e.printStackTrace();
    System.exit(0);
}
CMDPSolution solution = alg.solve();

After computing a solution, the resulting expected reward can be printed as illustrated below.

double expectedReward = solution.getExpectedReward();
System.out.println("Expected reward: "+expectedReward);

Finally, the computed solution can be evaluated through simulation. The code fragment below initializes a simulation environment using the CMDPSimulator class. It then executes 1000000 simulation runs and prints the mean reward obtained in these runs.

CMDPSimulator sim = new CMDPSimulator(instance, solution, rnd);
sim.run(1000000);
System.out.println("Mean reward: "+sim.getMeanReward());

The example code in executables.TestCMDP shows how other problem domains and algorithms can be used.
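
For instance, another built-in domain can be selected by simply swapping in a different generator; the class name below is hypothetical and stands in for any of the generators in the domains package:

// Hypothetical generator name; replace by any CMDPInstanceGenerator from the domains package
CMDPInstanceGenerator generator = new SomeOtherInstanceGenerator();
CMDPInstance instance = generator.getInstance(nAgents, nDecisions);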

Solving POMDPs with constraints

The example code for POMDPs with constraints can be found in executables.TestCPOMDP. It follows exactly the same structure as the example for MDPs, and additional explanations are therefore omitted.

LPSolver lpSolver = new LPSolverGurobi();
Random rnd = new Random(222);

CPOMDPInstanceGenerator generator = new CBMGenerator();
int nAgents = 2;
int nDecisions = 10;
CPOMDPInstance instance = generator.getInstance(nAgents, nDecisions);

CPOMDPAlgorithm alg = new CGCP(lpSolver, rnd);
try {
    alg.setInstance(instance);
} catch (UnsupportedInstanceException e) {
    e.printStackTrace();
    System.exit(0);
}
POMDPSolution solution = alg.solve();
				
double expectedReward = solution.getExpectedReward();
System.out.println("Expected reward: "+expectedReward);
		
CPOMDPSimulator sim = new CPOMDPSimulator(instance, solution, rnd);
sim.run(1000000);
System.out.println("Mean reward: "+sim.getMeanValue());
System.out.println("Mean cost: "+sim.getMeanCost());

Defining new domains and problem instances

New domains and problem instances can be defined by initializing CMDP and CPOMDP objects, as described in the documentation of these classes. Below we provide two toy examples of problem instances with a budget constraint. The corresponding source code can be found in executables.ToyExamples. We start by defining the basic properties of an MDP model:

int nStates = 2;
int nActions = 2;
int nDecisions = 10;
int initialState = 0;

We define a reward function that only gives a reward when action 1 is executed in state 1:

double[][] rewardFunction = new double[nStates][nActions];
rewardFunction[1][1] = 10.0;

Next, we define a transition function in which action 0 always leads to state 0. Executing action 1 in state 0 leads to state 1 with probability 0.9, and the state remains 0 with probability 0.1. Executing action 1 in state 1 leaves the state unchanged.

int[][][] transitionDestinations = new int[nStates][nActions][];
double[][][] transitionProbabilities = new double[nStates][nActions][];

transitionDestinations[0][0] = new int[]{0};
transitionProbabilities[0][0] = new double[]{1.0};

transitionDestinations[0][1] = new int[]{0, 1};
transitionProbabilities[0][1] = new double[]{0.1, 0.9};

transitionDestinations[1][0] = new int[]{0};
transitionProbabilities[1][0] = new double[]{1.0};

transitionDestinations[1][1] = new int[]{1};
transitionProbabilities[1][1] = new double[]{1.0};

After defining the rewards and transitions, it is time to define a cost function. We define a function such that the cost is 2 when action 1 is executed in state 1, and 0 otherwise.

double[][] costFunction = new double[nStates][nActions];
costFunction[1][1] = 2.0;
List<double[][]> costFunctions = new ArrayList<double[][]>();
costFunctions.add(costFunction);

Finally, we construct a CMDP object and define a problem instance with budget constraints, in which the total budget is 0.5. Note that there is just one agent in this example; creating instances with multiple agents works similarly, except that multiple CMDP objects need to be created. Adding more constraints corresponds to creating additional cost functions with associated budgets. Problem instances with instantaneous constraints can be defined using the method CMDPInstance.createInstantaneousInstance, as sketched after the code below.

CMDP cmdp = new CMDP(nStates, nActions, costFunctions, initialState, nDecisions);
cmdp.setRewardFunction(rewardFunction);
cmdp.setTransitionFunction(transitionDestinations, transitionProbabilities);

CMDPInstance cmdpInstance = CMDPInstance.createBudgetInstance(new CMDP[]{cmdp}, new double[]{0.5}, nDecisions);
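
An instance with instantaneous constraints can be created analogously. The sketch below assumes that createInstantaneousInstance mirrors createBudgetInstance, taking per-step cost limits in place of total budgets:

// Assumed signature, mirroring createBudgetInstance: limit the cost to 0.5 at each decision step
CMDPInstance instantInstance = CMDPInstance.createInstantaneousInstance(new CMDP[]{cmdp}, new double[]{0.5}, nDecisions);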

For POMDPs, problem instances can be defined in a similar fashion. Below we illustrate how a POMDP with full observability is created, which means that each observation corresponds directly to a state. The initial belief assigns probability 1 to state 0.

int nObservations = 2;
double[][][] observationFunction = new double[nActions][nStates][nObservations];

for(int a=0; a<nActions; a++) {
    for(int sNext=0; sNext<nStates; sNext++) {
        int o = sNext;
        observationFunction[a][sNext][o] = 1.0;
    }
}

BeliefPoint b0 = new BeliefPoint(new double[]{1.0, 0.0});

CPOMDP cpomdp = new CPOMDP(nStates, nActions, nObservations, costFunctions, observationFunction, b0, nDecisions);
cpomdp.setRewardFunction(rewardFunction);
cpomdp.setTransitionFunction(transitionDestinations, transitionProbabilities);

CPOMDPInstance cpomdpInstance = CPOMDPInstance.createBudgetInstance(new CPOMDP[]{cpomdp}, new double[]{0.5}, nDecisions);
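
The toy instance can then be solved and simulated in the same way as the built-in domains. For example, reusing the lpSolver and rnd objects from the first example:

// Solve the toy CMDP instance with the constrained MDP linear programming algorithm
CMDPAlgorithm alg = new ConstrainedMDP(lpSolver, rnd);
try {
    alg.setInstance(cmdpInstance);
} catch (UnsupportedInstanceException e) {
    e.printStackTrace();
    System.exit(0);
}
CMDPSolution solution = alg.solve();
System.out.println("Expected reward: "+solution.getExpectedReward());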