
Job Descriptor

Andrey Ustyuzhanin edited this page Jun 7, 2015 · 1 revision

Job Descriptor example:

{
    "descriptor" : {
        "name" : "SHIP-MC.test",
        "env_container" : {
            "workdir" : "/opt/ship/FairShip/build",
            "name" : "anaderi/ocean:0.6.1",
            "needed_containers" : [
                {
                    "name" : "anaderi/ship-dev:0.1.0",
                    "volumes" : [
                        "/opt/ship"
                    ]
                }
            ]
        },
        "output_uri" : "local:/srv/skygrid/demo_result/$JOB_ID",
        "args" : {
            "--nEvents" : 10,
            "-f" : "$INPUT_DIR/Genie-mu+_nu_mu-gntp.113.gst_0.root",
            "--output" : "$OUTPUT_DIR/root",
            "--seed" : "$TIMEHASH",
            "--Genie" : true,
            "-Y" : 10
        },
        "cpu_per_container" : 1,
        "cmd" : "cd /opt/ship/FairShip/build; . ./config.sh; cp -r gconfig geometry python ..; export PYTHONPATH+=:/opt/ship/FairShip/build/python; python macro/run_simScript.py",
        "max_memoryMB" : 1024,
        "min_memoryMB" : 512
    },
 
    "input" : [
        "local:/srv/skygrid/barbara_splitted/mu/Genie-mu+_nu_mu-gntp.113.gst_0.root"
    ],
    "multiplier": 1
}
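The descriptor is plain JSON. The sketch below parses a trimmed copy of the example and shows one plausible way `cmd` and `args` could be combined into a single command line. The rendering rule is an assumption, not documented SkyGrid behaviour: `true` becomes a bare flag, `false` drops the flag, and any other value follows its flag. `render_command` is a hypothetical helper. Because Python dicts keep insertion order, the flags come out in the order they are written in the JSON.

```python
import json

# Trimmed copy of the example descriptor (only the fields used here).
DESCRIPTOR_JSON = """
{
    "descriptor" : {
        "cmd" : "python macro/run_simScript.py",
        "args" : {
            "--nEvents" : 10,
            "--output" : "$OUTPUT_DIR/root",
            "--seed" : "$TIMEHASH",
            "--Genie" : true,
            "-Y" : 10
        }
    },
    "multiplier": 1
}
"""


def render_command(descriptor: dict) -> str:
    """Append the args to cmd using a hypothetical rendering rule.

    Assumptions (not documented SkyGrid behaviour):
      * true  -> the flag alone, e.g. "--Genie"
      * false -> the flag is dropped
      * other -> "flag value"; values are not shell-quoted, so $VARIABLES
        are left in place for later substitution
    """
    parts = [descriptor["cmd"]]
    for flag, value in descriptor.get("args", {}).items():
        if value is True:
            parts.append(flag)
        elif value is False:
            continue
        else:
            parts.append(f"{flag} {value}")
    return " ".join(parts)


job = json.loads(DESCRIPTOR_JSON)
print(render_command(job["descriptor"]))
# → python macro/run_simScript.py --nEvents 10 --output $OUTPUT_DIR/root --seed $TIMEHASH --Genie -Y 10
```

Under this assumption the arguments are appended after the whole `cmd` string, so they only reach the last command in it. That would explain why the example's `cmd` ends with `python macro/run_simScript.py`.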

Here is a short description of the fields of the descriptor:

  • descriptor - main part of the job, describing what to run and how
    • name - name of the job
    • env_container - container in which the job runs
      • workdir - working directory; the job starts in this directory
      • name - name of the container in the registry
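      • needed_containers - additional containers the job requires; each entry gives the container name and the volumes to mount from it (in the example, /opt/ship, which contains the workdir)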
    • output_uri - where the job's results and logs are stored
    • cmd - command to run
    • args - arguments for the command
    • cpu_per_container - number of CPUs the job's container requires
    • max_memoryMB, min_memoryMB - maximum and minimum memory requirements, in MB
  • input - list of input files made available to the job; the directory containing them is given by $INPUT_DIR
  • multiplier - number of jobs to run from this descriptor
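The field list above can be turned into a simple check to run before submitting a job. This is a minimal sketch that assumes every listed field is mandatory, that `min_memoryMB` must not exceed `max_memoryMB`, and that `cpu_per_container` and `multiplier` are positive integers; `validate_job` is a hypothetical helper, not part of SkyGrid.

```python
import copy

# Fields described on this page; treating all of them as mandatory is an assumption.
REQUIRED_TOP = ("descriptor", "input", "multiplier")
REQUIRED_DESCRIPTOR = ("name", "env_container", "output_uri", "cmd",
                       "cpu_per_container", "max_memoryMB", "min_memoryMB")


def validate_job(job: dict) -> list:
    """Return a list of problems found in a job descriptor (empty if none)."""
    problems = [f"missing field: {key}" for key in REQUIRED_TOP if key not in job]
    desc = job.get("descriptor", {})
    problems += [f"missing descriptor field: {key}"
                 for key in REQUIRED_DESCRIPTOR if key not in desc]

    # The job container itself must be named (registry name).
    if "name" not in desc.get("env_container", {}):
        problems.append("env_container has no name")

    # Memory bounds are in MB; the lower bound must not exceed the upper one.
    low, high = desc.get("min_memoryMB"), desc.get("max_memoryMB")
    if isinstance(low, int) and isinstance(high, int) and low > high:
        problems.append("min_memoryMB is larger than max_memoryMB")

    # CPU count and multiplier are assumed to be positive integers.
    for key, value in (("cpu_per_container", desc.get("cpu_per_container")),
                       ("multiplier", job.get("multiplier"))):
        if value is not None and (not isinstance(value, int) or value < 1):
            problems.append(f"{key} must be a positive integer")

    if not isinstance(job.get("input", []), list):
        problems.append("input must be a list")
    return problems


# The example descriptor, trimmed to the fields checked above.
example = {
    "descriptor": {
        "name": "SHIP-MC.test",
        "env_container": {"name": "anaderi/ocean:0.6.1"},
        "output_uri": "local:/srv/skygrid/demo_result/$JOB_ID",
        "cmd": "python macro/run_simScript.py",
        "cpu_per_container": 1,
        "max_memoryMB": 1024,
        "min_memoryMB": 512,
    },
    "input": ["local:/srv/skygrid/barbara_splitted/mu/Genie-mu+_nu_mu-gntp.113.gst_0.root"],
    "multiplier": 1,
}
print(validate_job(example))  # → []

broken = copy.deepcopy(example)
broken["descriptor"]["min_memoryMB"] = 2048
print(validate_job(broken))  # → ['min_memoryMB is larger than max_memoryMB']
```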

Variables you can use in a job descriptor (referenced as $NAME, e.g. $INPUT_DIR):

  • INPUT_DIR - directory in which the input files are available when the job starts
  • OUTPUT_DIR - directory where output files must be written in order to be saved
  • JOB_ID - ID of the job, generated at submission time; it can be treated as a unique identifier
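The $NAME placeholders have the same syntax as Python's `string.Template`, so their substitution can be sketched with it. This only illustrates the idea; SkyGrid's own substitution code is not shown on this page. `expand_variables` and `expand_args` are hypothetical helpers, and the values for JOB_ID, INPUT_DIR and OUTPUT_DIR below are made up. `safe_substitute` leaves placeholders it has no value for (such as the example's $TIMEHASH) untouched instead of raising an error.

```python
from string import Template


def expand_variables(text: str, job_id: str, input_dir: str, output_dir: str) -> str:
    """Substitute $JOB_ID, $INPUT_DIR and $OUTPUT_DIR in one string.

    safe_substitute() keeps unknown placeholders such as $TIMEHASH as they
    are instead of raising KeyError.
    """
    return Template(text).safe_substitute(
        JOB_ID=job_id, INPUT_DIR=input_dir, OUTPUT_DIR=output_dir
    )


def expand_args(args: dict, **values: str) -> dict:
    """Apply expand_variables to every string-valued argument; keep others as-is."""
    return {
        flag: expand_variables(value, **values) if isinstance(value, str) else value
        for flag, value in args.items()
    }


# Hypothetical values; the real ones are chosen by the system for each job.
ctx = {"job_id": "42", "input_dir": "/input", "output_dir": "/output"}
print(expand_variables("local:/srv/skygrid/demo_result/$JOB_ID", **ctx))
# → local:/srv/skygrid/demo_result/42
print(expand_args({"--output": "$OUTPUT_DIR/root", "--seed": "$TIMEHASH", "-Y": 10}, **ctx))
# → {'--output': '/output/root', '--seed': '$TIMEHASH', '-Y': 10}
```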

If your script does not support an --output argument, you can use a more elaborate cmd that moves the results to the output directory itself:

"cd /opt/ship/FairShip/build; . ./config.sh; cp -r gconfig geometry python ..; export PYTHONPATH+=:/opt/ship/FairShip/build/python; python macro/ShipReco.py --inputFile /input/ship.10.0.Genie-TGeant4.root -n 10000 ; mv /input/ship.10.0.Genie-TGeant4_rec.root /output"
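Long chained commands like this are easier to maintain as a list of steps joined into the `cmd` string, with `json.dumps` taking care of escaping. A minimal sketch; `STEPS` and `build_cmd` are hypothetical names, and the result matches the command above except for the space before the last semicolon. The `stop_on_error` switch is an optional variation: with `&&` the shell runs each step only if the previous one succeeded, whereas `;` (as in the command above) runs every step regardless.

```python
import json

# Shell steps of the ShipReco command above, one per list entry.
STEPS = [
    "cd /opt/ship/FairShip/build",
    ". ./config.sh",
    "cp -r gconfig geometry python ..",
    "export PYTHONPATH+=:/opt/ship/FairShip/build/python",
    "python macro/ShipReco.py --inputFile /input/ship.10.0.Genie-TGeant4.root -n 10000",
    # ShipReco.py has no --output option, so move its result into /output explicitly.
    "mv /input/ship.10.0.Genie-TGeant4_rec.root /output",
]


def build_cmd(steps: list, stop_on_error: bool = False) -> str:
    """Join shell steps into one cmd string.

    With "; " every step runs even if an earlier one fails; with " && "
    the chain stops at the first failing step.
    """
    separator = " && " if stop_on_error else "; "
    return separator.join(steps)


# json.dumps produces a correctly escaped value for the "cmd" field.
print(json.dumps({"cmd": build_cmd(STEPS)}, indent=4))
```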
