
BigJob Tutorial Part 7: Multiple Pilot Jobs Example

pradeepmantha edited this page Aug 22, 2012 · 3 revisions

This page is part of the BigJob Tutorial.


Submitting Jobs to Multiple Pilot-Jobs

The script below shows how to submit jobs to multiple Pilot-Jobs. The Pilot-API distributes the jobs across all pilots, whether those pilots are launched on the same machine or on different machines.

What is an example of the usage of multiple pilot jobs?

For example, Pilot-Jobs can be submitted to the schedulers of two different machines. Tasks are then executed on whichever machine's Pilot-Job becomes active first. Once both pilots are running, the tasks can spread across both machines (e.g., some tasks on Machine 1 and some on Machine 2). Using two machines is only one option: multiple pilots can also be started on the same machine.
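The "whichever becomes active first" behaviour can be illustrated with a small, self-contained Python model. This is not BigJob's scheduler. It is a simplified sketch that makes three assumptions: each pilot offers a fixed number of slots (one per process), each pilot becomes active at a known time, and every task takes the same time to run. The function `schedule` and the pilot names `machine1`/`machine2` are hypothetical.

```python
import heapq
from collections import Counter


def schedule(tasks, pilots, duration):
    """Assign each task to the pilot slot that becomes free earliest.

    pilots maps a pilot name to (activation_time, number_of_slots).
    A slot cannot be used before its pilot is active, and every task is
    assumed to run for the same `duration`.
    Returns a list of (task, pilot_name, start_time) tuples.
    """
    # Min-heap of (time the slot becomes free, pilot name).
    free_slots = []
    for name, (activation_time, slots) in pilots.items():
        for _ in range(slots):
            heapq.heappush(free_slots, (activation_time, name))

    assignment = []
    for task in tasks:
        # The task goes to whichever slot is free first,
        # no matter which pilot (machine) that slot belongs to.
        start, name = heapq.heappop(free_slots)
        assignment.append((task, name, start))
        # The slot is busy until this task finishes.
        heapq.heappush(free_slots, (start + duration, name))
    return assignment


if __name__ == "__main__":
    # Mirrors the tutorial script: 24 tasks, two pilots with 12 slots each.
    early = schedule(range(24), {"machine1": (0, 12), "machine2": (5, 12)}, duration=10)
    late = schedule(range(24), {"machine1": (0, 12), "machine2": (30, 12)}, duration=10)
    print("machine2 starts early:", dict(Counter(p for _, p, _ in early)))
    print("machine2 starts late: ", dict(Counter(p for _, p, _ in late)))
```

In the first case, machine2's pilot becomes active before machine1's first tasks finish, so the 24 tasks split evenly across both pilots. In the second case, machine1 frees its slots before machine2 is active, so machine1 ends up running every task.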

In your $HOME directory, open a new file multiple_pilotjobs.py in your favorite editor (e.g., vim) and paste the following content:

import os
import time
import sys
from pilot import PilotComputeService, ComputeDataService, State

# Redis server used to coordinate the Pilot-Jobs; replace the placeholders
# with the password and hostname of your own Redis server
COORDINATION_URL = "redis://<password>@<hostname>:6379"

### This is the number of jobs you want to run
NUMBER_JOBS=24

if __name__ == "__main__":

    pilot_compute_service = PilotComputeService(COORDINATION_URL)

    # One description per Pilot-Job. Both pilots target the local SGE
    # scheduler here, but each description could point at a different machine.
    pilot_compute_description=[]

    pilot_compute_description.append({ "service_url": "sge://localhost",
                                       "number_of_processes": 12,
                                       "allocation": "XSEDE12-SAGA",
                                       "queue": "development",
                                       "working_directory": os.getenv("HOME")+"/agent",
                                       "walltime":10,
                                     })

    pilot_compute_description.append({ "service_url": "sge://localhost",
                                       "number_of_processes": 12,
                                       "allocation": "XSEDE12-SAGA",
                                       "queue": "development",
                                       "working_directory": os.getenv("HOME")+"/agent",
                                       "walltime":10
                                     })

    # Launch one Pilot-Job per description
    for pcd in pilot_compute_description:
        pilot_compute_service.create_pilot(pilot_compute_description=pcd)

    # The ComputeDataService distributes compute units across all pilots
    # of the attached PilotComputeService
    compute_data_service = ComputeDataService()
    compute_data_service.add_pilot_compute_service(pilot_compute_service)

    print("Finished Pilot-Job setup. Submitting compute units")

    # submit compute units; each runs /bin/echo with "Hello" and the
    # environment variables ENV1 and ENV2 as arguments
    for i in range(NUMBER_JOBS):
        compute_unit_description = {"executable": "/bin/echo",
                                    "arguments": ["Hello","$ENV1","$ENV2"],
                                    "environment": ['ENV1=env_arg1','ENV2=env_arg2'],
                                    "number_of_processes": 1,
                                    "output": "stdout.txt",
                                    "error": "stderr.txt",
                                    }
        compute_data_service.submit_compute_unit(compute_unit_description)

    print("Waiting for compute units to complete")
    # Block until all submitted compute units have finished
    compute_data_service.wait()

    print("Terminate Pilot Jobs")
    compute_data_service.cancel()
    pilot_compute_service.cancel()

Execute the script with the following command:

python multiple_pilotjobs.py

Can you identify what jobs have been scheduled to which Pilot-Job?

Go to the working directory ($HOME/agent in this case). A unique directory is created for each Pilot-Job (e.g., bj-1### and bj-2###). Each Pilot-Job directory contains one subdirectory for every compute unit scheduled to that Pilot-Job.
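Instead of browsing by hand, a short Python sketch can walk the working directory and list the compute-unit directories under each Pilot-Job directory. It assumes only what is described above: Pilot-Job directories start with `bj-` and sit directly inside `$HOME/agent`. It also treats every subdirectory of a Pilot-Job directory as a compute-unit directory. The function name `units_per_pilot` is hypothetical and not part of BigJob.

```python
import os


def units_per_pilot(working_directory):
    """Map each Pilot-Job directory (name starting with 'bj-') to the
    sorted names of the subdirectories (compute units) it contains."""
    result = {}
    for entry in sorted(os.listdir(working_directory)):
        pilot_dir = os.path.join(working_directory, entry)
        if entry.startswith("bj-") and os.path.isdir(pilot_dir):
            # Only directories count; plain files in the pilot directory are skipped.
            result[entry] = sorted(
                d for d in os.listdir(pilot_dir)
                if os.path.isdir(os.path.join(pilot_dir, d))
            )
    return result


if __name__ == "__main__":
    agent_dir = os.path.join(os.path.expanduser("~"), "agent")
    if os.path.isdir(agent_dir):
        for pilot, units in units_per_pilot(agent_dir).items():
            print("%s: %d compute units" % (pilot, len(units)))
            for unit in units:
                print("    " + unit)
    else:
        print("No working directory found at " + agent_dir)
```

With NUMBER_JOBS=24, the counts printed for the two bj-* directories should add up to 24. How they split depends on which Pilot-Job became active first.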