BigJob Tutorial Part 7: Multiple Pilot Jobs Example
This page is part of the BigJob Tutorial.
This script shows how to submit jobs to multiple pilots. The Pilot-API manages the jobs across all pilots, which can be launched on the same machine or on different machines.
What is an example of using multiple Pilot-Jobs?
For example, Pilot-Jobs can be submitted to the schedulers of two different machines. Tasks then run on whichever Pilot-Job becomes active first, and they can be spread across machines (e.g., some tasks on Machine 1 and some on Machine 2). Multiple pilots can also run on the same machine: the script below starts both Pilot-Jobs through the local SGE scheduler (sge://localhost).
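The "whichever pilot becomes active first" behaviour can be illustrated without BigJob. The toy sketch below uses plain Python threads as stand-ins for two pilots that pull tasks from one shared queue, and a start delay stands in for the time a pilot waits in the batch queue. Nothing here is the Pilot-API: `run_pilots`, the machine names and the delays are hypothetical.

```python
import queue
import threading
import time


def run_pilots(pilot_names, num_tasks, start_delays):
    """Toy model (not BigJob): each 'pilot' starts after a delay, then pulls
    tasks from a shared queue until it is empty. Returns {pilot: [task ids]}."""
    tasks = queue.Queue()
    for t in range(num_tasks):
        tasks.put(t)
    assignments = {name: [] for name in pilot_names}
    lock = threading.Lock()

    def pilot(name, delay):
        time.sleep(delay)  # simulated wait in the batch queue
        while True:
            try:
                task = tasks.get_nowait()  # late binding: take the next free task
            except queue.Empty:
                return
            time.sleep(0.001)  # pretend to run the task
            with lock:
                assignments[name].append(task)

    threads = [threading.Thread(target=pilot, args=(n, d))
               for n, d in zip(pilot_names, start_delays)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return assignments


if __name__ == "__main__":
    result = run_pilots(["machine1", "machine2"], 24, [0.0, 0.05])
    for name, done in result.items():
        print(name, len(done), "tasks")
```

With `machine2` starting 50 ms later, `machine1` typically ends up with most or all of the 24 tasks; with equal delays the tasks are usually split between the two.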
In your $HOME directory, open a new file mulitple_pilotjobs.py with your favorite editor (e.g., vim) and paste the following content:
```python
import os
import time
import sys
from pilot import PilotComputeService, ComputeDataService, State

# Redis server used to coordinate pilots and compute units
# (format: redis://<password>@<host>:<port>); point this at your own server.
COORDINATION_URL = "redis://[email protected]:6379"

# Number of compute units (jobs) to run
NUMBER_JOBS = 24

if __name__ == "__main__":

    pilot_compute_service = PilotComputeService(COORDINATION_URL)

    # Two pilot descriptions. Both target the local SGE scheduler here,
    # but each could point to a different machine via "service_url".
    pilot_compute_description = []
    pilot_compute_description.append({"service_url": "sge://localhost",
                                      "number_of_processes": 12,
                                      "allocation": "XSEDE12-SAGA",
                                      "queue": "development",
                                      "working_directory": os.getenv("HOME") + "/agent",
                                      "walltime": 10,
                                      })
    pilot_compute_description.append({"service_url": "sge://localhost",
                                      "number_of_processes": 12,
                                      "allocation": "XSEDE12-SAGA",
                                      "queue": "development",
                                      "working_directory": os.getenv("HOME") + "/agent",
                                      "walltime": 10,
                                      })

    # Launch one Pilot-Job per description
    for pcd in pilot_compute_description:
        pilot_compute_service.create_pilot(pilot_compute_description=pcd)

    # The ComputeDataService distributes compute units across all pilots
    compute_data_service = ComputeDataService()
    compute_data_service.add_pilot_compute_service(pilot_compute_service)

    print("Finished Pilot-Job setup. Submitting compute units")

    # Submit compute units
    for i in range(NUMBER_JOBS):
        compute_unit_description = {
            "executable": "/bin/echo",
            "arguments": ["Hello", "$ENV1", "$ENV2"],
            "environment": ['ENV1=env_arg1', 'ENV2=env_arg2'],
            "number_of_processes": 1,
            "output": "stdout.txt",
            "error": "stderr.txt",
        }
        compute_data_service.submit_compute_unit(compute_unit_description)

    print("Waiting for compute units to complete")
    compute_data_service.wait()

    print("Terminate Pilot Jobs")
    compute_data_service.cancel()
    pilot_compute_service.cancel()
```
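The two pilot descriptions in the script are identical and both point at `sge://localhost`. To reproduce the two-machine scenario, only the per-pilot fields need to differ, and a small helper keeps that readable. This is a sketch under stated assumptions: the helper name `make_pilot_description`, the hostnames, the `sge+ssh://` and `pbs+ssh://` URL schemes, the `batch` queue and the working-directory paths are all hypothetical placeholders, so check which URL schemes your SAGA adaptors accept and use your own hosts, queues and allocation.

```python
def make_pilot_description(service_url, queue, allocation, working_directory,
                           number_of_processes=12, walltime=10):
    """Return one pilot description with the same keys as the tutorial script.
    working_directory must be a path on the machine the pilot runs on."""
    return {"service_url": service_url,
            "number_of_processes": number_of_processes,
            "allocation": allocation,
            "queue": queue,
            "working_directory": working_directory,
            "walltime": walltime}


# Hypothetical hosts, URL schemes, queues and paths: replace with your own.
pilot_compute_description = [
    make_pilot_description("sge+ssh://machine1.example.org", "development",
                           "XSEDE12-SAGA", "/home/user/agent"),
    make_pilot_description("pbs+ssh://machine2.example.org", "batch",
                           "XSEDE12-SAGA", "/home/user/agent"),
]
```

This list can take the place of the two `append` calls in the script; the loop over `pilot_compute_description` and everything after it stay the same.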
Execute the script with the following command:

```shell
python mulitple_pilotjobs.py
```
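Each of the 24 compute units runs `/bin/echo` with the arguments `Hello`, `$ENV1` and `$ENV2`, with `ENV1` and `ENV2` set through the `environment` field. To check what a single unit's `stdout.txt` is expected to contain, its command can be reproduced locally. This sketch assumes the agent launches the executable through a shell so that `$ENV1` and `$ENV2` are expanded; `run_like_compute_unit` is a hypothetical helper, not part of the Pilot-API.

```python
import os
import subprocess


def run_like_compute_unit(cud):
    """Run a compute unit description locally through a shell (an assumption
    about how the agent launches it) and return the captured stdout."""
    env = dict(os.environ)
    for entry in cud.get("environment", []):
        key, _, value = entry.partition("=")  # "ENV1=env_arg1" -> ENV1, env_arg1
        env[key] = value
    cmd = " ".join([cud["executable"]] + cud.get("arguments", []))
    result = subprocess.run(cmd, shell=True, env=env,
                            capture_output=True, text=True)
    return result.stdout


cud = {"executable": "/bin/echo",
       "arguments": ["Hello", "$ENV1", "$ENV2"],
       "environment": ['ENV1=env_arg1', 'ENV2=env_arg2']}

if __name__ == "__main__":
    print(run_like_compute_unit(cud), end="")  # prints: Hello env_arg1 env_arg2
```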
Can you identify which jobs were scheduled to which Pilot-Job?
Go to the working directory ($HOME/agent in this case). A unique directory is created for each Pilot-Job (e.g., bj-1### and bj-2###), and each Pilot-Job directory contains one subdirectory for every compute unit scheduled to that Pilot-Job.
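Listing these directories by hand works for a few units; the sketch below prints how many compute-unit directories each Pilot-Job directory holds. It assumes, as described above, that Pilot-Job directories start with `bj-` and that every subdirectory inside one is a compute-unit directory; `units_per_pilot` is a hypothetical helper name.

```python
import os
import sys


def units_per_pilot(working_directory):
    """Map each Pilot-Job directory (name starting with 'bj-') in
    working_directory to the sorted list of subdirectories it contains."""
    layout = {}
    for name in sorted(os.listdir(working_directory)):
        pilot_dir = os.path.join(working_directory, name)
        if name.startswith("bj-") and os.path.isdir(pilot_dir):
            layout[name] = sorted(
                d for d in os.listdir(pilot_dir)
                if os.path.isdir(os.path.join(pilot_dir, d)))
    return layout


if __name__ == "__main__":
    # Default to $HOME/agent, the working_directory used in the script
    wd = sys.argv[1] if len(sys.argv) > 1 else os.path.join(os.path.expanduser("~"), "agent")
    for pilot, units in units_per_pilot(wd).items():
        print(f"{pilot}: {len(units)} compute units")
        for unit in units:
            print("   ", unit)
```

If you changed `working_directory` in the pilot descriptions, pass that path as the first argument.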