XSEDE Machine Specific Tips
This page provides both general and machine-specific tips for running on XSEDE infrastructure. General information comes first, followed by tips listed by machine name (e.g. Lonestar, Ranger, Kraken). If you are interested in running on a specific machine, please scroll down until you see its name.
If you do not see a particular machine name, BigJob may still run on that machine even though it is not yet covered in the documentation. Please feel free to email [email protected] to request that machine information be added.
In general, on XSEDE machines, the tutorials can be run out of your $HOME directory, but production-grade science should be done in either the $SCRATCH or $WORK directory on the machine. This means you will run your BigJob script and create your BigJob agent directory in either $SCRATCH or $WORK and NOT in $HOME.
When creating BigJob scripts for XSEDE machines, it is necessary to add the allocation field to the pilot_compute_description:
"allocation": "TG-XXXXXXXX"
TG-XXXXXXXX must be replaced with your individual allocation (SU) number as provided to you by XSEDE.
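As a rough illustration of the two points above (work out of $SCRATCH or $WORK, and always set the allocation field), the following sketch builds a partial pilot_compute_description. The helper name build_pilot_description is hypothetical, and the "working_directory" key and the "agent" subdirectory are assumptions made for this example; a real description needs further fields (resource URL, number of processes, and so on) that are not shown here.

```python
import os

def build_pilot_description(allocation, env=None):
    """Return a partial pilot_compute_description rooted in scratch or work space.

    Hypothetical helper for illustration only; it is not part of BigJob.
    """
    env = os.environ if env is None else env
    # Refuse the documentation placeholder (e.g. "TG-XXXXXXXX").
    if set(allocation.replace("TG-", "", 1)) <= {"X"}:
        raise ValueError("replace the TG-XXXXXXXX placeholder with your own allocation")
    # Prefer $SCRATCH, then $WORK; deliberately never fall back to $HOME.
    base = env.get("SCRATCH") or env.get("WORK")
    if not base:
        raise RuntimeError("neither $SCRATCH nor $WORK is set on this machine")
    return {
        "allocation": allocation,
        # Assumption: the agent directory lives directly under scratch/work.
        "working_directory": os.path.join(base, "agent"),
    }
```

Passing a plain dict as env makes it possible to check the result without being logged in to an XSEDE machine; on the machine itself the function reads the real environment.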
Installing a virtual environment requires Python 2.7.x. To load Python 2.7.x before installing the virtual environment, please execute:
module load python
Then you can proceed with the tutorial. Make sure that your virtual environment is activated in your .bashrc before you try to run BigJob.
You will need to put the following two lines in both your .bashrc and your .bash_profile in order to run on Lonestar and Ranger. This is because login shells read .bash_profile, while interactive non-login shells read .bashrc.
module load python
source $HOME/<tutorial>/bin/activate
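If you would rather not edit both files by hand, the step can be scripted. The sketch below is only an illustration: the function name ensure_startup_lines is made up for this example, and it simply appends whichever of the given lines are not already present, so running it twice does not duplicate anything.

```python
import os

def ensure_startup_lines(lines, home, filenames=(".bashrc", ".bash_profile")):
    """Append any missing lines to each startup file; return the files changed."""
    changed = []
    for name in filenames:
        path = os.path.join(home, name)
        existing = []
        if os.path.exists(path):
            with open(path) as handle:
                existing = handle.read().splitlines()
        # Only add lines that are not already present verbatim.
        missing = [line for line in lines if line not in existing]
        if missing:
            with open(path, "a") as handle:
                # Leading newline guards against a file without a trailing newline.
                handle.write("\n" + "\n".join(missing) + "\n")
            changed.append(name)
    return changed

# Example call (substitute your own tutorial directory for <tutorial>):
# ensure_startup_lines(
#     ["module load python", "source $HOME/<tutorial>/bin/activate"],
#     os.path.expanduser("~"),
# )
```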
Before installing your virtual environment, you must do a module load python on Kraken to ensure you're using Python 2.7.x instead of the system-level Python.
Prior to running code on Kraken, you will need to make a directory called 'agent' in the same location that you are running your scripts from. The BigJob agent relies on aprun to execute subjobs. aprun works only if the working directory of the BigJob and subjobs is set to the scratch space of Kraken.
In this tutorial, you can create the agent directory in /lustre/scratch/<username> (replace <username> with your Kraken username) by typing:
cd /lustre/scratch/<username>
mkdir agent
To submit jobs to Kraken from another resource using gsissh, the use of myproxy is required. To obtain a proxy credential from the myproxy server, execute the following command:
myproxy-logon -T -t <number of hours> -l <your username>
You need to use your XSEDE portal username and password. To verify that the logon succeeded, type grid-proxy-info. If it was successful, you should see a valid proxy.
BigJob on Kraken is subject to several limitations that arise from the system's use of aprun:
- It is only possible to run one sub-job per node (please consider this in your job specification).
- Only one instance of aprun can be executed at a time. It is possible to batch subjobs by using the following environment setting in the job description:
import saga

jd = saga.job.description()
jd.executable = "/bin/date"
jd.number_of_processes = "1"
jd.spmd_variation = "single"
jd.arguments = [""]
# Batch two copies of the executable into a single aprun call.
jd.environment = ["NUMBER_SUBOBS=2"]
jd.output = "bfast-stdout.txt"
jd.error = "bfast-stderr.txt"
This will run 2 /bin/date tasks at the same time:
aprun -n 2 -d 1 /bin/date
- A minimum of two nodes on Kraken is required (i.e. -l size=24).
- aprun works only from Lustre scratch directories, not from home directories: http://www.nics.tennessee.edu/computing-resources/kraken/running-jobs
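The arithmetic behind these limits can be sketched in a few lines. This is not BigJob's actual implementation: the function names are hypothetical, and the figure of 12 cores per node is an assumption inferred from the two-node minimum being written as -l size=24.

```python
CORES_PER_NODE = 12  # assumption: two nodes correspond to -l size=24
MIN_NODES = 2        # minimum number of nodes stated above

def pbs_size(nodes):
    """Return the value for '-l size=' given a node count."""
    if nodes < MIN_NODES:
        raise ValueError("Kraken needs at least %d nodes" % MIN_NODES)
    return nodes * CORES_PER_NODE

def batched_aprun(executable, copies, depth=1):
    """Build a single aprun command line that runs `copies` tasks at once."""
    return ["aprun", "-n", str(copies), "-d", str(depth), executable]
```

With these helpers, pbs_size(2) gives 24, and joining batched_aprun("/bin/date", 2) with spaces reproduces the aprun command shown earlier for two batched /bin/date tasks.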