
XSEDE Machine Specific Tips

melrom edited this page Nov 19, 2012 · 31 revisions

This page provides both general and machine-specific tips for running on XSEDE infrastructure. General information comes first, followed by tips listed by machine name (e.g. Lonestar and Ranger, Kraken). If you are interested in running on a specific machine, scroll down until you see its name.

If you do not see a particular machine name, BigJob may still run on that machine without being covered in this documentation yet. Please feel free to email [email protected] to request that machine information be added.

General

Where to Run

In general, on XSEDE machines, the tutorials can be run out of your $HOME directory, but production-grade science should be done in either the $SCRATCH or $WORK directories on the machine. This means you will run your BigJob script and make your BigJob agent directory in either $SCRATCH or $WORK and NOT in $HOME.
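One way to avoid accidentally launching production runs from $HOME is to resolve the run directory from the environment first. The following is a minimal sketch, not part of BigJob: the helper name `pick_run_dir` is hypothetical, and it assumes that `SCRATCH` and `WORK` are exported as environment variables on the machine.

```python
import os


def pick_run_dir(env=None):
    """Return $SCRATCH if set, otherwise $WORK; never fall back to $HOME.

    Hypothetical helper for illustration; it is not part of BigJob.
    """
    env = os.environ if env is None else env
    for name in ("SCRATCH", "WORK"):
        path = env.get(name)
        if path:
            return path
    raise RuntimeError("Neither $SCRATCH nor $WORK is set; refusing to use $HOME")
```

Calling `pick_run_dir()` with no argument reads the real environment; passing a dictionary makes the function easy to try out on a machine where the variables are not defined.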

Adding your Allocation

When creating BigJob scripts for XSEDE machines, it is necessary to add the allocation field to the pilot_compute_description.

"allocation": "TG-XXXXXXXX"

TG-XXXXXXXX must be replaced with your individual allocation SU number as provided to you by XSEDE.
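As a rough sketch, the allocation field sits alongside the other entries of the description dictionary. Only the "allocation" key comes from this page; the other keys and all values below are placeholders assumed from typical BigJob examples, and `allocation_is_set` is a hypothetical helper for catching a forgotten placeholder.

```python
# Sketch of a pilot_compute_description with the allocation field added.
# Every value here is a placeholder; adjust them for your machine.
pilot_compute_description = {
    "service_url": "<resource URL for the target machine>",  # placeholder
    "number_of_processes": 12,                                # placeholder
    "walltime": 10,                                           # placeholder
    "allocation": "TG-XXXXXXXX",  # replace with your XSEDE allocation
}


def allocation_is_set(description):
    """Return True if an allocation is present and is not the placeholder."""
    allocation = description.get("allocation", "")
    return bool(allocation) and allocation != "TG-XXXXXXXX"
```

Checking `allocation_is_set(pilot_compute_description)` before submitting returns False for the dictionary above, because the placeholder has not been replaced yet.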

Lonestar and Ranger

Installing a virtual environment requires Python 2.7.x. To load Python 2.7.x before installing the virtual environment, please execute:

module load python

Then you can proceed with the tutorial, and make sure that you activate your virtual environment in your .bashrc before you try to run BigJob.

You will need to put the following two lines in both your .bashrc and your .bash_profile in order to run on Lonestar and Ranger. This is because login shells read .bash_profile, while interactive non-login shells read .bashrc.

    module load python
    source $HOME/<tutorial>/bin/activate

Kraken

Load Proper Python Environment

Before installing your virtual environment, you must run module load python on Kraken to ensure that you are using Python 2.7.x instead of the system-level Python.

Using Lustre Scratch

Prior to running code on Kraken, you will need to make a directory called 'agent' in the same location that you are running your scripts from. The BigJob agent relies on aprun to execute subjobs. aprun works only if the working directory of the BigJob and subjobs is set to the scratch space of Kraken.

In this tutorial, you can create the agent directory in /lustre/scratch/<username> by typing:

cd /lustre/scratch/<username>    # replace <username> with your Kraken username

mkdir agent
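The same step can be done from within a Python script before BigJob is started. This is a minimal sketch; `ensure_agent_dir` is a hypothetical helper, not a BigJob function, and the directory passed to it is assumed to be the location your scripts run from.

```python
import os


def ensure_agent_dir(run_dir):
    """Create an 'agent' subdirectory in run_dir if missing; return its path."""
    agent_dir = os.path.join(run_dir, "agent")
    if not os.path.isdir(agent_dir):
        os.makedirs(agent_dir)
    return agent_dir
```

On Kraken this would be called as `ensure_agent_dir("/lustre/scratch/<username>")`, with `<username>` replaced by your Kraken username.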

Activate your Credentials

To submit jobs to Kraken from another resource using gsissh, myproxy is required. To obtain a proxy credential from the myproxy server, execute the following command:

myproxy-logon -T -t <number of hours> -l <your username>

You need to use your XSEDE portal username and password. To verify that you hold a valid proxy credential, type grid-proxy-info.

If the logon was successful, you should see a valid proxy with time remaining.

Aprun

BigJob on Kraken is subject to several limitations that arise from the system's use of aprun:

  • Only one subjob can run per node. Please account for this in your job specification.

  • Only one instance of aprun can be executed at a time. It is possible to batch subjobs by using the following environment setting in the job description:

    jd = saga.job.description()
    jd.executable = "/bin/date"
    jd.number_of_processes = "1"
    jd.spmd_variation = "single"
    jd.arguments = [""]
    # Batch two subjobs into a single aprun call
    jd.environment = ["NUMBER_SUBJOBS=2"]
    jd.output = "bfast-stdout.txt"
    jd.error = "bfast-stderr.txt"

This will run 2 /bin/date tasks at the same time:

aprun -n 2 -d 1 /bin/date
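To make the mapping from the subjob count to the aprun call explicit, here is a small illustration. It is a sketch of the relationship shown above, not the BigJob agent's actual code; `build_aprun_command` and its parameters are hypothetical names.

```python
def build_aprun_command(executable, number_subjobs, depth=1):
    """Return the aprun command line that runs number_subjobs copies at once."""
    if number_subjobs < 1:
        raise ValueError("number_subjobs must be at least 1")
    # -n is the number of tasks to launch, -d the depth given to each task
    return "aprun -n %d -d %d %s" % (number_subjobs, depth, executable)


print(build_aprun_command("/bin/date", 2))  # aprun -n 2 -d 1 /bin/date
```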