Instructions: Using Singularity on the HPC
Singularity is container software designed specifically for clusters. Application containers allow us to package software into a portable, shareable image. The ability to create a static image with all of the dependencies necessary for a given package or workflow lets us control the environment in which we test, debug, and execute our code. For scientists, this is extremely useful.
Consider an experiment acquiring neuroimaging data over a long period of time. Given how long these data take to process (e.g., Freesurfer alone normally takes ~12 hours), it only makes sense to process new subjects as they are acquired. However, even on HPCs, software packages are likely to be updated more than once within the lifespan of the project. This is a problem because changes to the software over the lifespan of the experiment introduce time-related confounds into the processed data. Two common solutions are to either 1) process all of the data only after acquisition is complete or 2) use project-specific environments on the HPC to pin the versions of individual software packages when running processing workflows. The former approach is inefficient (although it does prevent data peeking) in that it may substantially delay analysis after acquisition ends, while the latter is not robust, as changes on the HPC or unsupervised changes to the environment by lab members can affect results without users' knowledge. Container software like Singularity addresses the weaknesses of both approaches.
BIDS Apps are processing and analysis pipelines for neuroimaging data specifically designed to work on datasets organized in BIDS format. These pipelines can run on any dataset organized according to this convention (assuming it contains the requisite data, of course). Combined with application container software like Docker or Singularity, this means that the same pipeline will return the same results on the same dataset, no matter where or when you run it!
Moreover, because the majority of these pipelines have been developed by methodologists and have been evaluated in associated publications (e.g., Esteban et al., 2017; Craddock et al., 2013), they are likely to be of higher quality and better validated than pipelines developed in-lab (typically based on some in-lab dataset). Using independently-developed pipelines also reduces the ability and incentive of researchers to leverage the analytic flexibility inherent to neuroimaging data in order to p-hack (whether intentionally or not) their pipelines to produce the most appealing results in their data.
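To make the shared convention concrete, here is a minimal sketch of how most BIDS Apps are invoked; the app name, paths, and participant label are hypothetical placeholders:

```bash
# Generic BIDS App call: <app> <bids_dir> <output_dir> <analysis_level> [options]
# The app name, paths, and label below are hypothetical placeholders.
bids_app /path/to/bids_dataset /path/to/derivatives participant --participant_label 01
```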
- SSH onto HPC login 1, load the Singularity module, and build your image from a Docker image (a concrete build example is sketched below this list):

  ```bash
  module load singularity-3
  singularity build [image name] docker://[docker_user|org]/[container]:[version tag]
  ```
- Copy your data to `/scratch`. Your Singularity image can only access `/scratch` and your home directory.
- Write a sub file for your job:
  - An example sub file for processing data with a BIDS App is sketched below this list.
  - An example sub file for using a Singularity image as an environment. Not yet figured out, but more information is available here.
- Submit said job. E.g., `bsub < job_file.sub`.
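As a concrete instance of the build step above, assuming you wanted the fMRIPrep BIDS App (the version tag is illustrative):

```bash
module load singularity-3
# Pull a versioned BIDS App from Docker Hub and convert it to a Singularity image.
singularity build fmriprep-20.2.3.simg docker://nipreps/fmriprep:20.2.3
```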
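And a minimal sketch of an LSF sub file for processing data with a BIDS App; the queue name, resource requests, image path, and dataset paths are assumptions that will differ on your cluster:

```bash
#!/bin/bash
#BSUB -J fmriprep_sub-01          # job name
#BSUB -o fmriprep_sub-01.%J.out   # stdout log
#BSUB -e fmriprep_sub-01.%J.err   # stderr log
#BSUB -q normal                   # queue name (cluster-specific assumption)
#BSUB -n 8                        # number of cores
#BSUB -W 24:00                    # wall-clock limit (hh:mm)

module load singularity-3

# Bind /scratch into the container; remember the image can only access
# /scratch and your home directory. Paths and labels are placeholders.
singularity run --cleanenv -B /scratch:/scratch \
    /scratch/images/fmriprep-20.2.3.simg \
    /scratch/my_bids_dataset /scratch/my_bids_dataset/derivatives participant \
    --participant_label 01
```

Submit it with `bsub < job_file.sub`, as in the last step above.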