Pipeline to process anatomical data, including microstructure metrics from DWI and MT data
Instructions: https://docs.cneuromod.ca/en/latest/ACCESS.html#downloading-the-dataset
Repository: https://github.com/courtois-neuromod/cneuromod
Then, set them:
export AWS_ACCESS_KEY_ID=<s3_access_key>
export AWS_SECRET_ACCESS_KEY=<s3_secret_key>
datalad install -r [email protected]:courtois-neuromod/anat.git
cd anat
datalad get .
The requirements for this pipeline include:
- Nextflow and Docker for brain data analysis
- SCT for spinal cord data analysis
- Make sure that java
8
or later is installed
java -version
Note: version numbers
1.8.y_z
and8
identify the Java release.
- Enter this command in your terminal to install Nextflow:
curl -s https://get.nextflow.io | bash
After the installation an executable named
nextflow
will be created in the directory where you called the command. You can simply copy this to a directory in your system$PATH
to be able to callnextflow
from any directory.
- Run a simple demo
nextflow run hello
These Docker images will be orchestrated by the workflow to deal with dependencies:
docker pull qmrlab/minimal:v2.5.0b
docker pull qmrlab/antsfsl:latest
If you are an OSX user and manually configured Docker to run in a virtual machine, please make sure that your
/Users
folder is mounted into the Docker VM. If you installed Docker using the Desktop Installer, this is automatically configured.
See SCT installation instructions.
nextflow run neuromod-process-anat.nf --bids /path/to/courtois-neuromod/directory -with-report report.html
Upon invocation within a directory (in this case it is where you cloned neuromod/anat-processing
), nextflow creates a
set of project specific files/folders to cache outputs with detailed provenance recording:
.nextflow
folder that contains caching information*.log
workflow execution logswork
folder that contains all the Nextflow outputs and traces.report.html
workflow execution report
All these files are gitignored, so you are not likely to push any of these files by mistake. However, you can change the location where these outputs are saved to keep the source directory clean.
Using the following nextflow arguments, you can configure where the interim files will be stored:
COMMAND VALUE
-w /working/directory
-log /execution/log-file.log
Example:
nextflow -log $TMPDIR/cneuromod.log run neuromod-process-anat.nf --bids /path/to/courtois-neuromod/directory -w $TMPDIR -with-report /select/direcroty/for/report.html
Note that environment variable storing the temporary directory depends on your operation system:
- OSX:
$TMPDIR
- Ubuntu:
$TMP
- Windows:
%TMP%
Nonetheless, you can set any directory where you have write access to.
In this case, only .nextflow
folder will pop up in the source directory. Resultant work dorectory be displayed in the
terminal along with a random Mnemonic (e.g. sad_ampere
). To clean interim workflow outputs, you can run the following
in the terminal:
nextflow clean mnemonic_name -n
this command will show you the list of files that'll be deleted if you run
nextflow clean mnemonic_name -f
Note that if you delete these files and would like to resume nextflow after an interrupted run you won't be able to recover the processed files.
You can invoke the nextflow pipeline using datalad
so that the process appears in dataset history:
datalad run -m "add some description" -d /path/to/courtois-neuromod/directory -o /path/to/courtois-neuromod/directory/derivatives "nextflow -log $TMPDIR/nf.log run neuromod-process-anat.nf --bids /path/to/courtois-neuromod/directory -w $TMPDIR/NF -with-report report.html"
Note that extensive provenance recording is also captured by Nextflow. You are highly encouraged to save report.html
and inspect it after each workflow run.
For all the processes the current allocation (nextflow.config
) is:
- 1 CPU
- 1GB RAM
Except for aligning input images, for which 2CPUs are allocated.
Nextflow infers parallelism at the subject/session
level, so the number of maximum parallel operations depend on
your processor specs & how many of them are requested per task. You can change nextflow.config
to optimize resources
according to your system's capacity. If you would like to limit maximum number of parallel operations for a process,
you can set maxForks
property. For example:
...
withName: alignMtsatInputs {
cpus = 2
memory = 2.GB
maxForks = 4
container = 'qmrlab/antsfsl:latest'
}
...
By default maxForks
value is equal to the number of CPU cores available minus 1.
The local executor is used by default. It runs the pipeline processes in the computer where Nextflow is launched. The processes are parallelised by spawning multiple threads and by taking advantage of multi-cores architecture provided by the CPU.
Resources allocated to running containers are governed by your Docker settings. For example, if you need 500MB memory for a process, but if you set Docker memory access limit to 200MB, the process will run out of memory and fail to proceed. Note that this allocation affects performance as well.
You can use any of these available executors.
- By default, this workflow is configured to work with multiple containers. However, you can edit
nextflow.config
for a select process to run locally. In that case, you need to make sure that all the dependencies are met for that process. - All the processes expect certain outputs to be emitted. Exceptions will interrupt the workflow run.
- A
subject/session
process will be omitted if any of the configured inputs are missing for thatsubject/session
. For example, if you setuse_b1cor=true
the whole process will be skipped for asubject/session
missing../fmap/...B1plusmap.nii.gz
. This ensures that all the derivatives are processed uniformly.
@Basile
@Arnaud
sct_run_batch -path-data <PATH_NEUROMOD_DATA> -path-output <PATH_OUTPUT> -job <NUM_CPU_CORE> -script process_spinalcord.sh
To test the pipeline in one subject, run:
sct_run_batch -path-data <PATH_NEUROMOD_DATA> -path-output <PATH_OUTPUT> -script process_spinalcord.sh -include sub-01/ses-001
For more available options, run sct_run_batch -h