Lab 06: Calling Containers with Nextflow
Now that we've covered the basics of channels, processes, workflows, and choosing containers, it's time to put what we've learned into a framework that we can actually apply to common sequence analysis pipelines.
For this pipeline, we'll be using a practice RNA-Seq dataset from the Griffith Lab. This dataset is ideal for practice because it consists of 6 human RNA samples subset down to the 22nd chromosome only.
Let's head over to exercises/06_nextflow_containers/pipeline to view a Nextflow pipeline set up to use Singularity containers.
In our main.nf file, we see a new channel factory, fromFilePairs.
ch_fastqs = Channel.fromFilePairs("${params.fastq_seqs}/*read{1,2}.fastq.gz", checkIfExists: true, flat:true)
Previously, we used the fromPath channel factory to load our fasta sequence files into a queue channel. Nextflow also provides fromFilePairs, a channel factory designed specifically for paired-end fastq sequence files.
fromFilePairs uses the glob pattern to group the forward and reverse files of each sample. By default, the output channel consists of tuples with the following structure:
[sample_a, [/my/data/sample_a_1.fastq, /my/data/sample_a_2.fastq]]
[sample_b, [/my/data/sample_b_1.fastq, /my/data/sample_b_2.fastq]]
[sample_c, [/my/data/sample_c_1.fastq, /my/data/sample_c_2.fastq]]
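We can see that the first element is the sample base name (stripped of the R1/R2 and fastq suffix) and the second element is a nested list with the full paths to the R1 and R2 reads in order. Very useful! Because our call in main.nf sets flat: true, the tuples in ch_fastqs are flattened instead, so each one has the form [sample_a, /my/data/sample_a_1.fastq, /my/data/sample_a_2.fastq] with no nested list.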
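To make the difference between the default output and flat: true concrete, the following minimal sketch prints both shapes side by side. The params.fastq_seqs value here is a hypothetical placeholder rather than the pipeline's real setting, so point it at a directory that actually contains your paired reads.

```nextflow
// Hypothetical location of the paired reads; not taken from the pipeline config.
params.fastq_seqs = "data/fastqs"

workflow {
    // Default (flat: false): each item is [id, [read1, read2]]
    Channel
        .fromFilePairs("${params.fastq_seqs}/*read{1,2}.fastq.gz", checkIfExists: true)
        .view { item -> "nested: ${item}" }

    // flat: true: each item is [id, read1, read2]
    Channel
        .fromFilePairs("${params.fastq_seqs}/*read{1,2}.fastq.gz", checkIfExists: true, flat: true)
        .view { item -> "flat:   ${item}" }
}
```

Saved as a standalone script and run with nextflow run, this should print one nested and one flat line per sample, which makes it easy to check that the glob pattern is pairing the files you expect.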
If we inspect the process that receives this ch_fastqs channel in modules/fastqc.nf, we see that this format is typed as a tuple, and each of its components is also typed as a val or a path: one val for the sample ID and one path for each read file.
input:
tuple val(id), path(r1), path(r2)
While we're in modules/fastqc.nf, let's inspect the new process directive container, directly above the input portion of the FASTQC process.
container = "quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0"
This specifies that the script portion of the process will run inside a container, specifically one built from the quay.io BioContainers image named in the directive. Our previous process examples have relied only upon commands and tools available in the working shell environment, which is where Nextflow defaults to if no container or package is specified.
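To see how the tuple input and the container directive fit together, here is a minimal sketch of what a process like FASTQC could look like. It is not the actual contents of modules/fastqc.nf: the output block, the --outdir flag, the params.fastq_seqs value, and the workflow wiring are assumptions added for illustration, and the container directive is written in the form shown in the Nextflow documentation (without an equals sign).

```nextflow
// Illustrative sketch only; the real modules/fastqc.nf may differ.
params.fastq_seqs = "data/fastqs"   // hypothetical location of the paired reads

process FASTQC {
    // Image the script block runs in when a container engine is enabled
    container "quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0"

    input:
    // Matches the flattened [id, read1, read2] tuples from fromFilePairs(flat: true)
    tuple val(id), path(r1), path(r2)

    output:
    // Assumed output declaration: FastQC writes one .zip and one .html per input file
    tuple val(id), path("*_fastqc.{zip,html}")

    script:
    """
    fastqc --outdir . ${r1} ${r2}
    """
}

workflow {
    ch_fastqs = Channel.fromFilePairs("${params.fastq_seqs}/*read{1,2}.fastq.gz", checkIfExists: true, flat: true)
    FASTQC(ch_fastqs)
}
```

With a container engine enabled in the configuration, each FASTQC task would run its fastqc command inside that image rather than in the host shell environment.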
⭐ How do we know which container software is being used?
⭐ Where is the container image being stored?
Let's look at the nextflow.config file for this pipeline.
Midway down the file, we see new settings defining a singularity scope.
singularity {
    enabled = true
    cacheDir = "${HOME}/singularity/"
    autoMounts = true
}
In this case, Singularity is enabled to run our process containers, the cache directory is set so that image data is saved to $HOME/singularity, and autoMounts is enabled.