options
The workflow can be customized through the configuration file vpipe.config. This configuration file is a text file with a basic structure composed of sections, properties, and values. For instance, we suggest providing as input a tabular file specifying unique sample identifiers (e.g., patient identifiers) and dates for different sequencing runs related to the same patient. The name of this file (here, samples.tsv) can be provided by specifying the section as input and the property as samples_file, as follows:
[input]
samples_file = samples.tsv
As shown above, sections are expected in square brackets, and properties are followed by their corresponding values.
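To make the section/property/value layout concrete, the following minimal sketch reads the fragment above with Python's standard `configparser` module. It only illustrates the file format and is not meant to show how V-pipe itself loads vpipe.config. The `datadir` lookup and its fallback value are included purely to show how an absent property could be handled.

```python
import configparser

# The example fragment from above, inlined so the sketch is self-contained.
CONFIG_TEXT = """
[input]
samples_file = samples.tsv
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# A section behaves like a dictionary keyed by property name.
samples_file = parser["input"]["samples_file"]

# A property that is absent from the file can be given a fallback value.
datadir = parser.get("input", "datadir", fallback="samples")

print(samples_file)
print(datadir)
```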
Below, we provide a comprehensive list of all user-configurable options stratified by sections.
Directory where samples are stored. By default, it is set to samples.
File containing unique sample identifiers and dates as tab-separated values, e.g.,
patient1 20100113
patient1 20110202
patient2 20081130
Here, we have two samples from patient 1 and one sample from patient 2.
By default, V-pipe searches for a file named samples.tsv. If this file does not exist, a list of samples is built by globbing the contents of the datadir directory.
Optionally, the samples file can contain a third column specifying the read length. This is particularly useful when samples are sequenced using protocols with different read lengths. In this case, the option trim_cutoff should be set to a fraction between 0 and 1 (see below).
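The samples file layout described above (two mandatory columns plus an optional read-length column) can be sketched as a small parser. This is an illustrative reading of the format, not V-pipe's own parsing code. The `Sample` type, the `parse_samples` helper, and the read length of 250 in the example rows are hypothetical.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    sample_id: str               # e.g. a patient identifier
    date: str                    # sequencing-run date, kept as text
    read_length: Optional[int]   # optional third column


def parse_samples(handle) -> List[Sample]:
    """Parse tab-separated sample rows with an optional read-length column."""
    samples = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected at least two columns, got: {row!r}")
        has_length = len(row) > 2 and row[2].strip()
        read_length = int(row[2]) if has_length else None
        samples.append(Sample(row[0], row[1], read_length))
    return samples


# Hypothetical rows: the read length (250) is an illustrative value.
example = io.StringIO(
    "patient1\t20100113\t250\n"
    "patient1\t20110202\t250\n"
    "patient2\t20081130\n"
)
for sample in parse_samples(example):
    print(sample)
```

Rows without a third column yield `read_length=None`, which a caller can use to detect that a relative trim_cutoff cannot be resolved for that sample.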
Fastq files are expected to be stored in a subdirectory named raw_data. For example, for patient 1 and the first sample, the hierarchy should look like
samples
└── patient1
    └── 20100113
        └── raw_data
            ├── patient1_20100113_R1.fastq
            └── patient1_20100113_R2.fastq
By default, V-pipe finds the fastq files matching the following pattern: prefix + R + {1,2} + .fastq. If a suffix should be inserted after R1 and R2, the user needs to specify it through this option.
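As a rough illustration of the directory hierarchy and file-name pattern described above, the sketch below builds the expected paths for one sample. The `fastq_paths` helper is hypothetical. It also assumes that the prefix is the sample identifier and date joined by underscores, which is inferred from the example tree rather than stated by V-pipe. The `_001` suffix in the second call is an illustrative value only.

```python
from pathlib import Path
from typing import Tuple


def fastq_paths(datadir: str, sample_id: str, date: str,
                suffix: str = "") -> Tuple[Path, Path]:
    """Return the expected R1/R2 fastq paths for one sample and date."""
    raw_dir = Path(datadir) / sample_id / date / "raw_data"
    # Assumption: the prefix follows the naming used in the example tree.
    prefix = f"{sample_id}_{date}_"
    r1 = raw_dir / f"{prefix}R1{suffix}.fastq"
    r2 = raw_dir / f"{prefix}R2{suffix}.fastq"
    return r1, r2


# Default pattern: prefix + R + {1,2} + .fastq
print(fastq_paths("samples", "patient1", "20100113"))

# With a (hypothetical) suffix placed after R1 and R2
print(fastq_paths("samples", "patient1", "20100113", suffix="_001"))
```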
There are two options for setting this parameter. First, the user can specify the minimum length to be used as a filtering threshold after quality trimming (trim_cutoff > 1). Second, the user can specify a threshold relative to the read length before quality trimming (0 < trim_cutoff < 1). This is particularly useful when samples are sequenced using protocols with different read lengths. In the latter case, the samples_file should be provided, as the read length is parsed from this file.
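The two interpretations of trim_cutoff can be summarized in a short function. This is a sketch of the rule as described here, not V-pipe's implementation. The `min_read_length` helper is hypothetical, as are two choices the text leaves open: rounding the relative threshold to the nearest integer, and treating a value of exactly 1 as an absolute length.

```python
from typing import Optional


def min_read_length(trim_cutoff: float,
                    read_length: Optional[int] = None) -> int:
    """Resolve trim_cutoff to an absolute minimum read length."""
    if trim_cutoff <= 0:
        raise ValueError("trim_cutoff must be positive")
    if trim_cutoff < 1:
        # Relative threshold: a fraction of the untrimmed read length,
        # which has to come from the third column of the samples file.
        if read_length is None:
            raise ValueError("a relative trim_cutoff requires the read length")
        # Assumption: round to the nearest integer.
        return round(trim_cutoff * read_length)
    # Absolute threshold: minimum length after quality trimming.
    # Assumption: a value of exactly 1 is also treated as absolute.
    return int(trim_cutoff)


print(min_read_length(200))                    # absolute threshold
print(min_read_length(0.8, read_length=250))   # relative threshold
```

With a relative cutoff, samples sequenced with different read lengths each get their own absolute threshold, whereas a single absolute value applies uniformly to every sample.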
Allocation of resources can vary with input size (e.g., number of reads) and number of samples. Therefore, users can specify memory and time requirements for all rules. For multi-threaded software packages, the number of threads can also be customized.
Available configurable options: mem and time.
Available configurable options: mem and time.
Mean quality score used for filtering low-quality reads.
Reads shorter than min_len are filtered out.
Available configurable options: mem and time.
Available configurable options: mem and time.
This rule takes all reads previously aligned by hmm_align. Therefore, resources should be allocated accordingly.
Available configurable options: mem and time.
Defaults for user-configurable options are provided in vpipe.snake.