options

V-pipe: user configurable options

The workflow can be customized through the configuration file vpipe.config. This configuration file is a text file with a basic structure composed of sections, properties, and values. For instance, we suggest providing as input a tabular file that specifies unique sample identifiers (e.g., patient identifiers) and the dates of the different sequencing runs related to the same patient. The name of this file (here, samples.tsv) can be provided by setting the property samples_file in the section input, as follows:

[input]
samples_file = samples.tsv

As shown above, section names are enclosed in square brackets, and each property is followed by an equals sign and its value.
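The section/property/value layout is the same one that Python's standard configparser module reads, so the example can be inspected with a few lines of code. The following is only a sketch of how such a file can be queried; the helper name read_option is hypothetical, and nothing here is meant to describe how V-pipe itself loads vpipe.config.

```python
import configparser

# The example configuration from above, inlined so the sketch is self-contained.
CONFIG_TEXT = """
[input]
samples_file = samples.tsv
"""


def read_option(text, section, prop, default=None):
    """Return the value of `prop` in `section`, or `default` if it is not set.

    `read_option` is a hypothetical helper used for illustration only.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # `fallback` covers both a missing section and a missing property.
    return parser.get(section, prop, fallback=default)


samples_file = read_option(CONFIG_TEXT, "input", "samples_file")
# datadir is not set in the example, so the documented default is used.
datadir = read_option(CONFIG_TEXT, "input", "datadir", default="samples")
print(samples_file, datadir)
```

Passing the documented default as the fallback mirrors the behaviour described for options that the user leaves unset.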

Below, we provide a comprehensive list of all user-configurable options stratified by sections.

input

datadir

Directory where samples are stored. By default, it is set to samples.

samples_file

File containing unique sample identifiers and dates as tab-separated values, e.g.,

patient1    20100113
patient1    20110202
patient2    20081130

Here, we have two samples from patient 1 and one sample from patient 2. By default, V-pipe searches for a file named samples.tsv. If this file does not exist, a list of samples is built by globbing the contents of the datadir directory.

Optionally, the samples file can contain a third column specifying the read length. This is particularly useful when samples are sequenced using protocols with different read lengths. In this case, the option trim_cutoff should be set to a fraction between 0 and 1 (see below).
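To make the file layout concrete, here is a minimal sketch of a reader for such a samples file, including the optional third column. The function name parse_samples and the read lengths in the example (250 and 150) are assumptions made for illustration; this is not V-pipe's own parser.

```python
import csv
import io


def parse_samples(text):
    """Parse tab-separated sample records.

    Returns a list of (sample_id, date, read_length) tuples, where
    read_length is None when the optional third column is absent.
    Hypothetical helper, not part of V-pipe.
    """
    records = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected at least two columns, got: {row!r}")
        sample_id, date = row[0], row[1]
        read_length = int(row[2]) if len(row) > 2 and row[2].strip() else None
        records.append((sample_id, date, read_length))
    return records


# Illustrative input; the third-column values are made up.
SAMPLES_TEXT = (
    "patient1\t20100113\t250\n"
    "patient1\t20110202\t250\n"
    "patient2\t20081130\t150\n"
)

for record in parse_samples(SAMPLES_TEXT):
    print(record)
```

Dates and identifiers are kept as strings so that they can be reused unchanged as directory names.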

fastq_suffix

Fastq files are expected to be stored in a subdirectory named raw_data. For example, for patient 1 and the first sample, the hierarchy should look like

samples
└── patient1
    └── 20100113
        └── raw_data
            ├── patient1_20100113_R1.fastq
            └── patient1_20100113_R2.fastq

By default, V-pipe finds the fastq files matching the following pattern: prefix + R + {1,2} + .fastq. If a suffix should be inserted after R1 and R2, the user needs to specify it through this option.
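The naming pattern and the directory hierarchy can be combined into a small path builder. This is a sketch under two assumptions: the prefix is taken to be the sample identifier and date joined by underscores, as in the example hierarchy, and the helper fastq_paths is hypothetical. The suffix "_001" in the usage lines is likewise only an illustrative value.

```python
from pathlib import PurePosixPath


def fastq_paths(datadir, sample_id, date, fastq_suffix=""):
    """Build the expected paths of the R1 and R2 fastq files for one sample.

    Assumes the prefix is "<sample_id>_<date>_", as in the example hierarchy.
    Hypothetical helper, not V-pipe's own file lookup.
    """
    raw_data = PurePosixPath(datadir) / sample_id / date / "raw_data"
    prefix = f"{sample_id}_{date}_"
    # prefix + R + {1,2} + optional suffix + .fastq
    return [str(raw_data / f"{prefix}R{mate}{fastq_suffix}.fastq") for mate in (1, 2)]


# Default pattern, without a suffix.
print(fastq_paths("samples", "patient1", "20100113"))
# With a suffix inserted after R1 and R2.
print(fastq_paths("samples", "patient1", "20100113", fastq_suffix="_001"))
```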

trim_cutoff

There are two ways of setting this parameter. First, the user can specify the minimum read length to be used as the filtering threshold after quality trimming (trim_cutoff > 1). Second, the user can specify a threshold relative to the read length before quality trimming (0 < trim_cutoff < 1). The second form is particularly useful when samples are sequenced using protocols with different read lengths. In that case, samples_file must be provided, because the read length is parsed from this file.
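The two interpretations of trim_cutoff can be sketched as a single function that returns the effective minimum length. The function name min_read_length is hypothetical, rounding the relative threshold to the nearest integer is an assumption, and values outside the two documented ranges are rejected here because the text does not define them.

```python
def min_read_length(trim_cutoff, read_length=None):
    """Translate trim_cutoff into an absolute minimum read length.

    trim_cutoff > 1      -> used directly as the minimum length
    0 < trim_cutoff < 1  -> fraction of read_length (from the samples file)
    Hypothetical helper; rounding to the nearest integer is an assumption.
    """
    if trim_cutoff > 1:
        return int(trim_cutoff)
    if 0 < trim_cutoff < 1:
        if read_length is None:
            raise ValueError("a relative trim_cutoff requires the read length")
        return round(trim_cutoff * read_length)
    raise ValueError(f"trim_cutoff outside the documented ranges: {trim_cutoff}")


# Absolute threshold: the read length is not needed.
print(min_read_length(200))
# Relative threshold: half of a 250 bp read.
print(min_read_length(0.5, read_length=250))
```

Raising an error when a relative threshold is given without a read length reflects the requirement that the samples file supply it.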

The allocation of resources can vary with the input size (e.g., number of reads) and the number of samples. Therefore, users can specify memory and time requirements for all rules. For multi-threaded software packages, the number of threads can also be customized.
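As an illustration of per-rule resource settings, the sketch below embeds a configuration fragment with one section per rule and reads it back with configparser. All numeric values are placeholders, and the units of mem and time are not stated on this page, so they should be treated as assumptions to be checked against the defaults in vpipe.snake. The helper rule_resources is hypothetical.

```python
import configparser

# Placeholder values; units for mem and time are assumptions.
CONFIG_TEXT = """
[bwa_align]
mem = 2000
time = 240
threads = 4

[sam2bam]
mem = 1000
time = 30
"""


def rule_resources(text, rule):
    """Return the resource options set for `rule` as a dict of integers.

    Only mem, time, and threads are considered; unset options are omitted.
    Hypothetical helper, not part of V-pipe.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(rule):
        return {}
    return {
        key: parser.getint(rule, key)
        for key in ("mem", "time", "threads")
        if parser.has_option(rule, key)
    }


print(rule_resources(CONFIG_TEXT, "bwa_align"))
print(rule_resources(CONFIG_TEXT, "sam2bam"))
```

Rules that do not appear in the file return an empty dictionary, which leaves room for the pipeline defaults to apply.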

gunzip

Available configurable options: mem and time.

extract

Available configurable options: mem and time.

preprocessing

mem

time

qual_threshold

Mean quality score used for filtering low-quality reads.

min_len

Reads shorter than min_len are filtered out.
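The combined effect of qual_threshold and min_len can be pictured as a per-read filter. The sketch below is only a conceptual illustration: the function keep_read is hypothetical, the use of "greater than or equal to" for the quality comparison is an assumption, and the actual filtering is carried out by the preprocessing tool that V-pipe invokes, whose exact rules may differ.

```python
def keep_read(qualities, qual_threshold, min_len):
    """Decide whether a read passes the two preprocessing filters.

    `qualities` is the list of per-base quality scores of the read.
    Hypothetical helper; the >= comparison is an assumption.
    """
    if not qualities or len(qualities) < min_len:
        return False  # reads shorter than min_len are filtered out
    mean_quality = sum(qualities) / len(qualities)
    return mean_quality >= qual_threshold


# A read of length 4 with mean quality 30 passes a threshold of 30 ...
print(keep_read([30, 30, 30, 30], qual_threshold=30, min_len=4))
# ... but is discarded when min_len is larger than the read.
print(keep_read([30, 30, 30, 30], qual_threshold=30, min_len=5))
```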

initial_vicuna

mem

time

threads

initial_vicuna_msa

mem

time

threads

hmm_align

mem

time

threads

sam2bam

Available configurable options: mem and time.

bwa_align

mem

time

threads

coverage_QA

Available configurable options: mem and time.

msa

This rule takes as input all reads previously aligned by hmm_align. Therefore, resources should be allocated accordingly.

mem

time

threads

convert_to_hxb2

Available configurable options: mem and time.

Defaults for user configurable options are provided in vpipe.snake.
