Young edited this page Sep 25, 2023 · 1 revision
The quality of a sequencing run is very important, so many values are recorded to help the end user assess the quality of the results produced from a sequencing run.
- `fastqc_raw_reads_1` and `fastqc_raw_reads_2` are the number of reads prior to cleaning by either seqyclean or fastp.
- `seqyclean_Perc_Kept` (params.cleaner = 'seqyclean') or `fastp_pct_surviving` (params.cleaner = 'fastp') indicates the percentage of reads that remain after removal of low-quality reads (more = better).
- `num_N` is the number of uncalled bases in the generated consensus sequence (less = better).
- `num_total` is the total number of called bases in the generated consensus sequence (more = better). Because many consensus sequences in this workflow are generated from amplicon sequencing, the start and end of the reference often have little coverage, so the number of bases in the consensus sequence is usually less than the length of the reference sequence.
- `num_pos_${params.minimum_depth}X` (which is `num_pos_100X` by default) is the number of positions with sufficient depth to call variants (more = better). Any position with depth below the minimum depth will be an `N`.
- `aci_num_failed_amplicons` uses the amplicon file to give a rough estimate of which primer pairs are not getting enough coverage (less = better).
- `samtools_num_failed_amplicons` uses the primer file to detect primer pairs and estimates coverage based on these (less = better).
More information on evaluating amplicon/primer failure can be found in the FAQ under the question 'Is there a way to determine if certain amplicons are failing?'
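
As a rough illustration of how the values above might be screened together, here is a minimal Python sketch. It is not part of the workflow: the thresholds, the `sample_id` column, the example rows, and the helper names (`depth_column`, `flag_sample`, `flag_table`) are all assumptions made for this example, and the values are assumed to be plain numbers, with percentages on a 0-100 scale.

```python
import csv
import io

# Hypothetical thresholds for illustration only; they are NOT workflow defaults.
MIN_PCT_READS_KEPT = 80.0
MAX_NUM_N = 1000
MIN_POSITIONS_AT_DEPTH = 25000  # assumes a reference of roughly 30 kb
MAX_FAILED_AMPLICONS = 5


def depth_column(minimum_depth=100):
    """Name of the depth column, e.g. num_pos_100X when minimum_depth is 100."""
    return f"num_pos_{minimum_depth}X"


def flag_sample(row, cleaner="seqyclean", minimum_depth=100):
    """Return a list of warnings for one row of a summary table."""
    kept_col = "seqyclean_Perc_Kept" if cleaner == "seqyclean" else "fastp_pct_surviving"
    depth_col = depth_column(minimum_depth)
    warnings = []
    if float(row[kept_col]) < MIN_PCT_READS_KEPT:
        warnings.append(f"low {kept_col}")
    if int(row["num_N"]) > MAX_NUM_N:
        warnings.append("high num_N")
    if int(row[depth_col]) < MIN_POSITIONS_AT_DEPTH:
        warnings.append(f"low {depth_col}")
    for col in ("aci_num_failed_amplicons", "samtools_num_failed_amplicons"):
        if int(row[col]) > MAX_FAILED_AMPLICONS:
            warnings.append(f"high {col}")
    return warnings


def flag_table(text, **kwargs):
    """Map each sample_id (hypothetical column name) to its list of warnings."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["sample_id"]: flag_sample(row, **kwargs) for row in reader}


# Made-up example rows; a real summary table would have many more columns.
EXAMPLE_CSV = """sample_id,seqyclean_Perc_Kept,num_N,num_total,num_pos_100X,aci_num_failed_amplicons,samtools_num_failed_amplicons
sample_a,95.2,120,29700,29500,0,1
sample_b,61.0,8400,21000,20100,12,14
"""

if __name__ == "__main__":
    for sample, warnings in flag_table(EXAMPLE_CSV).items():
        print(sample, warnings or "ok")
```

Sensible cutoffs depend on the organism, the primer scheme, and the sequencing depth, so the constants at the top are placeholders to adjust rather than recommendations.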
Kraken2 is optional for this workflow, but can provide additional quality assessment metrics:
- `top_organism` is the most common organism identified in the reads.
- `percent_reads_top_organism` is the percentage of reads assigned to that organism (more = better).
- `%_human_reads` is the percentage of human reads (less = better).
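
The Kraken2 values can be checked in a similar way. The sketch below is only an assumption about how one might do this: the function name `check_kraken2`, the default thresholds, and the example organism strings are invented for illustration, and the exact text Kraken2 reports in `top_organism` depends on the database used.

```python
def check_kraken2(row, expected_organism, min_pct_top=90.0, max_pct_human=5.0):
    """Return warnings for the Kraken2 columns of one summary row.

    The thresholds are hypothetical defaults, not values set by the workflow.
    """
    warnings = []
    # Case-insensitive substring match, since organism names vary by database.
    if expected_organism.lower() not in row["top_organism"].lower():
        warnings.append(f"unexpected top_organism: {row['top_organism']}")
    if float(row["percent_reads_top_organism"]) < min_pct_top:
        warnings.append("low percent_reads_top_organism")
    if float(row["%_human_reads"]) > max_pct_human:
        warnings.append("high %_human_reads")
    return warnings


# Made-up rows for illustration.
good_row = {
    "top_organism": "Severe acute respiratory syndrome coronavirus 2",
    "percent_reads_top_organism": "97.5",
    "%_human_reads": "0.4",
}
bad_row = {
    "top_organism": "Homo sapiens",
    "percent_reads_top_organism": "55.0",
    "%_human_reads": "55.0",
}

if __name__ == "__main__":
    print(check_kraken2(good_row, "coronavirus 2"))
    print(check_kraken2(bad_row, "coronavirus 2"))
```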