Young edited this page Sep 25, 2023 · 1 revision

Quality Assessment

The quality of a sequencing run directly affects how far its results can be trusted. The workflow therefore records several values that let the end user assess the quality of the results produced from a sequencing run.

  • fastqc_raw_reads_1 and fastqc_raw_reads_2 are the numbers of reads prior to cleaning by either seqyclean or fastp.
  • seqyclean_Perc_Kept (params.cleaner = 'seqyclean') or fastp_pct_surviving (params.cleaner = 'fastp') indicates the percentage of reads that remain after removal of low-quality reads (more = better).
  • num_N is the number of uncalled bases in the generated consensus sequence (less = better).
  • num_total is the total number of called bases in the generated consensus sequence (more = better). Because many consensus sequences in this workflow are generated via amplicon sequencing, the beginning and end of the reference often have little coverage. As a result, the number of bases in the consensus sequence is usually less than the length of the reference sequence.
  • num_pos_${params.minimum_depth}X (which is num_pos_100X by default) is the number of positions for which there is sufficient depth to call variants (more = better). Any position with depth below this threshold will be an N.
  • aci_num_failed_amplicons uses the amplicon file to give a rough estimate as to which primer pairs are not getting enough coverage (less = better).
  • samtools_num_failed_amplicons uses the primer file to detect primer pairs and estimates coverage based on this (less = better).
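
As a rough illustration of what num_N and num_total describe, the following sketch counts uncalled and called bases in a consensus sequence string. It is not the workflow's own implementation: the function name `count_consensus_bases` is hypothetical, and treating every character other than N or a gap as a called base is an assumption.

```python
def count_consensus_bases(sequence: str) -> dict:
    """Count uncalled (N) and called bases in a consensus sequence.

    Hypothetical helper for illustration only; the workflow may define
    these metrics differently.
    """
    seq = sequence.upper()
    # Uncalled bases are written as N in the consensus.
    num_n = seq.count("N")
    # Assumption: any character that is not N or a gap counts as called.
    num_total = sum(1 for base in seq if base not in "N-")
    return {"num_N": num_n, "num_total": num_total}


# Toy consensus with low-coverage ends, as is typical for amplicon data.
example = "NNNACGTNNACGTTNN"
counts = count_consensus_bases(example)
print(counts)
```

In this toy sequence the leading and trailing N runs mimic the poorly covered ends of the reference, which is why num_total ends up smaller than the sequence length.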

More information on evaluating amplicon/primer failure can be found in the FAQ under the question 'Is there a way to determine if certain amplicons are failing?'

Kraken2 is optional for this workflow, but can provide additional quality assessment metrics:

  • top_organism is the most common organism identified in the reads.
  • percent_reads_top_organism is the percentage of reads assigned to that organism (more = better).
  • %_human_reads is the percentage of human reads (less = better).
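
The metrics above can be combined into a simple pass/flag check over a summary table. The sketch below is only one way to do this: the `sample_id` column, the inline table, and all three thresholds are made up for illustration, and only the metric column names come from this page. Choose cutoffs that suit your own organism and protocol.

```python
import csv
import io

# Hypothetical thresholds; these are NOT workflow defaults.
MAX_NUM_N = 1000
MIN_PCT_TOP_ORGANISM = 50.0
MAX_PCT_HUMAN = 10.0


def flag_sample(row: dict) -> list:
    """Return the reasons a sample looks questionable (empty list if none)."""
    reasons = []
    if int(row["num_N"]) > MAX_NUM_N:
        reasons.append("high num_N")
    if float(row["percent_reads_top_organism"]) < MIN_PCT_TOP_ORGANISM:
        reasons.append("low percent_reads_top_organism")
    if float(row["%_human_reads"]) > MAX_PCT_HUMAN:
        reasons.append("high %_human_reads")
    return reasons


# Made-up summary table; 'sample_id' is an assumed column name.
summary_csv = """sample_id,num_N,percent_reads_top_organism,%_human_reads
sampleA,120,95.2,0.4
sampleB,8000,60.1,60.1
"""

flags = {
    row["sample_id"]: flag_sample(row)
    for row in csv.DictReader(io.StringIO(summary_csv))
}
for sample, reasons in flags.items():
    print(sample, "OK" if not reasons else "; ".join(reasons))
```

To run this against real output, replace the inline string with an open file handle for the workflow's summary CSV and adjust the column names if they differ.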