
v2.3.0

@sage-wright released this on 30 Dec 20:06
e492dec

PHVG v2.3.0 Release Notes

This minor release introduces organism updates for the TheiaCoV workflow series, as well as a new workflow for preparing and submitting metadata to public repositories (Mercury_Prep_N_Batch).

Updates to the TheiaCoV Workflow Series

Organism track updates:

  • "MPXV" for monkeypox analysis: VADR annotation assessment enabled (was previously not supported)
  • "WNV" for West Nile Virus analysis: VADR annotation assessment enabled (was previously not supported)
  • "flu" for influenza analysis: will initiate genome assembly with IRMA and characterization with ABRicate (against the InsaFlu database) and Nextclade; available in TheiaCoV_Illumina_PE only
  • "HIV" for Human Immunodeficiency Virus analysis: will initiate consensus assembly by alignment (BWA + iVar for Illumina read data, minimap2 + Medaka for ONT read data) and characterization with Quasitools HyDRA for antiretroviral drug resistance detection

Note: The default value for the organism variable is "sars-cov-2".
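The organism tracks above can be restated as a small lookup table to make the defaults and restrictions explicit. This is a hedged illustration only: the dictionary layout, the key names, and the helpers `resolve_organism` and `track_allowed` are hypothetical and do not reflect TheiaCoV's actual WDL inputs or internal logic.

```python
from typing import Dict, List, Optional

# Default organism track, per the release notes.
DEFAULT_ORGANISM = "sars-cov-2"

# Hypothetical restatement of the organism-track bullets as data;
# not TheiaCoV's real configuration.
ORGANISM_TRACKS: Dict[str, dict] = {
    "MPXV": {"vadr": True},
    "WNV": {"vadr": True},
    "flu": {
        "assembly": "IRMA",
        "characterization": ["ABRicate (InsaFlu database)", "Nextclade"],
        "workflows": ["TheiaCoV_Illumina_PE"],  # flu is Illumina PE only
    },
    "HIV": {
        "assembly": {"illumina": "BWA + iVar", "ont": "minimap2 + Medaka"},
        "characterization": ["Quasitools HyDRA"],
    },
}


def resolve_organism(organism: Optional[str] = None) -> str:
    """Fall back to the default track when no organism is supplied."""
    return organism if organism is not None else DEFAULT_ORGANISM


def track_allowed(organism: str, workflow: str) -> bool:
    """True unless the track restricts workflows and this one is not listed."""
    allowed: Optional[List[str]] = ORGANISM_TRACKS.get(organism, {}).get("workflows")
    return allowed is None or workflow in allowed
```

For example, `track_allowed("flu", "TheiaCoV_Illumina_PE")` is `True`, while any other workflow name passed with `"flu"` returns `False`; tracks without a `workflows` entry are assumed here to be unrestricted.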

QC and read processing module updates:

Mercury Prep-N-Batch Workflow

The Mercury_Prep_N_Batch workflow combines the previously separate Mercury_PE/SE_Prep and Mercury_Batch workflows into one.
This workflow functions as follows:

Step 1: Performs supermassive metadata wrangling (task sm_metadata_wrangling in task_mercury_file_wrangling.wdl)

  • downloads the entire origin Terra table where the data, analysis results, metadata, etc. are stored
  • extracts the samples that the user intends to upload
  • creates some standard variables that are used multiple times (such as year, isolate, etc.)
  • determines which organism is being run (currently only supports sars-cov-2 and mpox) and sets the required and optional variables for each file that is being created (e.g., BioSample vs SRA vs GISAID vs GenBank/BankIt)
  • removes any entries that do not meet predetermined quality thresholds (vadr_num_alerts and number_N)
  • removes any entries that do not have all required fields present, and writes the samples that were removed to a table that also lists what fields were missing
  • renames columns as appropriate
  • reformats columns as appropriate
  • compiles all required and optional information in TSV files
  • renames files with the submission_id and edits fasta headers as appropriate
  • uploads read files to the Theiagen SRA Google Cloud Storage (GCP) bucket
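The quality-threshold and required-field filtering in Step 1, plus the TSV compilation, can be sketched in plain Python. This is not the sm_metadata_wrangling task itself: the row format, the `filter_samples` and `to_tsv` helpers, the threshold parameters, and the `missing_fields` column name are hypothetical, and treating a blank QC value as passing is an assumption.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Dict[str, str]


def filter_samples(
    rows: List[Row],
    required_fields: List[str],
    max_vadr_alerts: int,
    max_number_n: int,
) -> Tuple[List[Row], List[Row]]:
    """Return (kept, removed); `removed` records samples with missing fields."""
    kept: List[Row] = []
    removed: List[Row] = []
    for row in rows:
        # Assumption: a blank QC value is treated as 0 (i.e., passing).
        alerts = int(row.get("vadr_num_alerts") or 0)
        number_n = int(row.get("number_N") or 0)
        if alerts > max_vadr_alerts or number_n > max_number_n:
            continue  # fails a quality threshold, so it is dropped
        missing = [field for field in required_fields if not row.get(field)]
        if missing:
            removed.append({
                "submission_id": row.get("submission_id", ""),
                "missing_fields": ",".join(missing),
            })
        else:
            kept.append(row)
    return kept, removed


def to_tsv(rows: List[Row], columns: List[str]) -> str:
    """Write the selected columns as TSV; absent values become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, delimiter="\t",
        extrasaction="ignore", lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

A caller would run `filter_samples` first, write `removed` out as the "what was missing" table, and pass `kept` to `to_tsv` once per submission target, each with its own column list.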

Step 2: If sars-cov-2, trim GenBank fasta files of terminal Ns (task trim_genbank_fastas in task_mercury_file_wrangling.wdl)

  • uses VADR to trim terminal ambiguous nucleotides
  • returns the edited fasta file
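Step 2 uses VADR for the trimming. To illustrate what "trim terminal ambiguous nucleotides" means, the plain-Python stand-in below strips IUPAC ambiguity codes (N, R, Y, and so on) from both ends of each sequence while leaving internal ones in place. It is not VADR's implementation, and the helper names are hypothetical.

```python
from typing import List, Tuple

# IUPAC ambiguity codes (N plus the partial codes), both cases.
AMBIGUOUS = set("NRYKMSWBDHVnrykmswbdhv")


def trim_terminal_ambiguities(seq: str) -> str:
    """Strip ambiguity codes from both ends; internal ones are kept."""
    start, end = 0, len(seq)
    while start < end and seq[start] in AMBIGUOUS:
        start += 1
    while end > start and seq[end - 1] in AMBIGUOUS:
        end -= 1
    return seq[start:end]


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def trim_fasta(text: str) -> str:
    """Return the FASTA text with terminal ambiguities trimmed per record."""
    lines: List[str] = []
    for header, seq in parse_fasta(text):
        lines.append(header)
        lines.append(trim_terminal_ambiguities(seq))
    return "\n".join(lines) + "\n"
```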

Step 3: If mpox, put metadata into sqn format (task table2asn in task_mercury_file_wrangling.wdl)

  • soft links the .sbt, .fsa, and .src files to a common name
  • converts the data into an .sqn file with table2asn so it can be emailed to NCBI
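The soft-linking in Step 3 gives the three inputs a shared basename before table2asn runs. Below is a minimal sketch using `os.symlink`, assuming a hypothetical `link_to_common_name` helper and a caller-chosen common name. It stops short of invoking table2asn, whose command-line options are documented by NCBI.

```python
import os
from pathlib import Path
from typing import Dict


def link_to_common_name(files: Dict[str, str], workdir: str, common_name: str) -> Dict[str, Path]:
    """Symlink the .sbt, .fsa, and .src inputs to <workdir>/<common_name>.<ext>.

    `files` maps each extension ('.sbt', '.fsa', '.src') to an existing path.
    """
    out_dir = Path(workdir)
    out_dir.mkdir(parents=True, exist_ok=True)
    links: Dict[str, Path] = {}
    for ext in (".sbt", ".fsa", ".src"):
        target = Path(files[ext]).resolve()
        link = out_dir / f"{common_name}{ext}"
        # Replace any stale link or file so reruns do not fail.
        if link.is_symlink() or link.exists():
            link.unlink()
        os.symlink(target, link)
        links[ext] = link
    return links
```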

New Documentation

Detailed documentation has been created for all workflows in the PHVG v2.3.0 repository.

What's Changed

New Contributors

Full Changelog: v2.2.0...v2.3.0

Follow Theiagen on Twitter!