19 Dec 20:43

sage-wright

f81fdb1

v2.3.0 Latest


Public Health Bioinformatics v2.3.0 Minor Release

This minor release adds two new workflows, Fetch_SRR_Accession_PHB and Concatenate_Illumina_Lanes_PHB, and makes significant improvements to the TheiaCoV, TheiaEuk, TheiaProk, and TheiaMeta workflow series. Documentation updates and various bug fixes have also been implemented.

Full release notes can be found here!

Find our documentation here!

🆕 New workflows

Concatenate_Illumina_Lanes_PHB
- Some Illumina sequencing platforms produce FASTQ files split across multiple lanes for a single sample. This workflow combines those multi-lane FASTQ files into a single read1 and read2 file per sample, so that downstream workflows such as assembly or variant calling receive consolidated FASTQ files.
- This workflow is designed to run automatically at the start of the TheiaProk workflow if multi-lane FASTQ files are provided (e.g., read1_lane2.fastq.gz and read2_lane2.fastq.gz)
- Import this workflow from Dockstore
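
The lane-merging idea can be illustrated with a short Python sketch. This is not the workflow's implementation: the function name and the `<sample>_L<lane>_R<read>_001.fastq.gz` naming pattern are assumptions made for illustration. It relies on the fact that gzip files can be concatenated byte-for-byte into a valid multi-member gzip file.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed (hypothetical) naming pattern: <sample>_L<lane>_R<1|2>_001.fastq.gz
LANE_PATTERN = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def concatenate_lanes(input_dir: Path, output_dir: Path) -> dict[tuple[str, str], Path]:
    """Merge per-lane FASTQ files into one file per (sample, read)."""
    groups = defaultdict(list)
    for path in input_dir.iterdir():
        match = LANE_PATTERN.match(path.name)
        if match:
            key = (match["sample"], match["read"])
            groups[key].append((int(match["lane"]), path))

    output_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for (sample, read), lanes in groups.items():
        out_path = output_dir / f"{sample}_R{read}.fastq.gz"
        with out_path.open("wb") as out_handle:
            # Sort by lane number so read1 and read2 keep the same record order.
            for _, lane_path in sorted(lanes):
                with lane_path.open("rb") as in_handle:
                    shutil.copyfileobj(in_handle, out_handle)
        merged[(sample, read)] = out_path
    return merged
```

Sorting by lane number before copying matters for paired-end data: both merged files must list mates in the same order.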
Fetch_SRR_Accession_PHB
- This workflow will retrieve any Sequence Read Archive (SRA) accessions (SRR) associated with a given sample accession, such as a BioSample ID (e.g., "SAMN00000000") or SRA Experiment ID (e.g., "SRX000000").
  - This process utilizes the fastq-dl tool to fetch metadata from SRA and outputs the corresponding SRR accession(s).
  - If multiple SRR accessions are linked to a single sample, the workflow will output them as a comma-separated list.
- This workflow is particularly useful for retrieving SRR accessions a few days after running Terra_2_NCBI workflows.
- Import this workflow from Dockstore

🚀 Changes to existing workflows

All Genomic Characterization Workflows
- The read screen is now compatible with Dorado-produced FASTQ files
All Illumina Workflows
- fastq_scan has been updated to the latest version
All TheiaCoV Workflows
- The percentage of mapped reads is now output in all TheiaCoV workflows (except TheiaCoV_FASTA)
- The default Nextclade dataset tags have been updated for SC2, mpox, flu, RSV-A, and RSV-B
- The default Pangolin docker is now us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.31
- Kraken2 standalone is now used and databases must be provided.
TheiaCoV_Illumina_PE and TheiaCoV_ONT
- Default parameters have been set for H5N1 flu
- IRMA-assembled flu segments are now output in sorted order, from largest to smallest
All TheiaEuk Workflows
- Additional genes for Candida auris are now examined by default in the Snippy_Gene_Query task
- Bug fix to the snippy_variants_num_variants output column for Cryptococcus neoformans
TheiaMeta_Illumina_PE
- MIDAS is now an optional task in TheiaMeta.
All TheiaProk Workflows
- stxtyper was added to all TheiaProk workflows
TheiaProk_Illumina_PE and TheiaProk_Illumina_SE
- Multi-lane Illumina data can now be used as input natively.
TheiaProk_Illumina_PE and TheiaProk_ONT
- TBProfiler has been updated to v6.4.1
- tbp-parser has been updated to v2.2.2
Augur_PHB
- Versioning information for the tree-building tools is now available
All Freyja Workflows
- Freyja now supports non-SARS-CoV-2 organisms natively.
Mercury_Prep_N_Batch
- Errors no longer occur when data has been previously transferred
- The correct information is now being provided for GISAID’s covv_coverage column for ClearLabs data
- Failures are no longer silent; errors now cause the task to fail
Snippy Workflows
- A new file with QC metrics has been created
- Additional QC metrics are now output
Terra_2_NCBI_PHB
- Collection dates will no longer have decimals
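
As an illustration of the collection-date fix, the sketch below shows one way a date value could be forced to a string without a decimal part. It assumes the decimals came from a bare year being read as a float (for example 2024.0); the function name is hypothetical and this is not the workflow's code.

```python
def normalize_collection_date(value) -> str:
    """Return a collection date as a string without a trailing decimal part."""
    if isinstance(value, float) and value.is_integer():
        # Assumption: a bare year was parsed as a float, e.g. 2024.0 -> "2024"
        return str(int(value))
    text = str(value).strip()
    # Also handle the string form "2024.0"
    if text.endswith(".0") and text[:-2].isdigit():
        return text[:-2]
    return text
```

Full dates such as "2024-05-17" pass through unchanged.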

📚 Documentation Updates

Search tables better with table-specific search bars
Dead links removed
Generally improved documentation

What's Changed

[Documentation] Updated Snippy variants output documentation by @fraser-combe in #623
[TheiaCoV] iVar Consensus Pipefail fix by @Michal-Babins in #629
[TheiaProk] expose sistr optional param inputs to theiaProk wfs by @fraser-combe in #603
[Documentation] fix broken links by @sage-wright in #627
Snippy_Variants: Calculate % reads aligned by @fraser-combe in #616
[Augur +TheiaCoV] Enable H5N1 flu subtype augur & nextclade by @Michal-Babins in #640
[TheiaMeta] Midas call in read_QC_trim_pe.wdl workflow and outputs by @fraser-combe in #619
[TheiaCoV] Reorder flu segments from largest to smallest in irma task by @Michal-Babins in #635
[Mercury] prevent silent failures by @sage-wright in #648
Fixed theiacov documentation to specify assembly order by @Michal-Babins in #652
[TheiaCov & TheiaProk & TheiaEuk] read screen ONT bugfix and improvements by @kapsakcj in #650
[TheiaCoV ONT and Clearlabs] Update consensus task container to artic:1.2.4-1.12.0 by @cimendes in #636
[Documentation] Search bar for tables within docs by @fraser-combe in #646
[TheiaEuk] Additional genes for Snippy_Gene_Query by @sage-wright in #647
[MerlinMagic] Fixed output for crypto snippy_variants_num_variants by @Michal-Babins in #654
[Documentation] type error correction theiacov wf by @fraser-combe in #660
[TheiaProk] Adds stxtyper to merlin_magic and TheiaProk wfs by @kapsakcj in #525
[Mercury] bump mercury docker to 1.0.9: bugfix for GISAID metadata covv_coverage column by @kapsakcj in #661
[TheiaCov] wfs add percentage_mapped_reads by @fraser-combe in #641
[Documentation] Update MIDAS database documentation in TheiaProk by @fraser-combe in #667
Add Snippy_Variants QC outputs to Snippy_Tree and Snippy_Sreamline workflow outputs by @jrotieno in #592
[TheiaCoV/TheiaProk/TheiaMeta/TheiaEuk/Freyja_FASTQ] fastq-scan updates & improvements. Adding JSON as wf output file by @kapsakcj in #662
Prevent Silent Errors by @sage-wright in #666
[Augur] Add augur tree iqtree model type to output by @Michal-Babins in #674
[Terra2NCBI] Force collection_date to be a string by @cimendes in #658
[Documentation] Update code contribution guidelines by @fraser-combe in #675
[Retrieve_SRR_Metadata] New wf to retrieve SRR after Terra2NCBI wf by @fraser-combe in #668
Documentation Update by @frankambrosio3 in #678
[Documentation] Various updates by @sage-wright in #680
[TheiaCoV] Update nextclade dataset tags and pangolin docker version by @Michal-Babins in #679
[Documentation] update dataset tags by @Michal-Babins in #681
[TheiaCoV] Split database from Kraken2_TheiaCoV task by @cimendes in #670
[TheiaCoV] Update nextclade dataset tag for H5N1 to the latest version by @Michal-Babins in #683
[Freyja] Update freyja to version 1.5.2, expose pathogen flag and minor update to docs by @cimendes in #684
[Augur] Expose Augur versions by @Michal-Babins in #686
[TheiaProk] Update default versions for TB-Profiler and tbp-parser by @sage-wright in #673
v2.3.0 final changes by @sage-wright in #693
[Concatenate_Illumina_Lanes] Fix bug when single-end only by @sage-wright in https://github.com/theiagen/public_heal...

Contributors

kapsakcj, cimendes, and 5 other contributors


17 Sep 15:12

sage-wright

v2.2.1

9a10de7


Public Health Bioinformatics v2.2.1 Patch Release Notes

🩹 This patch release fixes the output names for the NCBI-Scrub standalone workflows.

Our documentation has also been migrated to GitHub for easier maintenance.

Full release notes can be found here!
Find our documentation here!

What's Changed

[Documentation] Transfer all PHB documentation to GitHub by @sage-wright in #605
[NCBI Scrub Standalone Workflows] Correct output declarations for the number of spots removed by @cimendes in #610
[v2.2.1] update version tag by @sage-wright in #622

Full Changelog: v2.2.0...v2.2.1

Contributors

cimendes and sage-wright


03 Sep 13:22

sage-wright

v2.2.0

5be3433


Public Health Bioinformatics v2.2.0 Minor Release Notes

This minor release adds two new workflows, Create_Terra_Table_PHB and Snippy_Streamline_FASTA_PHB, and makes significant improvements to the TheiaProk, TheiaCoV, TheiaMeta, and Freyja workflow series. Additionally, several bug fixes have been made.

Full release notes can be found here!

Find our documentation here!

🆕 New workflows:

Create_Terra_Table_PHB
- The manual creation of Terra tables can be tedious and error-prone. This workflow will automatically create your Terra data table when provided with the location of the files. It can import assembly, paired-end (Illumina) and single-end (Illumina and Oxford Nanopore) data.
- Import the workflow from Dockstore.
Snippy_Streamline_FASTA_PHB
- Since Snippy_Variants_PHB is now compatible with assembled sequences as input in FASTA format, we have developed Snippy_Streamline_FASTA, an all-in-one approach to generating a reference-based phylogeny using the Snippy tools, mirroring the Snippy_Streamline_PHB workflow. By default, it runs Snippy_Variants and Snippy_Tree, but will optionally run Assembly_Fetch if a reference genome is not provided.
- Import the workflow from Dockstore.

🚀 Changes to existing workflows:

All TheiaProk Workflows
- Genomic characterization with emmtyper is now enabled for Streptococcus pyogenes. (Thanks, @sam-baird!)
- When call_ani is true, failures will no longer occur if multiple hits have the same score.
- Support for Vibrio parahaemolyticus, Vibrio vulnificus and Enterobacter asburiae was added to the AMRFinderPlus task
- VirulenceFinder now runs on Shigella sonnei samples.
- The Docker containers for AMRFinderPlus, tbp-parser and mlst have been updated:
  - AMRFinderPlus: 3.12.8-2024-07-22.1
  - tbp-parser: tbp-parser:1.6.0
  - mlst: 2.23.0-2024-08-01
- Genomic characterization can now be skipped by setting the new optional input perform_characterization to false.
- The GAMBIT prokaryotic database has been updated to v2.0.0-20240628.
- Optional inputs are now available for all tasks within the merlin_magic subworkflow.
All TheiaCoV Workflows
- GenoFLU has been added for H5N1 influenza typing.
- Additional VADR output files have been exposed:
  - File? vadr_feature_tbl_pass
  - File? vadr_feature_tbl_fail
  - File? vadr_classification_summary_file
  - File? vadr_all_outputs_tar_gz
- Aligned FASTQs no longer contain supplemental/secondary alignments.
TheiaCoV_Illumina_PE_PHB and TheiaCoV_ONT_PHB
- The workflow will no longer fail if an assembly cannot be produced. Instead, the assembly_fasta column will say "Assembly could not be generated".
TheiaEuk_Illumina_PE_PHB
- TheiaEuk no longer abruptly fails if an organism outside of the expected list of taxa is detected by GAMBIT.
- All optional inputs and docker containers for taxa-specific sub-modules have been exposed.
All ONT workflows (TheiaProk and TheiaCoV)
- KMC is no longer used for genome-size prediction. Instead, for TheiaProk, the expected genome length is now set to 5 Mb, which is around 0.7 Mb larger than the average bacterial genome length. For TheiaCoV, species have default genome lengths associated with their organism tag.
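
To show how a fixed expected genome length feeds into downstream metrics, here is a minimal sketch. It assumes coverage is estimated as total sequenced bases divided by expected genome length; the constant and function names are hypothetical.

```python
# 5 Mb, the TheiaProk ONT default stated above
DEFAULT_PROK_GENOME_LENGTH = 5_000_000


def estimated_coverage(total_bases: int, genome_length: int = DEFAULT_PROK_GENOME_LENGTH) -> float:
    """Estimate depth of coverage from total bases and an expected genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_bases / genome_length
```

Under this assumption, 250,000,000 sequenced bases against the 5 Mb default gives an estimated coverage of 50x. An organism with a much smaller or larger genome would be better served by an explicit genome length.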
TheiaCoV and TheiaMeta workflows
- The human read removal tool (HRRT) has been updated to v2.2.1. For paired-end data, reads are first interleaved to guarantee that no mates are orphaned by this tool.
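
The interleaving step can be sketched as follows. This is only an illustration of the concept (alternating read1 and read2 records so that mates stay adjacent), not the task's actual code; the function names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def fastq_records(lines: Iterable[str]) -> Iterator[list[str]]:
    """Yield FASTQ records as lists of four lines."""
    iterator = iter(lines)
    while True:
        record = list(islice(iterator, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(read1: Iterable[str], read2: Iterable[str]) -> Iterator[str]:
    """Yield lines alternating read1 and read2 records; fail if mate counts differ."""
    r1_records = fastq_records(read1)
    r2_records = fastq_records(read2)
    for r1 in r1_records:
        r2 = next(r2_records, None)
        if r2 is None:
            raise ValueError("read2 has fewer records than read1")
        yield from r1
        yield from r2
    if next(r2_records, None) is not None:
        raise ValueError("read2 has more records than read1")
```

Because each pair travels together in one stream, a filter that drops a pair as a unit cannot leave one mate behind.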
All Freyja Workflows
- Freyja has been updated for all workflows to version 1.5.1.
- SARS-CoV-2 UShER barcodes file is now a .feather file.
- Freyja_FASTQ_PHB is now compatible with Illumina paired-end, Illumina single-end and Oxford Nanopore data. A new input ont has been added to control workflow behavior.
- The UShER barcodes and lineage files used are now exposed as outputs in Freyja_FASTQ_PHB
Snippy_Variants_PHB
- In addition to paired-end and single-end reads, assemblies are now accepted as input. If Illumina sequencing data is to be used, pass the forward and reverse reads through the read1 and (optionally) read2 inputs, respectively. If assembled genomes are to be used, use the assembly_fasta input and omit read1 and read2.
SRA_Fetch_PHB
- SRA-Lite files are now detected when the downloaded file is low quality.
Augur_PHB
- mpox mutation context has been added to the auspice_input_json output which displays the fraction of G->A or C->T.
GAMBIT_Query_PHB
- The GAMBIT prokaryotic database has been updated to v2.0.0-20240628.
Mercury_Prep_N_Batch_PHB
- Mercury has been moved to its own repository at https://github.com/theiagen/mercury.
- Mercury now processes BioSample & SRA metadata for flu

What's Changed

[TheiaProk] Add emmtyper task for Streptococcus pyogenes by @sam-baird in #524
[SRA-Fetch] Detect SRA-Lite when it's low quality file by @cimendes in #512
Adding the Create_Terra_Table_PHB workflow by @sage-wright in #533
[Create_Terra_Table] recognize fastq files that end in .fq by @sage-wright in #535
[TheiaProk - ANI] prevent failures when multiple top hits have the same score by @sage-wright in #532
[TheiaCoV] Flu: Prevent workflow failures when assembly cannot be produced; generate NanoPlot outputs regardless of assembly success by @sage-wright in #530
[theiaprok] amrfinderplus: add support for Vibrio parahaemolyticus, Vibrio vulnificus, Enterobacter asburiae. Fix C diff bug by @kapsakcj in #542
[TheiaCoV] Add GenoFLU for flu whole-genome genotyping by @sage-wright in #540
[TheiaProk] Merlin_magic subwf bugfix: run virulencefinder on Shigella sonnei by @kapsakcj in #543
[TheiaCoV and TheiaMeta] Update hrrt (ncbi-scrub) to version 2.2.1 and optimise task by @cimendes in #527
[TheiaCoV and TheiaMeta - HRRT] Patch bug by removing unneeded awk verification by @cimendes in #550
Create CODEOWNERS by @AndrewLangvt in #554
[TheiaProk] Add additional input enabling characterization by @sage-wright in #547
Updating templates & broken links in the readme by @sage-wright in #555
[TheiaEuk] Fix bug where String outputs were being passed as File for Snippy_variants by @cimendes in #574
[TheiaProk] update tbp-parser to latest version by @sage-wright in #576
[Create_Terra_Table] fix bug, and enable ability for users to provide their own file ending suffixes by @sage-wright in #575
[theiacov] Add additional vadr output files & tarball; upgrade VADR docker by @kapsakcj in #556
[ONT] Remove KMC by @sage-wright in #578
[Create_Terra_Table] fix sample name i...

Contributors

kapsakcj, AndrewLangvt, and 4 other contributors


26 Jun 14:14

cimendes

v2.1.0

d0377e1


Public Health Bioinformatics v2.1.0 Minor Release Notes

This minor release improves the utility and usability of several Oxford Nanopore Technologies’ dedicated workflows for viral and bacterial genomic characterization (TheiaCoV and TheiaProk). Additionally, support for new organisms has been added to several workflows.

Full release notes can be found here!

Find our documentation here!

🚀 Changes to existing workflows:

All TheiaProk Workflows
- General Abricate is now available through the call_abricate and abricate_db optional inputs.
- Abricate specifically for Vibrio cholerae is now available. It launches automatically if the gambit_predicted_taxon or expected_taxon is Vibrio cholerae.
- A new optional parameter separate_betalactam_genes is now available that splits AMRFinderPlus beta-lactam hits into new columns.
- The call_midas optional input is now set to false by default.
TheiaProk_Illumina_PE
- New read quality-control outputs have been added: r1_mean_q_clean, r2_mean_q_clean, r1_mean_readlength_clean and r2_mean_readlength_clean.
TheiaProk_ONT
- New read quality-control outputs have been added: nanoplot_r1_median_readlength_raw, nanoplot_r1_stdev_readlength_raw, nanoplot_r1_n50_raw, nanoplot_r1_median_q_raw, nanoplot_r1_est_coverage_raw, nanoplot_r1_median_readlength_clean, nanoplot_r1_stdev_readlength_clean, nanoplot_r1_n50_clean, nanoplot_r1_median_q_clean and nanoplot_r1_est_coverage_clean.
- Kraken2 is now available through the call_kraken and kraken_db optional inputs.
- A maximum genome size of 10Mbp is set to prevent excessive runtimes.

All TheiaCoV Workflows

RSV-A and RSV-B can now be analyzed with the TheiaCoV workflows. Nextclade characterization and Kraken taxonomic analysis will now be run on RSV samples.

The default Nextclade dataset tags have been updated for the following organisms:

| Organism | New default Nextclade dataset tag |
| --- | --- |
| SARS-CoV-2 | "2024-06-13--23-42-47Z" |
| mpox | "2024-04-19--07-50-39Z" |
| Flu H1N1 HA | "2024-04-19--07-50-39Z" |
| Flu H1N1 NA | "2024-04-19--07-50-39Z" |
| Flu H3N2 HA | "2024-04-19--07-50-39Z" |
| Flu H3N2 NA | "2024-04-19--07-50-39Z" |
| Flu Victoria HA | "2024-04-19--07-50-39Z" |
| Flu Victoria NA | "2024-04-19--07-50-39Z" |

TheiaCoV_ONT
- New read quality-control outputs have been added: nanoplot_r1_median_readlength_raw, nanoplot_r1_stdev_readlength_raw, nanoplot_r1_n50_raw, nanoplot_r1_median_q_raw, nanoplot_r1_est_coverage_raw, nanoplot_r1_median_readlength_clean, nanoplot_r1_stdev_readlength_clean, nanoplot_r1_n50_clean, nanoplot_r1_median_q_clean and nanoplot_r1_est_coverage_clean.
TheiaCoV Flu Track
- All of the flu-specific tasks now live in their own sub-workflow, flu_track. This has no effect on the end-user.
- In TheiaCoV_ONT, flu samples will now have both the HA and NA segment’s assembly mean coverage appear in the assembly_mean_coverage output variable. This reflects the behaviour already present on TheiaCoV_Illumina_PE.
- The all-segments FASTA header lines now include samplename.
- The new output irma_subtype_notes now indicates if IRMA was able to determine the flu subtype
- All workflows now use abricate_flu_subtype (instead of irma_subtype) for selecting the appropriate nextclade_dataset_tag.
- Nextclade outputs columns for flu now explicitly state either HA or NA.
- Padded assemblies are now provided to MAFFT and VADR to prevent task failures. In these assemblies, any - characters present in the final assembly file are removed and any . characters are replaced by N.
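
The padding rule for assemblies described above (drop "-", turn "." into "N") can be sketched in a few lines. The function names are hypothetical and this is an illustration only, not the task's code.

```python
from typing import Iterable, Iterator


def clean_padded_sequence(sequence: str) -> str:
    """Remove '-' characters and replace '.' characters with 'N'."""
    return sequence.replace("-", "").replace(".", "N")


def clean_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Apply the rule to sequence lines only; FASTA headers are left untouched."""
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield clean_padded_sequence(line)
```

Header lines are skipped deliberately, since sample names may legitimately contain dashes or dots.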
Terra_2_NCBI
- Skipping BioSample submission via the skip_biosample optional input now also removes the requirement to have BioSample metadata in your data table.
Augur_Prep_PHB and Augur_PHB
- RSV-A and RSV-B can now be analyzed with the Augur workflows.
- Metadata is no longer required to run Augur. Only a distance tree will be created if metadata is not provided.
kSNP3 and other phylogenetic inference workflows
- Outputs from phylogenetic workflows (SNP matrices) and the summarize_data task will now have a properly toggleable Phandango coloring suffix.
- The phandango_coloring optional input is now off by default.

Docker container updates:

IRMA has been updated to version v1.1.5
AMRFinderPlus has been updated to version v3.12.8-2024-05-02.2
ts_mlst database has been updated as of 2024-06-01
Pangolin database has been updated to pdata v1.27

🐛 Bug fixes and small improvements:

TheiaProk_ONT and TheiaProk_FASTA: Hicap was being run in TheiaProk_ONT but the outputs were never appearing in the data table! This has been fixed.
All TheiaCoV workflows: Unsupported organisms will no longer cause workflow failures.
Terra_2_NCBI: Fixed a typo when using the Wastewater Biosample package that was causing an error.
Freyja_Dashboard: The freyja_dasbhoard output variable now correctly says freyja_dashboard.
Workflows that accept String inputs that are used to name things: Several input variables such as cluster_name now accept Strings with whitespace.
All workflows: Runtime parameters have been adjusted for several tasks.
TheiaCoV Flu Track: A bug has been fixed for IRMA running out of disk space. Additionally, another bug affecting Flu B samples was fixed related to empty HA segment FASTA files.
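
For the whitespace handling mentioned above, the related pull request title describes converting spaces to dashes. A minimal sketch of that idea, with a hypothetical function name and the assumption that any run of whitespace becomes a single dash:

```python
import re


def sanitize_name(name: str) -> str:
    """Replace each run of whitespace with a single dash."""
    # Assumption: leading and trailing whitespace is dropped first.
    return re.sub(r"\s+", "-", name.strip())
```

A value such as "my cluster name" would then be used as "my-cluster-name" when naming outputs.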

What's Changed

TheiaCoV wf support for RSV - run nextclade by default and small optimizations (kraken_target_organism, genome_length) by @kapsakcj in #436
[New workflow - internal] Gambitcore for assembly quality assessment with GAMBIT by @cimendes in #466
[TheiaProk_ONT and TheiaCoV_ONT] Expose additional QC metrics from nanoplot for both raw and clean reads by @cimendes in #452
Exposing r1 and r2 mean_q_clean and mean_readlength_clean by @jrotieno in #455
[TheiaProk_ONT] add patch fix to kmc estimated genome size to not go over 10Mbp by @cimendes in #459
Add abricate as optional module by @jrotieno in #431
[TheiaProk_ONT] Add Kraken2 as part of read_qc by @cimendes in #438
[Flu] Assembly mean coverage & read screen clean-up by @sage-wright in #469
[Freyja_Dashboard] fix typo in freyja_dashboard output File variable name by @AndrewLangvt in #482
[Terra_2_NCBI] remove metadata requirements with skip_biosample == true by @sage-wright in #475
Augur Updates for RSV-A and RSV-B by @jrotieno in #478
[kSNP3] fix behaviour when phandango colouring is set to false by @cimendes in #496
[Internal] Updating runtime parameters by @sage-wright in #494
Automatically convert spaces to dashes in workflows that accept strings by @AndrewLangvt in #498
[TheiaCoV] Enable user to run TheiaCoV with an unsupported organism by @sage-wright in #501
[AMRFinderPlus] parse BETA-LACTAM genes and subclasses into individual output columns by @sage-wright in #505
IRMA bug fixes & improvements; theiacov_illumina_pe wf updates for Flu by @kapsakcj in #468
Augur_PHB: Set sample_metadata_tsvs input to optional by @jrotieno in #503
[Internal - Gambitcore] Downgrade database to stable 1.3.0 version by @cimendes in #473
[TheiaCoV_Illumina_PE & _ONT] Create sub-workflow for flu-specific modules by @sage-wright in #502
[TheiaProk] Add abricate module for vibrio characterization by @cimendes in #429
[TheiaProk] expose hicap outputs in theiaprok_fasta and theiaprok_ont by @cimendes in #508
Fix typo in Terra_2_NCBI Wastewater metadata by @michellescribner in #519
[TheiaProk] Update amrfinderplus to v3.12.8; DB: v2024-05-02.2; reduce compute resources by @kapsakcj in #514
[TheiaProk] upgrade mlst docker image to 2024-06-01 staphb build; reduced runtime parameters; enable preemptible by @kapsakcj in #516
update default...

Contributors

kapsakcj, AndrewLangvt, and 4 other contributors


01 May 21:54

frankambrosio3

v2.0.1

e6c97dc


Public Health Bioinformatics v2.0.1 Patch Release Notes

🩹 This patch release updates the default midas_db location

Full release notes can be found here!
Find our documentation here!

What's Changed

update default midas_db location to requester pays bucket by @kapsakcj in #446
Update version to v2.0.1 by @sage-wright in #448

Full Changelog: v2.0.0...v2.0.1

Contributors

kapsakcj and sage-wright


22 Apr 18:15

sage-wright

v2.0.0

880a66c


Public Health Bioinformatics v2.0.0 Release Notes

This major release simplifies the usage of the TheiaCoV workflows and does major restructuring on all inputs and outputs on several workflows, including TheiaCoV, TheiaProk, TheiaEuk, and TheiaMeta. Additionally, it introduces three new workflows, improves on several workflows, and resolves various bugs.

Full release notes can be found here.

All inputs and outputs have been standardized across all of PHB. More information can be found here.

Find our documentation here!

🆕 New workflows:

Kraken2_ONT_PHB
- You can now analyze ONT data through the Kraken2 software.
- Import the workflow from Dockstore
TBProfiler_tNGS_PHB
- This workflow is still in a beta state; development is currently ongoing.
- It is used to process targeted next-generation sequencing (tNGS) Mycobacterium tuberculosis data for antimicrobial resistance (AMR) characterization with TBProfiler and tbp-parser. It includes quality assessment and control with Trimmomatic.
- Import the workflow from Dockstore
Find_Shared_Variants_PHB
- Find_Shared_Variants_PHB is a workflow for concatenating the variant results produced by the Snippy_Variants_PHB workflow across multiple samples and reshaping the data to illustrate variants that are shared among multiple samples.
- Import this workflow from Dockstore
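
The reshaping that Find_Shared_Variants_PHB performs can be pictured with a small sketch: collect variants per sample, then invert the mapping to see which samples carry each variant. The data layout (tuples of contig, position, reference, alternate) and the function name are assumptions for illustration, not the workflow's format.

```python
from collections import defaultdict

# Hypothetical variant key: (contig, position, ref, alt)
Variant = tuple[str, int, str, str]


def shared_variants(per_sample: dict[str, set[Variant]], min_samples: int = 2) -> dict[Variant, list[str]]:
    """Map each variant to the sorted list of samples carrying it, keeping shared ones."""
    carriers = defaultdict(set)
    for sample, variants in per_sample.items():
        for variant in variants:
            carriers[variant].add(sample)
    return {
        variant: sorted(samples)
        for variant, samples in carriers.items()
        if len(samples) >= min_samples
    }
```

Lowering `min_samples` to 1 returns every variant with its carriers, which is closer to a plain concatenation of the per-sample results.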

🚀 Changes to existing workflows:

TheiaCoV, TheiaProk, TheiaEuk and TheiaMeta workflows
- All inputs and outputs have been standardized across all workflow series

TheiaCoV Workflow Series

The workflow_parameters sub-workflow now controls all taxa-specific optional inputs in TheiaCoV. The default value for the organism input is still set to "sars-cov-2".
VADR is now enabled for flu, rsv-a and rsv-b.
Nextclade has been updated to v3. Older dataset tags than the ones provided by default are not compatible with the current version. See below for the list of updated nextclade_dataset_tags.

Nextclade dataset names & their default values in TheiaCoV workflows have also changed. For example "sars-cov-2" is now called "nextstrain/sars-cov-2/wuhan-hu-1/orfs". The name "sars-cov-2" still works as an alias, but we recommend using the full name because it is more descriptive and clearer, and will be supported by Nextclade for the foreseeable future.

| Organism | Old Dataset Name | New Dataset Name | New Dataset Tag |
| --- | --- | --- | --- |
| SARS-CoV-2 | `"sars-cov-2"` | `"nextstrain/sars-cov-2/wuhan-hu-1/orfs"` | `2024-04-15--15-08-22Z` |
| Mpox (specifically, Mpox lineage B.1 dataset) | `"hMPXV_B1"` | `"nextstrain/mpox/lineage-b.1"` | `2024-01-16--20-31-02Z` |
| Influenza A H1N1 HA | `"flu_h1n1pdm_ha"` | `"nextstrain/flu/h1n1pdm/ha/MW626062"` | `2024-01-16--20-31-02Z` |
| Influenza A H3N2 HA | `"flu_h3n2_ha"` | `"nextstrain/flu/h3n2/ha/EPI1857216"` | `2024-02-22--16-12-03Z` |
| Influenza B Victoria HA | `"flu_vic_ha"` | `"nextstrain/flu/vic/ha/KX058884"` | `2024-01-16--20-31-02Z` |
| Influenza B Yamagata HA | `"flu_yam_ha"` | `"nextstrain/flu/yam/ha/JN993010"` | `2024-01-30--16-34-55Z` |
| Influenza A H1N1 NA | `"flu_h1n1pdm_na"` | `"nextstrain/flu/h1n1pdm/na/MW626056"` | `2024-01-16--20-31-02Z` |
| Influenza A H3N2 NA | `"flu_h3n2_na"` | `"nextstrain/flu/h3n2/na/EPI1857215"` | `2024-01-16--20-31-02Z` |
| Influenza B Victoria NA | `"flu_vic_na"` | `"nextstrain/flu/vic/na/CY073894"` | `2024-01-16--20-31-02Z` |
| RSV-A | `"rsv_a"` | `"nextstrain/rsv/a/EPI_ISL_412866"` | `2024-01-29--10-29-43Z` |
| RSV-B | `"rsv_b"` | `"nextstrain/rsv/b/EPI_ISL_1653999"` | `2024-01-29--10-29-43Z` |
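
For anyone updating stored inputs, the renaming in the table above can be captured as a simple lookup. The mapping values are copied from the table; the function name is hypothetical and this is only a convenience sketch, not part of the workflows.

```python
# Old Nextclade dataset names mapped to the new names listed in the table above.
NEXTCLADE_DATASET_ALIASES = {
    "sars-cov-2": "nextstrain/sars-cov-2/wuhan-hu-1/orfs",
    "hMPXV_B1": "nextstrain/mpox/lineage-b.1",
    "flu_h1n1pdm_ha": "nextstrain/flu/h1n1pdm/ha/MW626062",
    "flu_h3n2_ha": "nextstrain/flu/h3n2/ha/EPI1857216",
    "flu_vic_ha": "nextstrain/flu/vic/ha/KX058884",
    "flu_yam_ha": "nextstrain/flu/yam/ha/JN993010",
    "flu_h1n1pdm_na": "nextstrain/flu/h1n1pdm/na/MW626056",
    "flu_h3n2_na": "nextstrain/flu/h3n2/na/EPI1857215",
    "flu_vic_na": "nextstrain/flu/vic/na/CY073894",
    "rsv_a": "nextstrain/rsv/a/EPI_ISL_412866",
    "rsv_b": "nextstrain/rsv/b/EPI_ISL_1653999",
}


def resolve_dataset_name(name: str) -> str:
    """Return the new dataset name for an old alias; pass other names through."""
    return NEXTCLADE_DATASET_ALIASES.get(name, name)
```

Names that are already in the new form are returned unchanged, so the helper is safe to apply to mixed inputs.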

TheiaCoV Flu Track
- For the flu track:
  - Tamiflu-resistance determination has been removed in favor of the oseltamivir nomenclature. Additionally, amantadine and rimantadine were added.
    - We now check for antiviral resistance mutations against the following antiviral drugs: A_315675, amantadine, compound_367, favipiravir, fludase, L_742_001, laninamivir, peramivir, pimodivir, rimantadine, oseltamivir, xofluza, zanamivir.
  - For TheiaCoV_Illumina_PE, assembly coverage is now computed for both HA and NA segments
  - Nextclade outputs are now computed for the NA segment as well as HA
TheiaProk Workflow Series
- Plasmidfinder can now be toggled off through the call_plasmidfinder optional input
- Trimmomatic encoding is now set to 33 by default to avoid failures when processing SRA-Lite formatted FASTQ files
TheiaMeta
- Automated binning has been integrated into TheiaMeta when a reference file is not provided. Binning is performed with SemiBin2
- The assembly module optional inputs have been exposed, allowing the user to control the behavior of metaSPAdes and Pilon
SRA_Fetch
- A new warning column has now been implemented indicating if the downloaded file is suspected to be in SRA-Lite format

Docker container updates:

Augur has been updated to commit hash cec4fa0ecd8612e4363d40662060a5a9c712d67e, from 2024-02-01
BUSCO has been updated to version v5.7.1. Due to memory issues when running eukaryotic assemblies, TheiaEuk was excluded from this update and still runs on version v5.3.2
pasty has been updated to version v1.3.0
tbp-parser has been updated to version v1.4.2
theiavalidate has been updated to version v0.1.0
ts_mlst database has been updated as of April 2024
VADR has been updated to version v1.6.3

🐛 Bug fixes and small improvements:

All workflows: Fastq_Scan outputs have been renamed (now prefixed with fastq_scan_*) to differentiate them from fastQC. Several outputs for FastP and fastQC are now exposed such as the respective report HTMLs.
TheiaCoV (all workflows): Edge-case bugs in QC_check and Pangolin have been resolved. The percent gene coverage task has been modularized.
TheiaCoV Illumina PE: read1_aligned, read1_unaligned, read2_aligned, read2_unaligned, sorted_bam_aligned, sorted_bam_aligned_bai, sorted_bam_unaligned, and sorted_bam_unaligned_bai are now output by the workflow.
TheiaProk (all workflows): midas_secondary_genus_coverage (the secondary genus absolute coverage) is now output.
TheiaEuk: Several outputs from the snippy_variants task have been exposed: snippy_variants_num_reads_aligned, snippy_variants_num_variants, snippy_variants_coverage_tsv, and snippy_variants_percent_ref_coverage.
BaseSpace_Fetch: A fix has been implemented that greatly speeds up the download of data from BaseSpace when using Basespace "Projects" to organize sequencing runs.
Snippy_Streamline: snippy_concatenated_variants and snippy_shared_variants are now exposed as Snippy_Streamline outputs. The snippy_snp_matrix output has been deprecated in favor of snippy_wg_snp_matrix and snippy_cg_snp_matrix.
kSNP3: ksnp3_number_snps, ksnp3_number_core_snps and ksnp3_core_snp_table have been added to the collection of outputs.
Kraken2 Standalone (all workflows): Uncompressed read files can now be processed by all Kraken2 workflows
Freyja_FASTQ: A new optional input depth_cutoff has been added, giving the user the option to exclude sites with coverage depth below the provided value (by default no cutoff is performed). New outputs added: freyja_coverage and freyja_barcode_file
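
The effect of a depth cutoff such as depth_cutoff can be illustrated with a small sketch. It assumes per-site depths are available as a position-to-depth mapping and that sites strictly below the cutoff are excluded; the function name and data layout are hypothetical and this is not Freyja's implementation.

```python
def filter_by_depth(depths: dict[int, int], depth_cutoff: int = 0) -> dict[int, int]:
    """Keep only sites whose coverage depth is at least depth_cutoff."""
    # With the default of 0, no site is excluded.
    return {position: depth for position, depth in depths.items() if depth >= depth_cutoff}
```

For example, with depths of 5, 12 and 30 at three sites and a cutoff of 10, only the two deeper sites remain.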

What's Changed

Adding assembly_mean_coverage metrics for flu in TheiaCoV_Illumina_PE_PHB by @jrotieno in #314
pangolin TMPDIR add and CI updates & improvements by @kapsakcj in #327
expose optional input parameter disk_size for kraken2 standalone wfs by @kapsakcj in #316
TheiaValidate: Compare file contents (#264) by @sage-wright in #335
Added Freyja coverage output to Terra table by @emmadoughty in #317
[TheiaMeta] Binning with SemiBin2 by @cimendes in https://github.com/theiagen/public_health_bioinformatics/...

Contributors

kapsakcj, cimendes, and 4 other contributors


17 Jan 18:03

cimendes

v1.3.0

c3f3b70


Public Health Bioinformatics v1.3.0 Release Notes

This minor release introduces two new workflows, improves on several workflows, and resolves various bugs

Full release notes can be found here.

🆕 New workflows:

TheiaCoV_FASTA_Batch_PHB
- This workflow implements TheiaCoV_FASTA for many SARS-CoV-2 samples at once.
- This is a set-level workflow that populates the results to a sample-level data table in Terra.bio
- Currently, this workflow only runs Pangolin4 and Nextclade
- Import the workflow from Dockstore
Rename_FASTQ_PHB
- This workflow is a utility to quickly and easily rename a set of FASTQ files, either paired-end or single-end.
- Import the workflow from Dockstore

🚀 Changes to existing workflows:

TheiaCoV_ONT_PHB
- Influenza is now supported. Use "flu" for the organism optional input String parameter.
  - "sars-cov-2" and "HIV" tracks are unchanged.
TheiaProk Workflow Series
- If user-input (expected_taxon) or predicted taxon by Gambit belongs to the Shigella genus, the Extensively Drug-Resistant phenotype is predicted using the new resfinder pointfinder database.
- If user-input (expected_taxon) or predicted taxon by Gambit is the Mycobacterium tuberculosis species, bcftools indexes and merges all potential VCF files created by TbProfiler (both .bcf and .gz files).
- Kraken2 has been added as an optional module (except for TheiaProk_ONT_PHB). If call_kraken is true, a database must be provided through kraken_db.
- Two new optional inputs were added to control ANIm behaviour: ani_threshold (default 85.00) and percent_bases_aligned_threshold (default 70.00).
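
How the two ANIm thresholds might combine is sketched below. The default values come from the note above; the function name and the use of inclusive comparisons are assumptions, not the task's documented behaviour.

```python
def passes_ani_thresholds(
    ani: float,
    percent_bases_aligned: float,
    ani_threshold: float = 85.00,
    percent_bases_aligned_threshold: float = 70.00,
) -> bool:
    """Return True when a hit meets both thresholds (assumed inclusive)."""
    return (
        ani >= ani_threshold
        and percent_bases_aligned >= percent_bases_aligned_threshold
    )
```

Under these assumptions, a hit with 98.2% ANI but only 60% of bases aligned would be rejected, because both conditions must hold.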
TheiaCoV_FASTA_PHB
- The list of allowed input organism now includes "sars-cov-2" (default), "rsv_a", "rsv_b", "WNV", "MPXV" and "flu".
TheiaCoV_Illumina_PE_PHB
- If organism is set to "flu", the workflow searches for antiviral mutations in the HA, NA, PA, PB1 and PB2 assembly segments, targeting the following 10 antivirals: A_315675, compound_367, Favipiravir, Fludase, L_742_001, Laninamivir, Peramivir, Pimodivir, Xofluza and Zanamivir.
All Illumina SE and PE Workflows
- A new optional input, read_qc, allows the user to choose between fastq_scan and fastqc for the evaluation of read quality. The affected workflows are: TheiaCoV_Illumina_PE_PHB, TheiaCoV_Illumina_SE_PHB, TheiaProk_Illumina_SE_PHB, TheiaProk_Illumina_PE_PHB, TheiaMeta_Illumina_PE_PHB and Freyja_FASTQ_PHB.
CZGenEpi_Prep_PHB
- The sample_is_private_column_name and gisaid_id_column_name columns are no longer extracted from the data table. Instead, the workflow generates them from already-provided inputs and from the new is_private Boolean input, which sets the value for all samples in the set. The field "GISAID ID (Public ID) - Optional" will now reflect the GISAID syntax for Virus Name.

Docker container updates:

AMRFinderPlus has been updated to version v3.11.20 and database 2023-09-26.1
tbp-parser has been updated to version 1.2.0
Freyja has been updated to version 1.4.8
ts_mlst database has been updated as of January 2024
Gambit has been updated to version 1.3.0, including its database files
Pangolin4 has been updated to version 4.3.1-pdata-1.23.1
IRMA has been updated to version 1.1.3

Tag updates:

SARS-CoV-2 Nextclade Dataset Tag has been updated to 2023-12-03T12:00:00Z

🐛 Bug fixes and small improvements:

kSNP3_PHB: The ksnp3_core_vcf output has been renamed to ksnp3_vcf_ref_genome for readability. Additionally, two new outputs are provided: ksnp3_vcf_snps_not_in_ref and ksnp3_vcf_ref_samplename.
TheiaProk Workflow Series: The MIDAS task was adjusted to reduce logging, and therefore the size of the log file, aiding debugging & reducing storage costs.
TheiaMeta_Illumina_PE_PHB: A new task Krona was added for the visualization of the Kraken2 reports.
Mercury_Prep_N_Batch: The excluded_samples.tsv is now printed to the execution log file, aiding debugging.
TheiaCoV Workflow Series: The nextclade_lineage output now populates correctly for SARS-CoV-2. Additionally, the nexclade_qc field is now exposed as an output.
Augur_PHB: The AUGUR refine input clock_filter_iqd has been reverted to the previous default value of 4.
Kraken Standalone Workflows: A new task Krona was added for the visualization of the Kraken2 reports.
TheiaValidate_PHB: TheiaValidate now outputs a table with validation-criteria failures only. Additionally, a new input was added that can translate different column names between tables to enable comparison.
TheiaCoV_ONT_PHB: If a sample fails the read screening quality check, the workflow no longer fails. Instead, it finishes with an appropriate message.
Samples_To_Ref_Tree_PHB: The organism input has been renamed to nextclade_dataset_name for better clarity.
Various workflows: Call caching was disabled in the following workflows: BaseSpace_Fetch_PHB, Transfer_Column_Content_PHB, Assembly_Fetch_PHB, Snippy_Streamline_PHB and TheiaValidate_PHB.

What's Changed

updated VCF output file renaming in kSNP3 task by @kapsakcj in #207
reduce unnecessary logging in MIDAS task by @kapsakcj in #210
update default amrfinderplus docker image to v3.11.20 and db 2023-09-26.1 by @kapsakcj in #229
TheiaCoV_ONT_PHB Influenza Track by @jrotieno in #233
TheiaCoV_FASTA_Batch: TheiaCoV_FASTA, for many samples at once by @sage-wright in #238
Add krona task to TheiaMeta_Illumina_PE by @cimendes in #213
added 2 QC thresholds to ANI task to reduce false positives by @kapsakcj in #168
Resfinder improvements, added support for Shigella spp., added XDR Shigella prediction by @kapsakcj in #159
disable call caching for various workflows by @kapsakcj in #251
Mercury_Prep_N_Batch: print the excluded_samples.tsv and update Docker to avoid Google SDK warning by @sage-wright in #220
Nextclade Output Added by @DOH-HNH0303 in #239
TheiaCoV_FASTA: Adding five new organisms by @jrotieno in #194
Update task_augur_refine iqd back to 4 by @jrotieno in #268
TheiaCoV Illumina PE: Identify Influenza Antiviral Resistance Mutations in Assemblies by @jrotieno in #252
[New Utility] Workflow to rename FASTQ files (non-destructive) by @cimendes in #267
[TheiaCoV_Fasta_Batch] Substitute FASTA concatenating task to ensure proper sample_id propagation by @cimendes in #274
Kraken2 Standalone: add krona visualisation by @cimendes in #225
TheiaValidate_PHB: new features and new Docker image from TheiaValidate repository by @sage-wright in #255
TheiaProk TB: new VCF output and modification to the coverage report by @sage-wright in #245
TheiaCoV_ONT: prevent failure by coercing files into strings by @sage-wright in #288
update default freyja docker image to 1.4.8 for multiple tasks by @kapsakcj in #289
FastQC added as an optional module in all Illumina_PE and Illumina_SE workflows by @sage-wright in #260
update docker to version tag 2.23.0-2024-01 by @cimendes in #293
[TheiaProk Workflows] Add Kraken2 as optional module by @cimendes in #286
CZG...

Contributors

kapsakcj, cimendes, and 5 other contributors


23 Oct 20:19

frankambrosio3

v1.2.1

ab54419


Public Health Bioinformatics v1.2.1 Release Notes

This patch release resolves various bugs and updates workflow defaults.

🐛 Bug Fixes

🦑 Kraken2_PE

A bug was fixed in the Kraken2_PE_PHB standalone workflow, which was expecting required outputs from the Kraken2_standalone task that are now optional. This resolves the failure encountered when trying to import the workflow.

Impacted Workflows/Tasks:

Kraken2_PE_PHB

The following workflows use the Kraken2_standalone task but have not been affected as they do not require the affected outputs:

TheiaMeta_Illumina_PE_PHB
Kraken2_SE_PHB

The following workflows use a different Kraken2 task and have not been affected:

TheiaCoV_Illumina_PE_PHB
TheiaCoV_Illumina_SE_PHB

🌲 Augur

The requirement to provide genes and colors input files was causing run failures for non-MPXV tree builds. These files are no longer required.

Users reported issues with optional Augur_PHB inputs, specifically colors_tsv, with the following error messages:

Error_1: "Failed to evaluate 'colors_tsv' (reason 1 of 1): Evaluating select_first([colors, mpxv_defaults.colors]) failed: select_first was called with 2 empty values. We needed at least one to be filled."
Error_2: "Failed to evaluate 'genes' (reason 1 of 1): Evaluating select_first([genes, mpxv_defaults.genes]) failed: select_first was called with 2 empty values. We needed at least one to be filled."
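For readers unfamiliar with WDL, the following Python sketch mimics what select_first does: it returns the first defined value in a list and raises an error when every value is undefined. The variable names are hypothetical, and the premise that the MPXV defaults are undefined for a non-MPXV build is an assumption inferred from the error text, not taken from the workflow source.

```python
def select_first(values):
    """Mimic WDL's select_first: return the first defined (non-None) value."""
    for value in values:
        if value is not None:
            return value
    raise ValueError(f"select_first was called with {len(values)} empty values")


user_colors = None          # the user did not supply an optional colors file
mpxv_default_colors = None  # assumed: no default exists for a non-MPXV build

try:
    select_first([user_colors, mpxv_default_colors])
except ValueError as err:
    message = str(err)
    print(message)
```

With both candidates undefined there is nothing to fall back on, which is why making the files truly optional avoids the failure.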

📚 Read Screen

The read screen task is designed to assess the quantity and quality of reads used as the input to the workflow, and halt the workflow if it is determined that the reads are insufficient. One of the qualities of the reads that is checked is the proportion of reads found in the R1 and R2 files.
- The former implementation did not calculate the proportion of reads correctly, and the reported error message did not reflect the defined parameter correctly.
- The math has been updated such that the ratio cannot be unbalanced beyond a 60/40 split.
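A minimal sketch of a 60/40 balance check is shown here, assuming the proportion is computed from read counts in the R1 and R2 files. The function and parameter names are hypothetical, and treating an exact 60/40 split as passing is an assumption; the read_screen task itself may differ in both respects.

```python
def reads_are_balanced(read1_count, read2_count, max_percent=60):
    """Hypothetical check: neither file may hold more than max_percent of all reads."""
    total = read1_count + read2_count
    if total == 0:
        return False  # no reads at all cannot be considered balanced
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    return max(read1_count, read2_count) * 100 <= max_percent * total


print(reads_are_balanced(500, 500))  # even split
print(reads_are_balanced(600, 400))  # exactly 60/40
print(reads_are_balanced(700, 300))  # beyond 60/40
```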

🔧 Workflows Updates

Workflows

🔬 TheiaCoV Workflows

The default nextclade_dataset_tag for SARS-CoV-2 was updated to "2023-09-21T12:00:00Z" (as of 2023-10-10) across all 5 TheiaCoV workflows:
- TheiaCoV_Illumina_PE_PHB, TheiaCoV_Illumina_SE_PHB, TheiaCoV_ClearLabs_PHB, TheiaCoV_ONT_PHB, TheiaCoV_FASTA_PHB

🦠 TheiaProk Workflows

KmerFinder was added to the TheiaProk suite of workflows to find the best match (species identification) of a fasta file in a (kmer) database (downloaded on 2023-09-11).

New Outputs

kmerfinder_docker
kmerfinder_results_tsv
kmerfinder_top_hit
kmerfinder_query_coverage
kmerfinder_template_coverage
kmerfinder_database

Task Files

🎙️ UShER

The runtime environment for the UShER task has been allocated additional compute resources to allow for larger input sets.
The following defaults for the UShER task were changed:
- CPU 4 -> 8
- Memory 8 -> 32
Impacted Workflows/Tasks
- UShER_PHB is the only affected workflow.
  - The UShER task is used in the UShER workflow.

🔎 Pilon

The runtime environment for the Pilon task has been allocated additional compute resources to allow for larger input sets.
The following defaults for the Pilon task were changed:
- CPU 4 -> 8
- Memory 8 -> 32
Impacted Workflows/Tasks
- TheiaMeta_Illumina_PE_PHB is the only affected workflow.
  - The Pilon task is used in the metaspades_assembly sub-workflow.

🏭 What's Changed

KmerFinder to TheiaProk by @cimendes in #188
Remove the genes and colors input files by @sage-wright in #212
update default nextclade dataset tag to "2023-09-21T12:00:00Z" for all TheiaCov wfs by @kapsakcj in #208
update template and update PHB version by @sage-wright in #217
Update tbp-parser Docker and new output by @sage-wright in #214
Fix bugs in read proportion calculation - read_screen task by @cimendes in #209
Fix bug on Kraken2_PE standalone workflow by @cimendes in #219
Add additional section to the PR template by @sage-wright in #221
Update compute resource defaults in task_usher.wdl by @frankambrosio3 in #222
TheiaMeta - Update Pilon defaults (cpu and memory) by @cimendes in #223

Full Changelog: v1.2.0...v1.2.1

Please see the full documentation for the PHB repository v1.2.1 release.

Contributors

kapsakcj, cimendes, and 2 other contributors


03 Oct 14:02

frankambrosio3

v1.2.0

801baa2


Public Health Bioinformatics v1.2.0 Release Notes

This minor release introduces three new workflows and resolves various bugs.

New workflows:

TheiaMeta_Illumina_PE_PHB
This workflow offers a versatile approach to de novo metagenomic assembly, providing the option to use either reference-based or reference-independent metagenomic assembly. Taxonomic characterization is also performed with Kraken2.
CZGenEpi_Prep_PHB
The CZGenEpi_Prep workflow formats metadata and assembly files for seamless integration with the Chan Zuckerberg GEN EPI platform.
Samples_to_Ref_Tree_PHB
In this workflow, Nextclade is used to rapidly place new samples onto an existing reference phylogenetic tree. Phylogenetic placement is done by comparing the mutations of the query sequence (relative to the reference) with the mutations of every node and tip in the reference tree, and finding the node which has the most similar set of mutations. This operation is repeated for each query sequence, until all of them are placed onto the tree.
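The placement idea described above can be sketched in a few lines of Python. This is a simplified illustration, not Nextclade's implementation: the distance measure (size of the symmetric difference between mutation sets), the node names, and the mutation labels are all assumptions chosen for the example.

```python
def place_query(query_mutations, tree_nodes):
    """Return the name of the node whose mutation set is most similar to the query."""
    def distance(node_name):
        # Mutations present in one set but not the other.
        return len(query_mutations ^ tree_nodes[node_name])
    return min(tree_nodes, key=distance)


# Hypothetical reference tree: each node or tip carries mutations relative to the reference.
tree_nodes = {
    "root": set(),
    "node_1": {"C241T", "A23403G"},
    "tip_A": {"C241T", "A23403G", "G28881A"},
    "tip_B": {"C241T", "C14408T"},
}

queries = {
    "sample_1": {"C241T", "A23403G", "G28881A", "T100C"},
    "sample_2": {"C241T", "C14408T"},
}

# The same operation is repeated independently for each query sequence.
placements = {name: place_query(muts, tree_nodes) for name, muts in queries.items()}
print(placements)
```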

Changes in existing workflows

Kraken2_SE_PHB
Kraken2 output files were not being correctly identified by the single-end standalone workflow, causing it to fail unexpectedly. Output files should now populate on the Terra datatable correctly.
KMC
The output type of est_genome_size is now an int so data can be sorted numerically in a Terra datatable when running TheiaProk_ONT. Additionally, this task no longer runs unnecessarily for the TheiaCoV_ONT workflow.
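The reason the output type matters for sorting can be seen in a short Python example, assuming that string-typed columns are ordered character by character. The genome sizes are made up for illustration.

```python
sizes_as_strings = ["5000000", "980000", "12000000"]
sorted_as_strings = sorted(sizes_as_strings)  # compared character by character

sizes_as_ints = [int(size) for size in sizes_as_strings]
sorted_as_ints = sorted(sizes_as_ints)        # compared by numeric value

print(sorted_as_strings)
print(sorted_as_ints)
```

As strings, "12000000" sorts before "5000000" because "1" precedes "5"; as integers, the values fall in the expected numeric order.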
TS_MLST
The database has been updated as of August 2023.

New outputs:
- ts_mlst_docker

Mycobacterium tuberculosis changes

TBProfiler
The default variant caller has been adjusted to FreeBayes to accurately identify resistance-conferring deletions and multi-nucleotide variants (MNVs).
tbp-parser
A TBProfiler parsing module has been added to apply variant interpretation logic based on recommendations by the WHO, CDC and CDPH to produce antitubercular drug resistance calls. Additionally, a set of machine and human-interpretable files are produced to facilitate data sharing and interpretation. Find the source code here.

New inputs:
- tbprofiler_output_seq_method_type (default="WGS")
- tbprofiler_operator (default="")
- tbp_parser_min_depth (default=10)
- tbp_parser_coverage_threshold (default=100)
- tbp_parser_debug (default=false)
- tbp_parser_docker_image (default="us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.0.1")
New outputs:
- tbprofiler_lims_report_csv
- tbprofiler_looker_csv
- tbprofiler_laboratorian_report_csv
- tbprofiler_resistance_genes_percent_coverage
- tbp_parser_genome_percent_coverage
- tbp_parser_version
- tbp_parser_docker
Clockwork
The clockwork module has been added to decontaminate read files of sequencing data that may come from a nontuberculous mycobacteria (NTM) or human genome.

New outputs:
- clockwork_decontaminated_read1
- clockwork_decontaminated_read2
TBDB
The TBProfiler module uses a database called TBDB. We have modified the code to allow for custom databases to be used in place of the default TBDB. Additionally, we have created a custom database including mutations from TBDB, the WHO catalog, and a list of mutations included in the CDC's MTB pipeline Varpipe.

By default, TBProfiler runs with the default database. If the Boolean input tbprofiler_run_custom_db is set to true and no database is provided by the user, a database containing both TBProfiler's TBDB and CDC Varpipe's collection of resistance conferring mutations will be used by TBProfiler. In this database, the duplicate entries have been manually curated by removing the TBDB entry in favor of Varpipe's mutation annotation.

New inputs:
- tbprofiler_run_custom_db (default=false)
- tbprofiler_custom_db (default="gs://theiagen-public-files/terra/theiaprok-files/tbdb_varpipe_combined.tar.gz")
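The database selection described above can be summarized with a small Python sketch. The function name and the use of None to stand for TBProfiler's built-in TBDB are illustrative assumptions, as is ignoring a supplied database when tbprofiler_run_custom_db is false; the input names and the default path come from the list above.

```python
DEFAULT_COMBINED_DB = (
    "gs://theiagen-public-files/terra/theiaprok-files/tbdb_varpipe_combined.tar.gz"
)


def choose_tbprofiler_db(tbprofiler_run_custom_db=False, tbprofiler_custom_db=None):
    """Hypothetical helper: return the database TBProfiler would run with.

    None stands for TBProfiler's built-in default TBDB.
    """
    if not tbprofiler_run_custom_db:
        return None
    # Fall back to the combined TBDB + Varpipe database when none is supplied.
    return tbprofiler_custom_db or DEFAULT_COMBINED_DB


print(choose_tbprofiler_db())
print(choose_tbprofiler_db(tbprofiler_run_custom_db=True))
print(choose_tbprofiler_db(True, "gs://example-bucket/my_db.tar.gz"))  # made-up path
```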

Bug Fixes

In the KMC task, the -n flag has been added to the echo command to avoid newline characters
An optional snippy_core_bed file input has been added to the Snippy_Tree workflow to enable site masking. This optional input is also exposed in the Snippy_Streamline workflow.
The memory input for quast has been adjusted to match the style guide in TheiaEuk_Illumina_PE_PHB workflow.
The version_capture task now uses a Docker image hosted on Theiagen's Google Artifact Registry (GAR) instead of DockerHub; we also exposed docker as an optional input for this task.
The plasmidfinder output parsing was overambitious when removing duplicates and removed every instance of a duplicate, instead of just one. This has been resolved.
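The difference between the two deduplication behaviors in the plasmidfinder fix can be illustrated in Python. This is not the task's parsing code; the function names and the plasmid replicon values are hypothetical and only show the distinction between dropping every copy of a repeated value and keeping one.

```python
from collections import Counter


def drop_all_duplicated(items):
    """Overambitious behavior: every copy of a repeated value is removed."""
    counts = Counter(items)
    return [item for item in items if counts[item] == 1]


def keep_one_of_each(items):
    """Intended behavior: the first copy of each value is kept, in order."""
    return list(dict.fromkeys(items))


plasmids = ["IncFIB", "Col156", "IncFIB", "IncI1"]  # made-up hits
print(drop_all_duplicated(plasmids))
print(keep_one_of_each(plasmids))
```

In the first case the repeated hit disappears from the results entirely, which is the kind of data loss the fix addresses.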

What's Changed

Create issue templates by @sage-wright in #175
Add preemptibles, shorter version string by @aofarrel in #185
Fix kraken2_standalone for SE data by @cimendes in #178
Patch theiaprok ont - change est_genome_size to Int by @cimendes in #179
plasmidfinder task bugfix and updates by @kapsakcj in #191
TheiaMeta: Viral Metagenomics workflow by @cimendes in #64
adding bed file input by @jrotieno in #190
Jro mpxv global tree by @jrotieno in #160
Adding tbp_parser and clockwork to TheiaProk by @frankambrosio3 in #192
KMC on TheiaProk_ONT and TheiaCoV_ONT by @cimendes in #193
CZGenEpi_Prep_PHB workflow by @sage-wright in #161
update ts mlst docker (August 2023) by @cimendes in #195
TBDB with varpipe by @cimendes in #197
Smw tbprofiler continuing dev by @sage-wright in #199
adjusted call block for quast in theiaeuk_illumina_pe_PHB workflow: m… by @kapsakcj in #200
add -n to echo command in kmc to avoid new line by @frankambrosio3 in #201
switch default docker image for version_capture to GAR-hosted image; CI change to micromamba by @kapsakcj in #198
update version by @sage-wright in #204
revert ncbi scrub changes to commid id 4e0fa54 by @cimendes in #205

Full Changelog: v1.1.0...v1.2.0

View our documentation here!

Contributors

kapsakcj, cimendes, and 4 other contributors


30 Aug 20:17

sage-wright

v1.1.0

87f1695


Public Health Bioinformatics v1.1.0 Release Notes

This minor release introduces two new workflows, changes the outputs for the ONT workflows, and resolves various bugs.

New workflows:

Terra_2_GISAID
This workflow will submit concatenated metadata and assembly files to GISAID directly from Terra. The user must obtain a GISAID client-id before they can use this workflow.
Usher_PHB
This workflow will place your samples onto the most up-to-date versions of the UCSC's UShER phylogenetic trees and return subtree(s) that the user can visualize.

Major output changes in TheiaCoV_ONT and TheiaProk_ONT workflows

We identified an issue when using cg_pipeline in our ONT workflows that led to inaccurate QC metrics. We have corrected this issue by deprecating the use of cg_pipeline in all ONT workflows. QC metrics are now calculated using nanoplot, which is a tool geared specifically for ONT data. In addition, since fastq-scan is now redundant in these workflows, it has been removed.

Also, the maximum read length in TheiaProk_ONT was previously set to 10,000 base pairs. We have increased this to 100,000 base pairs by default.

TheiaProk_ONT New Outputs
The following columns are new.
- nanoplot_num_reads_clean1
- nanoplot_num_reads_raw1
- nanoplot_r1_mean_q_clean
- nanoplot_r1_mean_q_raw
- nanoplot_r1_mean_readlength_clean
- nanoplot_r1_mean_readlength_raw
- nanoplot_tsv_clean
- nanoplot_tsv_raw
- nanoplot_version
- nanoplot_docker
- nanoplot_html_clean
- nanoplot_html_raw
The following variables are now generated using nanoplot:
- est_coverage_raw
- est_coverage_clean
The following variables have been removed:
- num_reads_clean1
- num_reads_raw1
- r1_mean_q_raw
- r1_mean_readlength_raw
- fastq_scan_version
TheiaCoV_ONT New Outputs
The following columns are new.
- nanoplot_tsv_clean
- nanoplot_tsv_raw
- nanoplot_version
- nanoplot_docker
- nanoplot_html_clean
- nanoplot_html_raw
- est_coverage_raw
- est_coverage_clean
- r1_mean_readlength_clean
- r1_mean_readlength_raw
- r1_mean_q_clean
- r1_mean_q_raw
The following variables are now generated using nanoplot:
- num_reads_clean1
- num_reads_raw1
The following variables have been removed:
- fastq_scan_version

Bug Fixes

Corrected an inaccurate file extension in the augur workflow.
Adjusted several files to meet the style guide.
Adjusted the default value for the core_genome input in Snippy_Tree to be true.
Fixed a bug in the summarize_data task
Fixed a bug and added new outputs in the SRA_Fetch workflow
Enabled the skipping of extra header columns in the Concatenate_Column_Content workflow
Added the .gfa file from Dragonflye as output
Updated default docker images and dataset tags for the Pangolin and Nextclade tasks.
Updated the GAMBIT database to v1.1.0
The GAMBIT docker image has been updated to use the latest GAMBIT version
Fixed a bug in file name parsing in the Lyve_Set_PHB workflow
Skipped the genome size estimation in the read_screen task for all ONT workflows.

What's Changed

update default docker for busco to GAR docker image by @kapsakcj in #132
change file extension by @sage-wright in #134
minor mashtree improvements by @kapsakcj in #142
[TheiaProk] expose kleborate_virulence_score and kleborate_resistance_score by @cimendes in #146
Explode workflows by @sage-wright in #135
Usher_PHB by @sage-wright in #149
Snippy_Tree core_genome default value by @sage-wright in #144
summarize_data task bug fix: -z bash conditional by @kapsakcj in #153
SRA_fetch workflow & fastq-dl task improvements by @kapsakcj in #150
Terra_2_GISAID by @sage-wright in #148
Skip extra headers in Concatenate_Column_Content by @sage-wright in #162
Deprecate the use of cg_pipeline for nanoplot stats by @cimendes in #164
Update defaults by @sage-wright in #171
update default gambit docker by @sage-wright in #173
lyveset fastq file parsing bugfix and other improvements by @kapsakcj in #156
update lyveSET FASTQ parsing by @kapsakcj in #177

Full Changelog: v1.0.1...v1.1.0

Contributors

kapsakcj, cimendes, and sage-wright


Releases: theiagen/public_health_bioinformatics
