diff --git a/README.md b/README.md index 0eb976b16..84bccba76 100644 --- a/README.md +++ b/README.md @@ -28,7 +28,7 @@ Many more workflows are available, and are documented in detail in the [Theiagen The PHB repository would not be possible without its predecessors. We would like to acknowledge the following repositories, individuals, and contributors for their influence on the development of these workflows: -The PHB repository originated from collaborative work with Andrew Lang, PhD & his [Genomic Analysis WDL workflows](https://github.com/AndrewLangvt/genomic_analyses). The workflows and task development were influenced by The Broad's [Viral Pipes](https://github.com/broadinstitute/viral-pipelines) repository. The TheiaCoV workflows for viral genomic characterization were influenced by UPHL's [Cecret](https://github.com/UPHL-BioNGS/Cecret) & StaPH-B's [Monroe](https://staph-b.github.io/staphb_toolkit/workflow_docs/monroe/) workflows. The TheiaProk workflows for bacterial genomic characterization were influenced by Robert Petit's [bactopia](https://github.com/bactopia/bactopia). Most importantly, the PHB user community drove the development of these workflows and we are grateful for their feedback and contributions. +The PHB repository originated from collaborative work with Andrew Lang, PhD & his [Genomic Analysis WDL workflows](https://github.com/AndrewLangvt/genomic_analyses). The workflows and task development were influenced by The Broad's [Viral Pipes](https://github.com/broadinstitute/viral-pipelines) repository. The TheiaCoV workflows for viral genomic characterization were influenced by UPHL's [Cecret](https://github.com/UPHL-BioNGS/Cecret) & StaPH-B's Monroe (now deprecated) workflows. The TheiaProk workflows for bacterial genomic characterization were influenced by Robert Petit's [bactopia](https://github.com/bactopia/bactopia). 
Most importantly, the PHB user community drove the development of these workflows and we are grateful for their feedback and contributions. If you would like to provide feedback, please raise a [GitHub issue](https://github.com/theiagen/public_health_bioinformatics/issues/new). diff --git a/docs/assets/files/input_jsons/TheiaCoV_FASTA_RSV-A_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_FASTA_RSV-A_2024-04-19.json new file mode 100644 index 000000000..049513ebf --- /dev/null +++ b/docs/assets/files/input_jsons/TheiaCoV_FASTA_RSV-A_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_fasta.organism":"rsv_a"} \ No newline at end of file diff --git a/docs/assets/files/input_jsons/TheiaCoV_FASTA_RSV-B_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_FASTA_RSV-B_2024-04-19.json new file mode 100644 index 000000000..ecfcd122b --- /dev/null +++ b/docs/assets/files/input_jsons/TheiaCoV_FASTA_RSV-B_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_fasta.organism":"rsv_b"} \ No newline at end of file diff --git a/docs/assets/files/input_jsons/TheiaCoV_FASTA_WNV_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_FASTA_WNV_2024-04-19.json new file mode 100644 index 000000000..b70cfc7d7 --- /dev/null +++ b/docs/assets/files/input_jsons/TheiaCoV_FASTA_WNV_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_fasta.organism":"WNV"} \ No newline at end of file diff --git a/docs/assets/files/input_jsons/TheiaCoV_FASTA_flu_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_FASTA_flu_2024-04-19.json new file mode 100644 index 000000000..387ea28f9 --- /dev/null +++ b/docs/assets/files/input_jsons/TheiaCoV_FASTA_flu_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_fasta.organism":"flu"} \ No newline at end of file diff --git a/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_HIV_v1_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_HIV_v1_2024-04-19.json new file mode 100644 index 000000000..b74fdb1da --- /dev/null +++ 
b/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_HIV_v1_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_illumina_pe.organism":"HIV","theiacov_illumina_pe.hiv_primer_version":"v1"} \ No newline at end of file diff --git a/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_HIV_v2_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_HIV_v2_2024-04-19.json new file mode 100644 index 000000000..5cb283195 --- /dev/null +++ b/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_HIV_v2_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_illumina_pe.organism":"HIV","theiacov_illumina_pe.hiv_primer_version":"v2"} \ No newline at end of file diff --git a/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_RSV-A_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_RSV-A_2024-04-19.json new file mode 100644 index 000000000..7f0ccbdbe --- /dev/null +++ b/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_RSV-A_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_illumina_pe.organism":"rsv_a"} \ No newline at end of file diff --git a/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_RSV-B_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_RSV-B_2024-04-19.json new file mode 100644 index 000000000..32b139716 --- /dev/null +++ b/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_RSV-B_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_illumina_pe.organism":"rsv_b"} \ No newline at end of file diff --git a/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_WNV_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_WNV_2024-04-19.json new file mode 100644 index 000000000..c50400a79 --- /dev/null +++ b/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_WNV_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_illumina_pe.organism":"WNV"} \ No newline at end of file diff --git a/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_flu_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_flu_2024-04-19.json new file mode 100644 index 000000000..268ef0974 
--- /dev/null +++ b/docs/assets/files/input_jsons/TheiaCoV_Illumina_PE_flu_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_illumina_pe.organism":"flu"} \ No newline at end of file diff --git a/docs/assets/files/input_jsons/TheiaCoV_Illumina_SE_WNV_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_Illumina_SE_WNV_2024-04-19.json new file mode 100644 index 000000000..671680c73 --- /dev/null +++ b/docs/assets/files/input_jsons/TheiaCoV_Illumina_SE_WNV_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_illumina_se.organism":"WNV"} \ No newline at end of file diff --git a/docs/assets/files/input_jsons/TheiaCoV_ONT_HIV_v1_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_ONT_HIV_v1_2024-04-19.json new file mode 100644 index 000000000..5cb7de3fa --- /dev/null +++ b/docs/assets/files/input_jsons/TheiaCoV_ONT_HIV_v1_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_ont.organism":"HIV","theiacov_ont.hiv_primer_version":"v1"} \ No newline at end of file diff --git a/docs/assets/files/input_jsons/TheiaCoV_ONT_HIV_v2_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_ONT_HIV_v2_2024-04-19.json new file mode 100644 index 000000000..789e59083 --- /dev/null +++ b/docs/assets/files/input_jsons/TheiaCoV_ONT_HIV_v2_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_ont.organism":"HIV","theiacov_ont.hiv_primer_version":"v2"} \ No newline at end of file diff --git a/docs/assets/files/input_jsons/TheiaCoV_ONT_flu_2024-04-19.json b/docs/assets/files/input_jsons/TheiaCoV_ONT_flu_2024-04-19.json new file mode 100644 index 000000000..de68f509e --- /dev/null +++ b/docs/assets/files/input_jsons/TheiaCoV_ONT_flu_2024-04-19.json @@ -0,0 +1 @@ +{"theiacov_ont.organism":"flu"} \ No newline at end of file diff --git a/docs/contributing/doc_contribution.md b/docs/contributing/doc_contribution.md index 8a32abf56..7f20e5491 100644 --- a/docs/contributing/doc_contribution.md +++ b/docs/contributing/doc_contribution.md @@ -34,6 +34,7 @@ Here are some VSCode Extensions can help you write and edit your markdown 
files - [Excel to Markdown Table](https://tableconvert.com/excel-to-markdown) - This website will convert an Excel table into markdown format, which can be copied and pasted into your markdown file. - [Material for MkDocs Reference](https://squidfunk.github.io/mkdocs-material/reference/) - This is the official reference for the Material for MkDocs theme, which will help you understand how to use the theme's features. +- [Broken Link Check](https://www.brokenlinkcheck.com/) - This website will scan your website to ensure that all links are working correctly. This will only work on the deployed version of the documentation, not the local version. ## Documentation Structure diff --git a/docs/index.md b/docs/index.md index 94f03bce8..058b2149d 100644 --- a/docs/index.md +++ b/docs/index.md @@ -6,8 +6,7 @@ title: Home The PHB repository contains workflows for the characterization, genomic epidemiology, and sharing of pathogen genomes of public health concern. Workflows are available for viruses, bacteria, and fungi. -All workflows in the PHB repository end with `_PHB` in order to differentiate them from earlier versions and from the original tools they -incorporate. +All workflows in the PHB repository end with `_PHB` in order to differentiate them from earlier versions and from the original tools they incorporate.
[Explore our workflows](workflows_overview/workflows_type.md){ .md-button .md-button--primary }
@@ -28,7 +27,7 @@ incorporate. !!! dna "Our Open Source Philosophy" - PHB source code is publicly available on [GitHub](https://github.com/theiagen/public_health_bioinformatics) and available under [GNU Affero General Public License v3.0](https://github.com/theiagen/public_health_viral_genomics/blob/main/LICENSE)! + PHB source code is publicly available on [GitHub](https://github.com/theiagen/public_health_bioinformatics) and available under [GNU Affero General Public License v3.0](https://github.com/theiagen/public_health_bioinformatics/blob/main/LICENSE)! All workflows can be imported directly to [Terra](https://terra.bio/) via the [**Dockstore PHB collection**](https://dockstore.org/organizations/Theiagen/collections/public-health-bioinformatics)! @@ -90,7 +89,7 @@ We would like to gratefully acknowledge the following individuals from the publi The PHB repository would not be possible without its predecessors. We would like to acknowledge the following repositories, individuals, and contributors for their influence on the development of these workflows: -The PHB repository originated from collaborative work with Andrew Lang, PhD & his [Genomic Analysis WDL workflows](https://github.com/AndrewLangvt/genomic_analyses). The workflows and task development were influenced by The Broad's [Viral Pipes](https://github.com/broadinstitute/viral-pipelines) repository. The TheiaCoV workflows for viral genomic characterization were influenced by UPHL's [Cecret](https://github.com/UPHL-BioNGS/Cecret) & StaPH-B's [Monroe](https://staph-b.github.io/staphb_toolkit/workflow_docs/monroe/) workflows. The TheiaProk workflows for bacterial genomic characterization were influenced by Robert Petit's [bactopia](https://github.com/bactopia/bactopia). Most importantly, the PHB user community drove the development of these workflows and we are grateful for their feedback and contributions. 
+The PHB repository originated from collaborative work with Andrew Lang, PhD & his [Genomic Analysis WDL workflows](https://github.com/AndrewLangvt/genomic_analyses). The workflows and task development were influenced by The Broad's [Viral Pipes](https://github.com/broadinstitute/viral-pipelines) repository. The TheiaCoV workflows for viral genomic characterization were influenced by UPHL's [Cecret](https://github.com/UPHL-BioNGS/Cecret) & StaPH-B's Monroe (now deprecated) workflows. The TheiaProk workflows for bacterial genomic characterization were influenced by Robert Petit's [bactopia](https://github.com/bactopia/bactopia). Most importantly, the PHB user community drove the development of these workflows and we are grateful for their feedback and contributions. If you would like to provide feedback, please raise a [GitHub issue](https://github.com/theiagen/public_health_bioinformatics/issues/new) or contact us at . diff --git a/docs/workflows/genomic_characterization/theiacov.md b/docs/workflows/genomic_characterization/theiacov.md index 24881de45..5849d2f08 100644 --- a/docs/workflows/genomic_characterization/theiacov.md +++ b/docs/workflows/genomic_characterization/theiacov.md @@ -27,28 +27,28 @@ Additionally, the **TheiaCoV_FASTA_Batch** workflow is available to process seve [**Reference Materials for Mpox**](https://www.notion.so/Workspace-Reference-Materials-for-MPXV-Genomic-Characterization-a34f355c68c54c0a82e926d4de607bca?pvs=21) ??? 
toggle "HIV Input JSONs" - - [TheiaCoV_Illumina_PE_HIV_v1_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/439f1c74-d91e-4978-b173-3302f878e343/TheiaCoV_Illumina_PE_HIV_v1_2024-04-19.json) - - [TheiaCoV_Illumina_PE_HIV_v2_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/2c7872de-44c8-406d-bbec-fadaacbb0d98/TheiaCoV_Illumina_PE_HIV_v2_2024-04-19.json) - - [TheiaCoV_ONT_HIV_v1_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/9f9a7bd1-2ac4-47fb-967b-4198a45d4a71/TheiaCoV_ONT_HIV_v1_2024-04-19.json) - - [TheiaCoV_ONT_HIV_v2_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/13fdfec0-4a81-460e-948a-be6ad30d022d/TheiaCoV_ONT_HIV_v2_2024-04-19.json) + - [TheiaCoV_Illumina_PE_HIV_v1_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_Illumina_PE_HIV_v1_2024-04-19.json) + - [TheiaCoV_Illumina_PE_HIV_v2_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_Illumina_PE_HIV_v2_2024-04-19.json) + - [TheiaCoV_ONT_HIV_v1_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_ONT_HIV_v1_2024-04-19.json) + - [TheiaCoV_ONT_HIV_v2_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_ONT_HIV_v2_2024-04-19.json) ??? 
toggle "WNV Input JSONs" - - [TheiaCoV_Illumina_PE_WNV_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/6af74d02-9985-428d-897e-e04ebacc42a3/TheiaCoV_Illumina_PE_WNV_2024-04-19.json) - - [TheiaCoV_Illumina_SE_WNV_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/cb8dec19-2563-4070-9ae9-031c089f8b3d/TheiaCoV_Illumina_SE_WNV_2024-04-19.json) - - [TheiaCoV_FASTA_WNV_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/f2059069-5ce1-45e1-ab9e-51925158c0eb/TheiaCoV_FASTA_WNV_2024-04-19.json) + - [TheiaCoV_Illumina_PE_WNV_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_Illumina_PE_WNV_2024-04-19.json) + - [TheiaCoV_Illumina_SE_WNV_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_Illumina_SE_WNV_2024-04-19.json) + - [TheiaCoV_FASTA_WNV_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_FASTA_WNV_2024-04-19.json) ??? toggle "Flu Input JSONs" - - [TheiaCoV_Illumina_PE_flu_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/ba326b69-8a2a-4af2-a74f-e710e667f82b/TheiaCoV_Illumina_PE_flu_2024-04-19.json) - - [TheiaCoV_ONT_flu_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/c01c98f5-d00e-4ff2-ad09-6cc3ff1ad3a7/TheiaCoV_ONT_flu_2024-04-19.json) - - [TheiaCoV_FASTA_flu_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/4c7d7a16-2c20-4cbc-9618-231afade9940/TheiaCoV_FASTA_flu_2024-04-19.json) + - [TheiaCoV_Illumina_PE_flu_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_Illumina_PE_flu_2024-04-19.json) + - [TheiaCoV_ONT_flu_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_ONT_flu_2024-04-19.json) + - [TheiaCoV_FASTA_flu_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_FASTA_flu_2024-04-19.json) ??? 
toggle "RSV-A Input JSONs" - - [TheiaCoV_Illumina_PE_RSV-B_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/2be20bb8-b733-4f02-a27f-b0cf19d015f8/TheiaCoV_Illumina_PE_RSV-B_2024-04-19.json) - - [TheiaCoV_FASTA_RSV-A_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/ba6a4845-14ee-4664-b9f3-808f76c87d15/TheiaCoV_FASTA_RSV-A_2024-04-19.json) - + - [TheiaCoV_Illumina_PE_RSV-A_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_Illumina_PE_RSV-A_2024-04-19.json) + - [TheiaCoV_FASTA_RSV-A_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_FASTA_RSV-A_2024-04-19.json) + ??? toggle "RSV-B Input JSONs" - - [TheiaCoV_Illumina_PE_RSV-A_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/dd1612ff-20c5-4310-9cb3-c07bf9b7e8a1/TheiaCoV_Illumina_PE_RSV-A_2024-04-19.json) - - [TheiaCoV_FASTA_RSV-B_2024-04-19.json](https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/160cdfbc-a556-40bc-aa05-84ae69511400/TheiaCoV_FASTA_RSV-B_2024-04-19.json) + - [TheiaCoV_Illumina_PE_RSV-B_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_Illumina_PE_RSV-B_2024-04-19.json) + - [TheiaCoV_FASTA_RSV-B_2024-04-19.json](../../assets/files/input_jsons/TheiaCoV_FASTA_RSV-B_2024-04-19.json) !!! caption "TheiaCoV Workflow Diagram" ![TheiaCoV Workflow Diagram](../../assets/figures/TheiaCoV.png) @@ -605,11 +605,11 @@ All input reads are processed through "core tasks" in the TheiaCoV Illumina, ONT | | Links | | --- | --- | - | Task | [task_versioning.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/task_versioning.wdl) | + | Task | [task_versioning.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/task_versioning.wdl) | ??? 
task "`screen`: Total Raw Read Quantification and Genome Size Estimation" - The [`screen`](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_screen.wdl) task ensures the quantity of sequence data is sufficient to undertake genomic analysis. It uses bash commands for quantification of reads and base pairs, and [mash](https://mash.readthedocs.io/en/latest/index.html) sketching to estimate the genome size and its coverage. At each step, the results are assessed relative to pass/fail criteria and thresholds that may be defined by optional user inputs. Samples that do not meet these criteria will not be processed further by the workflow: + The [`screen`](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_screen.wdl) task ensures the quantity of sequence data is sufficient to undertake genomic analysis. It uses bash commands for quantification of reads and base pairs, and [mash](https://mash.readthedocs.io/en/latest/index.html) sketching to estimate the genome size and its coverage. At each step, the results are assessed relative to pass/fail criteria and thresholds that may be defined by optional user inputs. Samples that do not meet these criteria will not be processed further by the workflow: 1. Total number of reads: A sample will fail the read screening task if its total number of reads is less than or equal to `min_reads`. 2. The proportion of basepairs reads in the forward and reverse read files: A sample will fail the read screening if fewer than `min_proportion` basepairs are in either the reads1 or read2 files. 
@@ -638,7 +638,7 @@ All input reads are processed through "core tasks" in the TheiaCoV Illumina, ONT | | Links | | --- | --- | - | Task | [task_screen.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/task_screen.wdl) | + | Task | [task_screen.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_screen.wdl) | ??? task "`read_QC_trim_pe` and `read_QC_trim_se`: Read Quality Trimming, Host and Adapter Removal, Quantification, and Identification ==_for Illumina workflows_==" @@ -654,7 +654,7 @@ All input reads are processed through "core tasks" in the TheiaCoV Illumina, ONT | | Links | | --- | --- | - | Task | [task_ncbi_scrub.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/task_ncbi_scrub.wdl) | + | Task | [task_ncbi_scrub.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_ncbi_scrub.wdl) | | Software Source Code | [NCBI Scrub on GitHub](https://github.com/ncbi/sra-human-scrubber) | | Software Documentation | | @@ -698,7 +698,7 @@ All input reads are processed through "core tasks" in the TheiaCoV Illumina, ONT | | Links | | --- | --- | - | Task | [task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/task_kraken2.wdl) | + | Task | [task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kraken2.wdl) | | Software Source Code | [Kraken2 on GitHub](https://github.com/DerrickWood/kraken2/) | | Software Documentation | | | Original Publication(s) | [Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) | @@ -707,8 +707,8 @@ All input reads are processed through "core tasks" in the TheiaCoV Illumina, ONT | | Links | | --- | --- | - | Sub-workflow | 
[wf_read_QC_trim.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/workflows/wf_read_QC_trim.wdl) | - | Tasks | [task_fastp.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_fastp.wdl)
[task_trimmomatic.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_trimmomatic.wdl)
[task_bbduk.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_bbduk.wdl)
[task_fastq_scan.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_fastq_scan.wdl)
[task_midas.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/taxon_id/task_midas.wdl)
[task_kraken2.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/taxon_id/task_kraken2.wdl) | + | Sub-workflow | [wf_read_QC_trim.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim.wdl) | + | Tasks | [task_fastp.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_fastp.wdl)
[task_trimmomatic.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_trimmomatic.wdl)
[task_bbduk.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_bbduk.wdl)
[task_fastq_scan.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_fastq_scan.wdl)
[task_midas.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_midas.wdl)
[task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kraken2.wdl) | | Software Source Code | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](https://github.com/usadellab/Trimmomatic); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2)| | Software Documentation | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic); [BBDuk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2/wiki) | | Original Publication(s) | *[Trimmomatic: a flexible trimmer for Illumina sequence data](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590/)
*[fastp: an ultra-fast all-in-one FASTQ preprocessor](https://academic.oup.com/bioinformatics/article/34/17/i884/5093234?login=false)
*[An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography](https://pubmed.ncbi.nlm.nih.gov/27803195/)
*[Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) | @@ -923,7 +923,7 @@ All input reads are processed through "core tasks" in the TheiaCoV Illumina, ONT This sub-workflow determines which, if any, antiviral mutations are present in the sample. - The assembled HA, NA, PA, PB1 and PB2 segments are compared against [a list of known amino-acid substitutions associated with resistance](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/task_flu_antiviral_subs.wdl) to the antivirals A_315675, compound_367, Favipiravir, Fludase, L_742_001, Laninamivir, Oseltamivir (tamiflu), Peramivir, Pimodivir, Xofluza, and Zanamivir. The list of known antiviral amino acid substitutions can be expanded via optional user input `antiviral_aa_subs` in the format "`NA:V95A,HA:I97V`", i.e. `Protein:AAPositionAA`. + The assembled HA, NA, PA, PB1 and PB2 segments are compared against [a list of known amino-acid substitutions associated with resistance](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/drug_resistance/task_flu_antiviral_subs.wdl) to the antivirals A_315675, compound_367, Favipiravir, Fludase, L_742_001, Laninamivir, Oseltamivir (tamiflu), Peramivir, Pimodivir, Xofluza, and Zanamivir. The list of known antiviral amino acid substitutions can be expanded via optional user input `antiviral_aa_subs` in the format "`NA:V95A,HA:I97V`", i.e. `Protein:AAPositionAA`. !!! techdetails "Antiviral Substitutions Technical Details" | | Links | diff --git a/docs/workflows/genomic_characterization/theiaeuk.md b/docs/workflows/genomic_characterization/theiaeuk.md index 265479ad6..a594dce3b 100644 --- a/docs/workflows/genomic_characterization/theiaeuk.md +++ b/docs/workflows/genomic_characterization/theiaeuk.md @@ -183,11 +183,11 @@ All input reads are processed through "core tasks" in each workflow. 
The core ta | | Links | | --- | --- | - | Task | [task_versioning.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/task_versioning.wdl) | + | Task | [task_versioning.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/task_versioning.wdl) | ??? task "`screen`: Total Raw Read Quantification and Genome Size Estimation" - The [`screen`](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_screen.wdl) task ensures the quantity of sequence data is sufficient to undertake genomic analysis. It uses bash commands for quantification of reads and base pairs, and [mash](https://mash.readthedocs.io/en/latest/index.html) sketching to estimate the genome size and its coverage. At each step, the results are assessed relative to pass/fail criteria and thresholds that may be defined by optional user inputs. Samples that do not meet these criteria will not be processed further by the workflow: + The [`screen`](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_screen.wdl) task ensures the quantity of sequence data is sufficient to undertake genomic analysis. It uses bash commands for quantification of reads and base pairs, and [mash](https://mash.readthedocs.io/en/latest/index.html) sketching to estimate the genome size and its coverage. At each step, the results are assessed relative to pass/fail criteria and thresholds that may be defined by optional user inputs. Samples that do not meet these criteria will not be processed further by the workflow: 1. Total number of reads: A sample will fail the read screening task if its total number of reads is less than or equal to `min_reads`. 2. The proportion of basepairs reads in the forward and reverse read files: A sample will fail the read screening if fewer than `min_proportion` basepairs are in either the reads1 or read2 files. 
@@ -293,11 +293,11 @@ All input reads are processed through "core tasks" in each workflow. The core ta | | Links | | --- | --- | - | Sub-workflow | [wf_read_QC_trim.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/workflows/wf_read_QC_trim.wdl) | - | Tasks | [task_fastp.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_fastp.wdl)
[task_trimmomatic.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_trimmomatic.wdl#L3) (PE subtask)
[task_bbduk.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_bbduk.wdl)
[task_fastq_scan.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_fastq_scan.wdl#L3) (PE subtask)
[task_midas.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/taxon_id/task_midas.wdl)
[task_kraken2.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/taxon_id/task_kraken2.wdl) | + | Sub-workflow | [wf_read_QC_trim.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim.wdl) | + | Tasks | [task_fastp.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_fastp.wdl)
[task_trimmomatic.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_trimmomatic.wdl)
[task_bbduk.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_bbduk.wdl)
[task_fastq_scan.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_fastq_scan.wdl)
[task_midas.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_midas.wdl)
[task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kraken2.wdl)| | Software Source Code | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](https://github.com/usadellab/Trimmomatic); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2)| | Software Documentation | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic); [BBDuk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2/wiki) | - | Original Publication(s) | *[Trimmomatic: a flexible trimmer for Illumina sequence data](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590/)
*[fastp: an ultra-fast all-in-one FASTQ preprocessor](https://academic.oup.com/bioinformatics/article/34/17/i884/5093234?login=false)
*[An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography](https://pubmed.ncbi.nlm.nih.gov/27803195/)
*[Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) | + | Original Publication(s) | [Trimmomatic: a flexible trimmer for Illumina sequence data](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590/)
[fastp: an ultra-fast all-in-one FASTQ preprocessor](https://academic.oup.com/bioinformatics/article/34/17/i884/5093234?login=false)
[An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography](https://pubmed.ncbi.nlm.nih.gov/27803195/)
[Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) | ??? task "`shovill`: _De novo_ Assembly" @@ -311,7 +311,7 @@ All input reads are processed through "core tasks" in each workflow. The core ta !!! techdetails "Shovill Technical Details" | | Links | | --- | --- | - | TheiaEuk WDL Task | [task_shovill.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/assembly/task_shovill.wdl#L3) | + | TheiaEuk WDL Task | [task_shovill.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/assembly/task_shovill.wdl#L3) | | Software code repository and documentation | [Shovill on GitHub](https://github.com/tseemann/shovill) | ??? task "`QUAST`: Assembly Quality Assessment" @@ -351,7 +351,7 @@ All input reads are processed through "core tasks" in each workflow. The core ta | | Links | | --- | --- | - | Task | [task_gambit.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/taxon_id/task_gambit.wdl) | + | Task | [task_gambit.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/task_gambit.wdl) | | Software Source Code | [GAMBIT on GitHub](https://github.com/jlumpe/gambit) | | Software Documentation | [GAMBIT ReadTheDocs](https://gambit-genomics.readthedocs.io/en/latest/) | | Original Publication(s) | [GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0277575) | @@ -379,7 +379,7 @@ All input reads are processed through "core tasks" in each workflow. 
The core ta | | Links | | --- | --- | - | Task | [task_qc_check.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_qc_check.wdl) | + | Task | [task_qc_check.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_qc_check.wdl) | ### Organism-specific Characterization diff --git a/docs/workflows/genomic_characterization/theiameta.md b/docs/workflows/genomic_characterization/theiameta.md index 8d0d5c301..6ae592ad8 100644 --- a/docs/workflows/genomic_characterization/theiameta.md +++ b/docs/workflows/genomic_characterization/theiameta.md @@ -131,7 +131,7 @@ The TheiaMeta_Illumina_PE workflow processes Illumina paired-end (PE) reads ge | | Links | | --- | --- | - | Task | [task_versioning.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/task_versioning.wdl) | + | Task | [task_versioning.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/task_versioning.wdl) | #### Read Cleaning and QC @@ -213,11 +213,11 @@ The TheiaMeta_Illumina_PE workflow processes Illumina paired-end (PE) reads ge | | Links | | --- | --- | - | Sub-workflow | [wf_read_QC_trim.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/workflows/wf_read_QC_trim.wdl) | - | Tasks | [task_fastp.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_fastp.wdl)
[task_trimmomatic.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_trimmomatic.wdl#L3) (PE subtask)
[task_bbduk.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_bbduk.wdl)
[task_fastq_scan.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_fastq_scan.wdl#L3) (PE subtask)
[task_midas.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/taxon_id/task_midas.wdl)
[task_kraken2.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/taxon_id/task_kraken2.wdl) | + | Sub-workflow | [wf_read_QC_trim.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim.wdl) | + | Tasks | [task_fastp.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_fastp.wdl)
[task_trimmomatic.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_trimmomatic.wdl)
[task_bbduk.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_bbduk.wdl)
[task_fastq_scan.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_fastq_scan.wdl)
[task_midas.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_midas.wdl)
[task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kraken2.wdl)| | Software Source Code | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](https://github.com/usadellab/Trimmomatic); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2)| | Software Documentation | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic); [BBDuk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2/wiki) | - | Original Publication(s) | *[Trimmomatic: a flexible trimmer for Illumina sequence data](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590/)
*[fastp: an ultra-fast all-in-one FASTQ preprocessor](https://academic.oup.com/bioinformatics/article/34/17/i884/5093234?login=false)
*[An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography](https://pubmed.ncbi.nlm.nih.gov/27803195/)
*[Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) | + | Original Publication(s) | [Trimmomatic: a flexible trimmer for Illumina sequence data](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590/)
[fastp: an ultra-fast all-in-one FASTQ preprocessor](https://academic.oup.com/bioinformatics/article/34/17/i884/5093234?login=false)
[An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography](https://pubmed.ncbi.nlm.nih.gov/27803195/)
[Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) | ??? task "`kraken`: Taxonomic Classification" @@ -268,7 +268,7 @@ The TheiaMeta_Illumina_PE workflow processes Illumina paired-end (PE) reads ge | --- | --- | | Task | [task_quast.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/task_quast.wdl) | | Software Source Code | [QUAST on GitHub](https://github.com/ablab/quast) | - | Software Documentation | | + | Software Documentation | | | Original Publication(s) | [QUAST: quality assessment tool for genome assemblies](https://academic.oup.com/bioinformatics/article/29/8/1072/228832) | #### Binning diff --git a/docs/workflows/genomic_characterization/theiaprok.md b/docs/workflows/genomic_characterization/theiaprok.md index f5623f89c..3e9329a3b 100644 --- a/docs/workflows/genomic_characterization/theiaprok.md +++ b/docs/workflows/genomic_characterization/theiaprok.md @@ -582,11 +582,11 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Task | [task_versioning.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/task_versioning.wdl) | + | Task | [task_versioning.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/task_versioning.wdl) | ??? task "`screen`: Total Raw Read Quantification and Genome Size Estimation" - The [`screen`](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_screen.wdl) task ensures the quantity of sequence data is sufficient to undertake genomic analysis. It uses bash commands for quantification of reads and base pairs, and [mash](https://mash.readthedocs.io/en/latest/index.html) sketching to estimate the genome size and its coverage. At each step, the results are assessed relative to pass/fail criteria and thresholds that may be defined by optional user inputs. 
Samples that do not meet these criteria will not be processed further by the workflow: + The [`screen`](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_screen.wdl) task ensures the quantity of sequence data is sufficient to undertake genomic analysis. It uses bash commands for quantification of reads and base pairs, and [mash](https://mash.readthedocs.io/en/latest/index.html) sketching to estimate the genome size and its coverage. At each step, the results are assessed relative to pass/fail criteria and thresholds that may be defined by optional user inputs. Samples that do not meet these criteria will not be processed further by the workflow: 1. Total number of reads: A sample will fail the read screening task if its total number of reads is less than or equal to `min_reads`. 2. The proportion of base pairs in the forward and reverse read files: A sample will fail the read screening if either the read1 or read2 file contains less than `min_proportion` of the base pairs. 
@@ -613,7 +613,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | TheiaProk_Illumina_PE | TheiaProk_Illumina_SE and TheiaProk_ONT | | --- | --- | --- | - | Task | [task_screen.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_screen.wdl#L3) (PE sub-task) | [task_screen.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_screen.wdl#L147) (SE sub-task) | + | Task | [task_screen.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_screen.wdl#L3) (PE sub-task) | [task_screen.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_screen.wdl#L147) (SE sub-task) | #### Illumina Data Core Tasks @@ -685,11 +685,11 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Sub-workflow | [wf_read_QC_trim.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/workflows/wf_read_QC_trim.wdl) | - | Tasks | [task_fastp.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_fastp.wdl)
[task_trimmomatic.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_trimmomatic.wdl)
[task_bbduk.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_bbduk.wdl)
[task_fastq_scan.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_fastq_scan.wdl)
[task_midas.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/taxon_id/task_midas.wdl)
[task_kraken2.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/taxon_id/task_kraken2.wdl) | + | Sub-workflow | [wf_read_QC_trim.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim.wdl) | + | Tasks | [task_fastp.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_fastp.wdl)
[task_trimmomatic.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_trimmomatic.wdl)
[task_bbduk.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_bbduk.wdl)
[task_fastq_scan.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_fastq_scan.wdl)
[task_midas.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_midas.wdl)
[task_kraken2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kraken2.wdl)| | Software Source Code | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](https://github.com/usadellab/Trimmomatic); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2)| | Software Documentation | [fastp](https://github.com/OpenGene/fastp); [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic); [BBDuk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/); [fastq-scan](https://github.com/rpetit3/fastq-scan); [MIDAS](https://github.com/snayfach/MIDAS); [Kraken2](https://github.com/DerrickWood/kraken2/wiki) | - | Original Publication(s) | *[Trimmomatic: a flexible trimmer for Illumina sequence data](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590/)
*[fastp: an ultra-fast all-in-one FASTQ preprocessor](https://academic.oup.com/bioinformatics/article/34/17/i884/5093234?login=false)
*[An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography](https://pubmed.ncbi.nlm.nih.gov/27803195/)
*[Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) | + | Original Publication(s) | [Trimmomatic: a flexible trimmer for Illumina sequence data](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590/)
[fastp: an ultra-fast all-in-one FASTQ preprocessor](https://academic.oup.com/bioinformatics/article/34/17/i884/5093234?login=false)
[An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography](https://pubmed.ncbi.nlm.nih.gov/27803195/)
[Improved metagenomic analysis with Kraken 2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0) | ??? task "`CG-Pipeline`: Assessment of Read Quality, and Estimation of Genome Coverage" @@ -717,7 +717,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al !!! techdetails "Shovill Technical Details" | | Links | | --- | --- | - | TheiaProk WDL Task | [task_shovill.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/assembly/task_shovill.wdl#L3) | + | TheiaProk WDL Task | [task_shovill.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/assembly/task_shovill.wdl#L3) | | Software code repository and documentation | [Shovill on GitHub](https://github.com/tseemann/shovill) | #### ONT Data Core Tasks @@ -747,7 +747,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | Workflow | **TheiaProk_ONT** | | --- | --- | | Sub-workflow | [wf_read_QC_trim_ont.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_read_QC_trim_ont.wdl) | - | Tasks | [task_nanoplot.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/task_nanoplot.wdl) [task_fastq_scan.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/b481ce48f3d527ab8f31e4ad8171769212cc091a/tasks/quality_control/task_fastq_scan.wdl#L3) [task_rasusa.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/utilities/task_rasusa.wdl) [task_nanoq.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/task_nanoq.wdl) [task_tiptoft.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/task_tiptoft.wdl) | + | Tasks | [task_nanoplot.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_nanoplot.wdl) 
[task_fastq_scan.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/b481ce48f3d527ab8f31e4ad8171769212cc091a/tasks/quality_control/basic_statistics/task_fastq_scan.wdl) [task_rasusa.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/utilities/task_rasusa.wdl) [task_nanoq.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/read_filtering/task_nanoq.wdl) [task_tiptoft.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/plasmid_detection/task_tiptoft.wdl) | | Software Source Code | [fastq-scan](https://github.com/rpetit3/fastq-scan), [NanoPlot](https://github.com/wdecoster/NanoPlot), [RASUSA](https://github.com/mbhall88/rasusa), [tiptoft](https://github.com/andrewjpage/tiptoft), [nanoq](https://github.com/esteinig/nanoq) | | Original Publication(s) | [NanoPlot paper](https://academic.oup.com/bioinformatics/article/39/5/btad311/7160911)
[RASUSA paper](https://doi.org/10.21105/joss.03941)
[Nanoq Paper](https://doi.org/10.21105/joss.02991)
[Tiptoft paper](https://doi.org/10.21105/joss.01021) | @@ -755,7 +755,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al !!! techdetails "dragonflye Technical Details" | | Links | | --- | --- | - | Task | [task_dragonflye.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/assembly/task_dragonflye.wdl) | + | Task | [task_dragonflye.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/assembly/task_dragonflye.wdl) | | Software Source Code | [dragonflye on GitHub](https://github.com/rpetit3/dragonflye) | | Software Documentation | [dragonflye on GitHub](https://github.com/rpetit3/dragonflye) | @@ -769,7 +769,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Task | [task_quast.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/task_quast.wdl) | + | Task | [task_quast.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/basic_statistics/task_quast.wdl) | | Software Source Code | [QUAST on GitHub](https://github.com/ablab/quast) | | Software Documentation | | | Original Publication(s) | [QUAST: quality assessment tool for genome assemblies](https://academic.oup.com/bioinformatics/article/29/8/1072/228832) | @@ -804,7 +804,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al Average Nucleotide Identity (ANI) is a useful approach for taxonomic identification. The higher the percentage ANI of a query sequence to a given reference genome, the more likely the sequence is the same taxa as the reference. - ANI is calculated in TheiaProk using [a perl script written by Lee Katz](https://github.com/lskatz/ani-m) ([ani-m.pl](http://ani-m.pl)). This uses [MUMmer](http://mummer.sourceforge.net/) to rapidly align entire query assemblies to one or more reference genomes. 
By default, TheiaProk uses a set of 43 reference genomes in [RGDv2](https://github.com/StaPH-B/docker-builds/blob/master/fastani/1.34-RGDV2/RGDv2-metadata.tsv), a database containing genomes of enteric pathogens commonly sequenced by CDC EDLB & PulseNet participating laboratories. The user may also provide their own reference genome. After genome alignment with MUMmer, ani-m.pl calculates the average nucleotide identity and percent bases aligned between 2 genomes (query and reference genomes) + ANI is calculated in TheiaProk using [a perl script written by Lee Katz](https://github.com/lskatz/ani-m) (ani-m.pl). This uses [MUMmer](http://mummer.sourceforge.net/) to rapidly align entire query assemblies to one or more reference genomes. By default, TheiaProk uses a set of 43 reference genomes in [RGDv2](https://github.com/StaPH-B/docker-builds/blob/master/fastani/1.34-RGDV2/RGDv2-metadata.tsv), a database containing genomes of enteric pathogens commonly sequenced by CDC EDLB & PulseNet participating laboratories. The user may also provide their own reference genome. After genome alignment with MUMmer, ani-m.pl calculates the average nucleotide identity and percent bases aligned between 2 genomes (query and reference genomes) The default database of reference genomes used is called "Reference Genome Database version 2" AKA "RGDv2". This database is composed of 43 enteric bacteria representing 32 species and is intended for identification of enteric pathogens and common contaminants. 
It contains six Campylobacter spp., three Escherichia/Shigella spp., one *Grimontia hollisae*, six *Listeria spp.*, one *Photobacterium damselae*, two *Salmonella spp.*, and thirteen *Vibrio spp.* @@ -819,7 +819,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Task | [task_mummer_ani.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_mummer_ani.wdl) | + | Task | [task_mummer_ani.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/advanced_metrics/task_mummer_ani.wdl) | | Software Source Code | [ani-m](https://github.com/lskatz/ani-m), [MUMmer](https://github.com/mummer4/mummer) | | Software Documentation | [ani-m](https://github.com/lskatz/ani-m), [MUMmer](https://mummer.sourceforge.net/) | | Original Publication(s) | [MUMmer4: A fast and versatile genome alignment system](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005944) | @@ -835,7 +835,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Task | [task_gambit.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/taxon_id/task_gambit.wdl) | + | Task | [task_gambit.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/task_gambit.wdl) | | Software Source Code | [GAMBIT on GitHub](https://github.com/jlumpe/gambit) | | Software Documentation | [GAMBIT ReadTheDocs](https://gambit-genomics.readthedocs.io/en/latest/) | | Original Publication(s) | [GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0277575) | @@ -848,7 +848,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | 
Links | | --- | --- | - | Task | [task_kmerfinder.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/task_kmerfinder.wdl) | + | Task | [task_kmerfinder.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/contamination/task_kmerfinder.wdl) | | Software Source Code | https://bitbucket.org/genomicepidemiology/kmerfinder | | Software Documentation | https://cge.food.dtu.dk/services/KmerFinder/instructions.php | | Original Publication(s) | [**Benchmarking of Methods for Genomic Taxonomy**](https://journals.asm.org/doi/full/10.1128/jcm.02981-13?rfr_dat=cr_pub++0pubmed&url_ver=Z39.88-2003&rfr_id=ori%3Arid%3Acrossref.org) | @@ -865,7 +865,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Task | [task_amrfinderplus.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/gene_typing/task_amrfinderplus.wdl) | + | Task | [task_amrfinderplus.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/drug_resistance/task_amrfinderplus.wdl) | | Software Source Code | [amr on GitHub](https://github.com/ncbi/amr) | | Software Documentation | https://github.com/ncbi/amr/wiki | | Original Publication(s) | [AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8208984/) | @@ -917,7 +917,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Task | [task_resfinder.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/eld-docs-dev/tasks/gene_typing/task_resfinder.wdl) | + | Task | [task_resfinder.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/drug_resistance/task_resfinder.wdl) | | Software Source Code | 
https://bitbucket.org/genomicepidemiology/resfinder/src/master/ | | Software Documentation | https://bitbucket.org/genomicepidemiology/resfinder/src/master/ | | ResFinder database | https://bitbucket.org/genomicepidemiology/resfinder_db/src/master/ | @@ -965,7 +965,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al !!! techdetails "TS_MLST Technical Details" | | Links | | --- | --- | - | Task | [task_ts_mlst.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_ts_mlst.wdl) | + | Task | [task_ts_mlst.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/multi/task_ts_mlst.wdl) | | Software Source Code | [mlst](https://github.com/tseemann/mlst) | | Software Documentation | [mlst](https://github.com/tseemann/mlst) | @@ -978,7 +978,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al !!! techdetails "Prokka Technical Details" | | Links | | --- | --- | - | Task | [task_prokka.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/eld-docs-dev/tasks/gene_typing/task_prokka.wdl) | + | Task | [task_prokka.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/annotation/task_prokka.wdl) | | Software Source Code | [prokka](https://github.com/tseemann/prokka) | | Software Documentation | [prokka](https://github.com/tseemann/prokka) | | Original Publication(s) | [Prokka: rapid prokaryotic genome annotation](https://academic.oup.com/bioinformatics/article/30/14/2068/2390517?login=false) | @@ -993,7 +993,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Task | [task_bakta.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/gene_typing/task_bakta.wdl) | + | Task | 
[task_bakta.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/annotation/task_bakta.wdl) | | Software Source Code | [bakta](https://github.com/oschwengers/bakta) | | Software Documentation | | | Original Publication(s) | [Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification](https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000685) | @@ -1010,7 +1010,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Task | [task_plasmidfinder.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/eld-docs-dev/tasks/gene_typing/task_plasmidfinder.wdl) | + | Task | [task_plasmidfinder.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/plasmid_typing/task_plasmidfinder.wdl) | | Software Source Code | https://bitbucket.org/genomicepidemiology/plasmidfinder/src/master/ | | Software Documentation | https://bitbucket.org/genomicepidemiology/plasmidfinder/src/master/ | | Original Publication(s) | [In Silico Detection and Typing of Plasmids using PlasmidFinder and Plasmid Multilocus Sequence Typing](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4068535/) | @@ -1039,7 +1039,7 @@ All input reads are processed through "[core tasks](#core-tasks-performed-for-al | | Links | | --- | --- | - | Task | [task_qc_check.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_qc_check.wdl) | + | Task | [task_qc_check.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/quality_control/comparisons/task_qc_check.wdl) | ??? 
task "`Taxon Tables`: Copy outputs to new data tables based on taxonomic assignment (optional)" @@ -1097,7 +1097,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_kaptive.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_kaptive.wdl) | + | Task | [task_kaptive.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/acinetobacter/task_kaptive.wdl) | | Software Source Code | [Kaptive on GitHub](https://github.com/katholt/Kaptive/wiki) | | Software Documentation | https://github.com/katholt/Kaptive/wiki | | Original Publication(s) | [Identification of Acinetobacter baumannii loci for capsular polysaccharide (KL) and lipooligosaccharide outer core (OCL) synthesis in genome assemblies using curated reference databases compatible with Kaptive](https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000339)
[An update to the database for Acinetobacter baumannii capsular polysaccharide locus typing extends the extensive and diverse repertoire of genes found at and outside the K locus](https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000878) | @@ -1106,13 +1106,13 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after *Acinetobacter* plasmids are not included in the [PlasmidFinder](https://www.notion.so/TheiaProk-Workflow-Series-68c34aca2a0240ef94fef0acd33651b9?pvs=21) database. Instead, the [AcinetobacterPlasmidTyping](https://github.com/MehradHamidian/AcinetobacterPlasmidTyping) database contains variants of the plasmid *rep* gene for *A. baumannii* plasmid identification. When matched with ≥ 95% identity, this represents a typing scheme for *Acinetobacter baumannii* plasmids. In TheiaProk, we use the tool [abricate](https://github.com/tseemann/abricate) to query our assemblies against this database. - The bioinformatics software for querying sample assemblies against the AcinetobacterPlasmidTyping database is [Abricate](https://github.com/tseemann/abricate). The WDL task simply runs abricate, and the Acinetobacter Plasmid database and default setting of 95% minimum identity are set in the [merlin magic sub-workflow](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/workflows/wf_merlin_magic.wdl#L44). + The bioinformatics software for querying sample assemblies against the AcinetobacterPlasmidTyping database is [Abricate](https://github.com/tseemann/abricate). The WDL task simply runs abricate, and the Acinetobacter Plasmid database and default setting of 95% minimum identity are set in the [merlin magic sub-workflow](https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_merlin_magic.wdl). !!! 
techdetails "AcinetobacterPlasmidTyping Technical Details" | | Links | | --- | --- | - | Task | [task_abricate.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/gene_typing/task_abricate.wdl) | + | Task | [task_abricate.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/gene_typing/drug_resistance/task_abricate.wdl) | | Database and documentation | [https://github.com/MehradHamidian/AcinetobacterPlasmidTyping](https://github.com/MehradHamidian/AcinetobacterPlasmidTyping/tree/v1.0.0) | | Software Source Code and documentation | [abricate on GitHub](https://github.com/tseemann/abricate) | | Original Publication(s) | [Detection and Typing of Plasmids in *Acinetobacter baumannii* Using *rep* Genes Encoding Replication Initiation Proteins](https://journals.asm.org/doi/10.1128/spectrum.02478-22?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed) | @@ -1142,7 +1142,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_serotypefinder.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_serotypefinder.wdl) | + | Task | [task_serotypefinder.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/escherichia_shigella/task_serotypefinder.wdl) | | Software Source Code | https://bitbucket.org/genomicepidemiology/serotypefinder/src/master/ | | Software Documentation | https://bitbucket.org/genomicepidemiology/serotypefinder/src/master/ | | Original Publication(s) | [Rapid and Easy In Silico Serotyping of Escherichia coli Isolates by Use of Whole-Genome Sequencing Data](https://journals.asm.org/doi/10.1128/JCM.00008-15) | @@ -1155,7 +1155,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | 
[task_ectyper.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_ectyper.wdl) | + | Task | [task_ectyper.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/escherichia_shigella/task_ectyper.wdl) | | Software Source Code | [ECTyper on GitHub](https://github.com/phac-nml/ecoli_serotyping) | | Software Documentation | [ECTyper on GitHub](https://github.com/phac-nml/ecoli_serotyping) | | Orginal publication | [ECTyper: in silico Escherichia coli serotype and species prediction from raw and assembled whole-genome sequence data](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8767331/) | @@ -1170,7 +1170,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_virulencefinder.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/task_virulencefinder.wdl) | + | Task | [task_virulencefinder.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/escherichia_shigella/task_virulencefinder.wdl) | | Software Source Code | [**VirulenceFinder**](https://bitbucket.org/genomicepidemiology/virulencefinder/src/master/) | | Software Documentation | [**VirulenceFinder**](https://bitbucket.org/genomicepidemiology/virulencefinder/src/master/) | | Original Publication(s) | [Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia co](https://pubmed.ncbi.nlm.nih.gov/24574290/) | @@ -1185,7 +1185,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_shigatyper.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_shigatyper.wdl) | + | Task | 
[task_shigatyper.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/escherichia_shigella/task_shigatyper.wdl) | | Software Source Code | [ShigaTyper on GitHub](https://github.com/CFSAN-Biostatistics/shigatyper) | | Software Documentation | https://github.com/CFSAN-Biostatistics/shigatyper | | Origin publication | [In Silico Serotyping Based on Whole-Genome Sequencing Improves the Accuracy of Shigella Identification](https://doi.org/10.1128/AEM.00165-19) | @@ -1198,7 +1198,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_shigeifinder.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/42659de535202cc37ab509b38987406ad0360dd4/tasks/species_typing/task_shigeifinder.wdl#L3) | + | Task | [task_shigeifinder.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/escherichia_shigella/task_shigeifinder.wdl) | | Software Source Code | [ShigEiFinder on GitHub](https://github.com/LanLab/ShigEiFinder) | | Software Documentation | [ShigEiFinder on GitHub](https://github.com/LanLab/ShigEiFinder) | | Origin publication | [Cluster-specific gene markers enhance Shigella and enteroinvasive Escherichia coli in silico serotyping](https://pubmed.ncbi.nlm.nih.gov/34889728/) | @@ -1210,7 +1210,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after !!! 
techdetails "ShigEiFinder_reads Technical Details" | | Links | | --- | --- | - | Task | [task_shigeifinder.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/42659de535202cc37ab509b38987406ad0360dd4/tasks/species_typing/task_shigeifinder.wdl#L81) | + | Task | [task_shigeifinder.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/escherichia_shigella/task_shigeifinder.wdl#L81) | | Software Source Code | [ShigEiFinder on GitHub](https://github.com/LanLab/ShigEiFinder) | | Software Documentation | [ShigEiFinder on GitHub](https://github.com/LanLab/ShigEiFinder) | | Origin publication | [Cluster-specific gene markers enhance Shigella and enteroinvasive Escherichia coli in silico serotyping](https://pubmed.ncbi.nlm.nih.gov/34889728/) | @@ -1227,7 +1227,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_sonneityping.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_sonneityping.wdl) | + | Task | [task_sonneityping.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/escherichia_shigella/task_sonneityping.wdl) | | Software Source Code | [Mykrobe](https://github.com/Mykrobe-tools/mykrobe), [sonneityping](https://github.com/katholt/sonneityping) | | Software Documentation | https://github.com/Mykrobe-tools/mykrobe/wiki, [sonneityping](https://github.com/katholt/sonneityping) | | Original Publication(s) | [Global population structure and genotyping framework for genomic surveillance of the major dysentery pathogen, *Shigella sonnei*](https://www.nature.com/articles/s41467-021-22700-4) | @@ -1247,7 +1247,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_hicap.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/task_hicap.wdl) | + | 
Task | [task_hicap.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/haemophilus/task_hicap.wdl) | | Software Source Code | [hicap on GitHub](https://github.com/scwatts/hicap) | | Software Documentation | [hicap on GitHub](https://github.com/scwatts/hicap) | | Original Publication(s) | [hicap: In Silico Serotyping of the Haemophilus influenzae Capsule Locus](https://doi.org/10.7717/peerj.5261) | @@ -1262,7 +1262,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_kleborate.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_kleborate.wdl) | + | Task | [task_kleborate.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/haemophilus/task_kleborate.wdl) | | Software Source Code | [kleborate on GitHub](https://github.com/katholt/Kleborate) | | Software Documentation | https://github.com/katholt/Kleborate/wiki | | Orginal publication | [A genomic surveillance framework and genotyping tool for Klebsiella pneumoniae and its related species complex](https://www.nature.com/articles/s41467-021-24448-3)
[Identification of Klebsiella capsule synthesis loci from whole genome data](https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000102) | @@ -1277,7 +1277,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_legsta.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_legsta.wdl) | + | Task | [task_legsta.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/legionella/task_legsta.wdl) | | Software Source Code | [Legsta](https://github.com/tseemann/legsta) | | Software Documentation | [Legsta](https://github.com/tseemann/legsta) | @@ -1291,7 +1291,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_lissero.wd](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_lissero.wdl) | + | Task | [task_lissero.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/listeria/task_lissero.wdl) | | Software Source Code | [LisSero](https://github.com/MDU-PHL/LisSero) | | Software Documentation | [LisSero](https://github.com/MDU-PHL/LisSero) | @@ -1349,7 +1349,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_ngmaster.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_ngmaster.wdl) | + | Task | [task_ngmaster.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/neisseria/task_ngmaster.wdl) | | Software Source Code | [ngmaster](https://github.com/MDU-PHL/ngmaster) | | Software Documentation | [ngmaster](https://github.com/MDU-PHL/ngmaster) | | Original Publication(s) | [NGMASTER: *in silico* multi-antigen sequence typing for *Neisseria 
gonorrhoeae*](https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000076) | @@ -1362,7 +1362,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_meningotype.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_meningotype.wdl) | + | Task | [task_meningotype.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/neisseria/task_meningotype.wdl) | | Software Source Code | [meningotype](https://github.com/MDU-PHL/meningotype) | | Software Documentation | [meningotype](https://github.com/MDU-PHL/meningotype) | @@ -1377,7 +1377,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_pasty.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_pasty.wdl) | + | Task | [task_pasty.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/pseudomonas/task_pasty.wdl) | | Software Source Code | [pasty](https://github.com/rpetit3/pasty) | | Software Documentation | [pasty](https://github.com/rpetit3/pasty) | | Original Publication(s) | [Application of Whole-Genome Sequencing Data for O-Specific Antigen Analysis and In Silico Serotyping of Pseudomonas aeruginosa Isolates.](https://journals.asm.org/doi/10.1128/JCM.00349-16) | @@ -1395,7 +1395,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_sistr.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_sistr.wdl) | + | Task | [task_sistr.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/salmonella/task_sistr.wdl) | | Software Source Code | [SISTR](https://github.com/phac-nml/sistr_cmd) | | Software Documentation | 
[SISTR](https://github.com/phac-nml/sistr_cmd) | | Original Publication(s) | [The Salmonella In Silico Typing Resource (SISTR): an open web-accessible tool for rapidly typing and subtyping draft Salmonella genome assemblies.](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0147101) | @@ -1407,7 +1407,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after !!! techdetails "SeqSero2 Technical Details" | | Links | | --- | --- | - | Task | [task_seqsero2.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_seqsero2.wdl) | + | Task | [task_seqsero2.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/salmonella/task_seqsero2.wdl) | | Software Source Code | [SeqSero2](https://github.com/denglab/SeqSero2) | | Software Documentation | [SeqSero2](https://github.com/denglab/SeqSero2) | | Original Publication(s) | [Salmonella serotype determination utilizing high-throughput genome sequencing data.](https://journals.asm.org/doi/10.1128/JCM.00323-15)
[SeqSero2: rapid and improved Salmonella serotype determination using whole genome sequencing data.](https://journals.asm.org/doi/10.1128/AEM.01746-19) | @@ -1422,7 +1422,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_genotyphi.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_genotyphi.wdl) | + | Task | [task_genotyphi.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/salmonella/task_genotyphi.wdl) | | Software Source Code | [genotyphi](https://github.com/katholt/genotyphi) | | Software Documentation | https://github.com/katholt/genotyphi/blob/main/README.md#mykrobe-implementation | | Orginal publication | [An extended genotyping framework for Salmonella enterica serovar Typhi, the cause of human typhoid](https://www.nature.com/articles/ncomms12827/)
[Five Years of GenoTyphi: Updates to the Global Salmonella Typhi Genotyping Framework](https://academic.oup.com/jid/article/224/Supplement_7/S775/6358992?login=false) | @@ -1438,7 +1438,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_spatyper.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_spatyper.wdl) | + | Task | [task_spatyper.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/staphylococcus/task_spatyper.wdl) | | Software Source Code | [spatyper](https://github.com/HCGB-IGTP/spaTyper) | | Software Documentation | [spatyper](https://github.com/HCGB-IGTP/spaTyper) | @@ -1450,7 +1450,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_staphopiasccmec.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_staphopiasccmec.wdl) | + | Task | [task_staphopiasccmec.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/staphylococcus/task_staphopiasccmec.wdl) | | Software Source Code | [staphopia-sccmec](https://github.com/staphopia/staphopia-sccmec) | | Software Documentation | [staphopia-sccmec](https://github.com/staphopia/staphopia-sccmec) | | Original Publication(s) | [*Staphylococcus aureus* viewed from the perspective of 40,000+ genomes](https://doi.org/10.7717/peerj.5261) | @@ -1463,7 +1463,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_agrvate.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_agrvate.wdl) | + | Task | [task_agrvate.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/staphylococcus/task_agrvate.wdl) | | Software Source Code | 
[agrVATE](https://github.com/VishnuRaghuram94/AgrVATE) | | Software Documentation | [agrVATE](https://github.com/VishnuRaghuram94/AgrVATE) | | Original Publication(s) | [Species-Wide Phylogenomics of the *Staphylococcus aureus Agr* Operon Revealed Convergent Evolution of Frameshift Mutations](https://doi.org/10.1128/spectrum.01334-21) | @@ -1485,7 +1485,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_poppunk_streppneumo.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_poppunk_streppneumo.wdl) | + | Task | [task_poppunk_streppneumo.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/streptococcus/task_poppunk_streppneumo.wdl) | | GPSC database | https://www.pneumogen.net/gps/training_command_line.html | | Software Source Code | [PopPunk](https://github.com/bacpop/PopPUNK) | | Software Documentation | https://poppunk.readthedocs.io/en/latest/ | @@ -1499,7 +1499,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_seroba.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_seroba.wdl) | + | Task | [task_seroba.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/streptococcus/task_seroba.wdl) | | Software Source Code | [SeroBA](https://github.com/sanger-pathogens/seroba) | | Software Documentation | https://sanger-pathogens.github.io/seroba/ | | Original Publication(s) | [SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data](https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000186) | @@ -1512,7 +1512,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | 
[task_pbptyper.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_pbptyper.wdl) | + | Task | [task_pbptyper.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/streptococcus/task_pbptyper.wdl) | | Software Source Code | [pbptyper](https://github.com/rpetit3/pbptyper) | | Software Documentation | [pbptyper](https://github.com/rpetit3/pbptyper) | | Original Publication(s) | [Penicillin-binding protein transpeptidase signatures for tracking and predicting β-lactam resistance levels in Streptococcus pneumoniae](https://journals.asm.org/doi/full/10.1128/mBio.00756-16) | @@ -1526,7 +1526,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after !!! techdetails "emm-typing-tool Technical Details" | | Links | | --- | --- | - | Task | [task_emmtypingtool.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/task_emmtypingtool.wdl) | + | Task | [task_emmtypingtool.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/streptococcus/task_emmtypingtool.wdl) | | Software Source Code | [emm-typing-tool](https://github.com/ukhsa-collaboration/emm-typing-tool) | | Software Documentation | [emm-typing-tool](https://github.com/ukhsa-collaboration/emm-typing-tool) | @@ -1550,7 +1550,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after | | Links | | --- | --- | - | Task | [task_srst2_vibrio.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_srst2_vibrio.wdl) | + | Task | [task_srst2_vibrio.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/vibrio/task_srst2_vibrio.wdl) | | Software Source Code | [srst2](https://github.com/katholt/srst2) | | Software Documentation | [srst2](https://github.com/katholt/srst2) | | Database Description | [Docker 
container](https://github.com/StaPH-B/docker-builds/tree/master/srst2/0.2.0-vibrio-230224) | @@ -1572,7 +1572,7 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after !!! techdetails "Abricate Technical Details" | | Links | | --- | --- | - | Task | [task_abricate_vibrio.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/species_typing/task_srst2_vibrio.wdl) | + | Task | [task_abricate_vibrio.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/species_typing/vibrio/task_srst2_vibrio.wdl) | | Software Source Code | [abricate](https://github.com/tseemann/abricate) | | Software Documentation | [abricate](https://github.com/tseemann/abricate) | | Database Description | [Docker container](https://github.com/StaPH-B/docker-builds/tree/master/abricate/1.0.1-vibrio-cholera) | diff --git a/docs/workflows/phylogenetic_construction/augur.md b/docs/workflows/phylogenetic_construction/augur.md index 7023d6f1f..29e1b5140 100644 --- a/docs/workflows/phylogenetic_construction/augur.md +++ b/docs/workflows/phylogenetic_construction/augur.md @@ -179,7 +179,7 @@ This workflow runs on the set level. Please note that for every task, runtime pa | augur | **min_num_unambig** | Int | Minimum number of called bases in genome to pass prefilter | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: . For an organism without set defaults, the default value is 0 | Optional | | augur | **organism** | String | Organism used to preselect default values; options: "sars-cov-2", "flu", "mpxv", "rsv-a", "rsv-b" | sars-cov-2 | Optional | | augur | **reference_fasta** | File | The reference FASTA file used to align the genomes and build the trees | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: . 
For an organism without set defaults, a reference fasta file must be provided otherwise the workflow fails. | Optional, Required | -| augur | **reference_genbank** | File | The GenBank .gb file for the same reference genome used for the reference_fasta | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: l. For an organism without set defaults, a reference genbank file must be provided otherwise the workflow fails. | Optional, Required | +| augur | **reference_genbank** | File | The GenBank .gb file for the same reference genome used for the reference_fasta | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: . For an organism without set defaults, a reference genbank file must be provided otherwise the workflow fails. | Optional, Required | | augur | **sample_metadata_tsvs** | Array[File] | An array of the metadata files produced in Augur_Prep_PHB | | Optional | | augur | **build_name_updated** | String | Internal component, do not modify. Used for replacing spaces with underscores _ | | Do Not Modify | | augur_align | **fill_gaps** | Boolean | If true, gaps represent missing data rather than true indels and so are replaced by N after aligning. 
| FALSE | Optional | diff --git a/docs/workflows/phylogenetic_construction/snippy_streamline.md b/docs/workflows/phylogenetic_construction/snippy_streamline.md index 7c2fa47df..2423fd959 100644 --- a/docs/workflows/phylogenetic_construction/snippy_streamline.md +++ b/docs/workflows/phylogenetic_construction/snippy_streamline.md @@ -147,7 +147,7 @@ For automatic reference selection by the workflow (optional): | | Links | | --- | --- | - | Task | [task_centroid.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/phylogenetic_inference/task_centroid.wdl) | + | Task | [task_centroid.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/phylogenetic_inference/utilities/task_centroid.wdl) | | Software Source Code | | | Software Documentation | | diff --git a/docs/workflows/phylogenetic_construction/snippy_tree.md b/docs/workflows/phylogenetic_construction/snippy_tree.md index e33a9ab50..86a19304c 100644 --- a/docs/workflows/phylogenetic_construction/snippy_tree.md +++ b/docs/workflows/phylogenetic_construction/snippy_tree.md @@ -140,7 +140,7 @@ Sequencing data used in the Snippy_Tree workflow must: | | Links | | --- | --- | - | Task | [task_snippy_core.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/phylogenetic_inference/task_snippy_core.wdl) | + | Task | [task_snippy_core.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/phylogenetic_inference/utilities/task_snippy_core.wdl) | | Default software version | v4.6.0 (us-docker.pkg.dev/general-theiagen/staphb/snippy:4.6.0) | | Software Source Code | [Snippy on GitHub](https://github.com/tseemann/snippy) | | Software Documentation | [Snippy on GitHub](https://github.com/tseemann/snippy) | @@ -185,7 +185,7 @@ Sequencing data used in the Snippy_Tree workflow must: | | Links | | --- | --- | - | Task | 
[task_snp_sites.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/phylogenetic_inference/task_snp_sites.wdl) | + | Task | [task_snp_sites.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/phylogenetic_inference/utilities/task_snp_sites.wdl) | | Default software version | 2.5.1 (us-docker.pkg.dev/general-theiagen/biocontainers/snp-sites:2.5.1--hed695b0_0) | | Software Source Code | [SNP-sites on GitHub](https://github.com/sanger-pathogens/snp-sites) | | Software Documentation | [SNP-sites on GitHub](https://github.com/sanger-pathogens/snp-sites) | @@ -230,7 +230,7 @@ Sequencing data used in the Snippy_Tree workflow must: | | Links | | --- | --- | - | Task | [task_snp_dists.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/phylogenetic_inference/task_snp_dists.wdl) | + | Task | [task_snp_dists.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/phylogenetic_inference/utilities/task_snp_dists.wdl) | | Default software version | 0.8.2 (us-docker.pkg.dev/general-theiagen/staphb/snp-dists:0.8.2) | | Software Source Code | [SNP-dists on GitHub](https://github.com/tseemann/snp-dists) | | Software Documentation | [SNP-dists on GitHub](https://github.com/tseemann/snp-dists) | diff --git a/docs/workflows/standalone/gambit_query.md b/docs/workflows/standalone/gambit_query.md index b49f76083..5d6dd3c44 100644 --- a/docs/workflows/standalone/gambit_query.md +++ b/docs/workflows/standalone/gambit_query.md @@ -33,7 +33,7 @@ For additional details regarding the GAMBIT tool and a list of available GAMBIT | | Links | | --- | --- | - | Task | [task_gambit.wdl](https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/taxon_id/task_gambit.wdl) | + | Task | [task_gambit.wdl](https://github.com/theiagen/public_health_bioinformatics/blob/main/tasks/taxon_id/task_gambit.wdl) | | Software Source Code | [GAMBIT on GitHub](https://github.com/jlumpe/gambit) | | 
Software Documentation | [GAMBIT ReadTheDocs](https://gambit-genomics.readthedocs.io/en/latest/) | | Original Publication(s) | [GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0277575) | diff --git a/docs/workflows/standalone/theiavalidate.md b/docs/workflows/standalone/theiavalidate.md index 9aee82b32..2dfd5b2b6 100644 --- a/docs/workflows/standalone/theiavalidate.md +++ b/docs/workflows/standalone/theiavalidate.md @@ -30,12 +30,12 @@ If additional validation metrics are desired, the user has the ability to provid - **EXACT** performs an exact string match and counts the number of exact match failures/differences - **IGNORE** does not check the values and says there are 0 failures - **SET** checks list items (such as `amrfinder_plus_genes` which is a comma-delimited list of genes) for identical content — order does not matter; that is, `mdsA,mdsB` is determined to be same as `mdsB,mdsA`. The EXACT match does not consider these to be the same, but the SET match does. --****, which is an actual decimal value such as **0.02**, calculates the percent difference between _numerical_ columns. If the columns are not numerical, this function will **not** work and will lead to workflow failure. For example, if the decimal percentage is 0.02, the test will indicate a failure if the values in the two columns are more than 2% different. +- ****, which is an actual decimal value such as **0.02**, calculates the percent difference between _numerical_ columns. If the columns are not numerical, this function will **not** work and will lead to workflow failure. For example, if the decimal percentage is 0.02, the test will indicate a failure if the values in the two columns are more than 2% different. - Dates, integers, and object-type values are ignored and indicate 0 failures. 
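The comparison modes described above can be sketched in Python. This is only an illustration under stated assumptions, not theiavalidate's actual implementation: the function names are hypothetical, and the percent-difference formula (absolute difference divided by the mean of the two values) is an assumption, since the documentation does not state which formula is used.

```python
def exact_failures(col_a, col_b):
    """EXACT: count rows whose values differ as exact strings."""
    return sum(1 for a, b in zip(col_a, col_b) if str(a) != str(b))


def ignore_failures(col_a, col_b):
    """IGNORE: values are not checked, so there are always 0 failures."""
    return 0


def set_failures(col_a, col_b, sep=","):
    """SET: count rows whose delimited items differ, ignoring item order."""
    def as_set(value):
        return {item.strip() for item in str(value).split(sep)}

    return sum(1 for a, b in zip(col_a, col_b) if as_set(a) != as_set(b))


def percent_diff_failures(col_a, col_b, threshold):
    """Percent difference: count rows differing by more than `threshold`.

    Assumed formula: |a - b| / mean(|a|, |b|). float() raises ValueError
    on non-numeric input, mirroring the documented failure on
    non-numerical columns.
    """
    failures = 0
    for a, b in zip(col_a, col_b):
        a, b = float(a), float(b)
        mean = (abs(a) + abs(b)) / 2
        diff = 0.0 if mean == 0 else abs(a - b) / mean
        if diff > threshold:
            failures += 1
    return failures
```

With these definitions, `mdsA,mdsB` versus `mdsB,mdsA` counts as one failure under EXACT and none under SET, matching the behavior described in the list.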
### File Comparisons -If a column consists of only GCURIs (Google Cloud file paths), the files will be localized and compared with either an EXACT match or a SET match. In the SET match, the lines in the file are ordered before comparison. Results are returned to the summary table as expected. The results of each file comparison can be found in the `theiavalidate_diffs` output column. +If a column consists of only GCP URIs (Google Cloud file paths), the files will be localized and compared with either an EXACT match or a SET match. In the SET match, the lines in the file are ordered before comparison. Results are returned to the summary table as expected. The results of each file comparison can be found in the `theiavalidate_diffs` output column. ### Inputs